[tensorflow] Building the GPU version of TensorFlow 1.8 on macOS 10.13.4

Reference articles:

Requirements

  • Mac OS X 10.6+
  • SIP (System Integrity Protection) disabled on the Mac
  • NVIDIA Web Drivers
  • CUDA Drivers
  • CUDA 9.1 Toolkit (development tools)
  • cuDNN 7.0.5 (deep neural network acceleration library)
  • Python 3.6
  • Xcode 8.2
  • Bazel 0.10
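Before starting, it can help to see what is already installed. The following is a minimal preflight sketch; the helper names show and preflight are hypothetical. It only prints the first or last line of each tool's version output and does not enforce the versions listed above, and tools that are not on PATH are reported as "not found".

```shell
#!/usr/bin/env bash
# Hypothetical preflight helper: report what is installed for each requirement.
# A tool that is not on PATH is reported as "not found" rather than aborting.

# show LABEL WHICH CMD [ARGS...]
#   Runs CMD and prints the first or last line of its output (WHICH = first|last).
show() {
  local label=$1 which=$2
  shift 2
  if ! command -v "$1" >/dev/null 2>&1; then
    printf '%-8s not found\n' "$label"
    return 0
  fi
  local out
  out=$("$@" 2>&1)
  if [ "$which" = last ]; then
    out=$(printf '%s\n' "$out" | tail -n 1)
  else
    out=$(printf '%s\n' "$out" | head -n 1)
  fi
  printf '%-8s %s\n' "$label" "$out"
}

preflight() {
  show macOS  first sw_vers -productVersion
  show SIP    first csrutil status        # should report "disabled"
  show Python first python --version      # want 3.6.x
  show Xcode  first xcodebuild -version   # want Xcode 8.2
  show bazel  first bazel version         # want "Build label: 0.10.x"
  show nvcc   last  nvcc -V               # want "release 9.1"
}

preflight
```

Running it again after each preparation step is a quick way to see which requirement is still missing.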

Preparing the environment

Homebrew

If they are not installed yet, Homebrew will also install the latest Apple Command Line Tools.

$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Install coreutils, LLVM and OpenMP (libomp)

$ brew install coreutils llvm cliutils/apple/libomp

Python dependencies

$ pip install six numpy wheel

bazel

Download version 0.10 from the bazel releases page.
Note that it must be exactly version 0.10: both newer and older versions can make the build fail.

Installing with the binary installer

$ chmod +x bazel-<version>-installer-darwin-x86_64.sh
$ ./bazel-<version>-installer-darwin-x86_64.sh
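Because the build is sensitive to the exact bazel release, a small check after installing can save a failed build later. This sketch assumes that bazel version prints a line of the form "Build label: 0.10.0"; the helper names bazel_release and is_bazel_010 are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check that the installed bazel is a 0.10.x release.

# bazel_release: read "bazel version" output on stdin, print the release number
bazel_release() {
  sed -n 's/^Build label: *//p' | head -n 1
}

# is_bazel_010 VERSION: succeed only for 0.10 or 0.10.x
is_bazel_010() {
  case "$1" in
    0.10|0.10.*) return 0 ;;
    *) return 1 ;;
  esac
}

if command -v bazel >/dev/null 2>&1; then
  v=$(bazel version 2>/dev/null | bazel_release)
  if is_bazel_010 "$v"; then
    echo "bazel $v: OK"
  else
    echo "bazel $v found, but 0.10.x is required" >&2
  fi
fi
```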

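Downgrade Xcode to 8.2

You don't need to downgrade at the very start; this step can be left until the end of the preparation.

Download the Xcode 8.2 package from the Apple Developer website, then select it (the command below assumes it was installed as /Applications/Xcode8.2.app):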

$ sudo xcode-select -s /Applications/Xcode8.2.app

To switch back to the latest Xcode (for example, for iOS development), use:

$ sudo xcode-select -s /Applications/Xcode.app
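Since it is easy to forget which Xcode is currently selected, a check like the following can be run before building. It is a sketch that assumes xcodebuild -version starts with a line such as "Xcode 8.2"; the helper name xcode_version_of is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check of the active Xcode.
# xcode_version_of reads "xcodebuild -version" output on stdin and prints the
# version number from its "Xcode X.Y" line.
xcode_version_of() {
  awk '/^Xcode /{ print $2; exit }'
}

if command -v xcodebuild >/dev/null 2>&1; then
  echo "developer dir: $(xcode-select -p)"
  v=$(xcodebuild -version | xcode_version_of)
  case "$v" in
    8.2|8.2.*) echo "Xcode $v: OK" ;;
    *) echo "active Xcode is $v; select Xcode 8.2 before building" >&2 ;;
  esac
fi
```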

NVIDIA

NVIDIA Web-Drivers

Before installing the CUDA driver, download and install the GPU driver that matches your macOS version.
Download from tonymacx86

Install CUDA Toolkit 9.1

The CUDA toolkit installer includes the CUDA driver.
Download CUDA 9.1

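Install cuDNN

Download cuDNN 7.0.5[^1]

Change into the extracted cuDNN directory, then copy the header and libraries into the CUDA installation: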

$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib/libcudnn_static.a /usr/local/cuda/lib
$ sudo cp cuda/lib/libcudnn.7.dylib /usr/local/cuda/lib
$ sudo ln -s /usr/local/cuda/lib/libcudnn.7.dylib /usr/local/cuda/lib/libcudnn.dylib
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib/libcudnn*
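To confirm that the copy step worked, the following sketch checks that cudnn.h is readable, that the libcudnn.dylib symlink resolves, and prints the version from the header. It assumes the cuDNN 7.x header layout, where cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL; the helper name check_cudnn is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical verification of the cuDNN install.
# Usage: check_cudnn [CUDA_PREFIX]   (defaults to /usr/local/cuda, as above)
check_cudnn() {
  local prefix=${1:-/usr/local/cuda}
  local header="$prefix/include/cudnn.h"
  local link="$prefix/lib/libcudnn.dylib"

  if [ ! -r "$header" ]; then
    echo "missing or unreadable: $header" >&2
    return 1
  fi
  # -L: is a symlink; -e follows it, so a dangling link fails here
  if [ ! -L "$link" ] || [ ! -e "$link" ]; then
    echo "missing or dangling symlink: $link" >&2
    return 1
  fi

  # Pull the version macros out of the header
  local major minor patch
  major=$(awk '$1 == "#define" && $2 == "CUDNN_MAJOR" { print $3; exit }' "$header")
  minor=$(awk '$1 == "#define" && $2 == "CUDNN_MINOR" { print $3; exit }' "$header")
  patch=$(awk '$1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { print $3; exit }' "$header")
  echo "$major.$minor.$patch"
}
```

With the files copied as above, running check_cudnn should print 7.0.5.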

Add the environment variables

export CUDA_HOME=/usr/local/cuda
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:/usr/local/cuda/extras/CUPTI/lib
export LD_LIBRARY_PATH=$DYLD_LIBRARY_PATH
# nvcc lives in $CUDA_HOME/bin; the library directories do not belong on PATH
export PATH=$CUDA_HOME/bin:$PATH
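Exports typed into a terminal only last for that shell session. The sketch below appends the CUDA exports for this step (with CUDA's bin directory on PATH so that nvcc is found) to a profile file, skipping any line that is already there so it can be re-run safely. The default ~/.bash_profile is an assumption; use whichever profile your shell actually reads. The helper names append_once and persist_cuda_env are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: persist the CUDA environment variables.

# append_once FILE LINE: append LINE to FILE unless an identical line exists
append_once() {
  local file=$1 line=$2
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# persist_cuda_env [PROFILE]  (default: ~/.bash_profile, an assumption)
persist_cuda_env() {
  local profile=${1:-$HOME/.bash_profile}
  # Single quotes keep $VARS unexpanded so the profile expands them at login
  append_once "$profile" 'export CUDA_HOME=/usr/local/cuda'
  append_once "$profile" 'export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:/usr/local/cuda/extras/CUPTI/lib'
  append_once "$profile" 'export LD_LIBRARY_PATH=$DYLD_LIBRARY_PATH'
  append_once "$profile" 'export PATH=$CUDA_HOME/bin:$PATH'
}
```

After running persist_cuda_env, open a new terminal (or source the profile) so the variables take effect.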

Check the NVIDIA development environment

$ nvcc -V
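The nvcc -V output can also be checked automatically. This sketch assumes the release appears on a line of the form "Cuda compilation tools, release 9.1, V9.1.85"; the helper name cuda_release is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the CUDA release number from "nvcc -V" (stdin).
cuda_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p' | head -n 1
}

if command -v nvcc >/dev/null 2>&1; then
  r=$(nvcc -V | cuda_release)
  if [ "$r" = "9.1" ]; then
    echo "CUDA $r: OK"
  else
    echo "expected CUDA 9.1, found '$r'" >&2
  fi
else
  echo "nvcc is not on PATH; check the PATH export" >&2
fi
```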

Preparing the build

Clone the TensorFlow source and check out the release 1.8 branch

$ git clone https://github.com/tensorflow/tensorflow -b r1.8
$ cd tensorflow

Patch the code for macOS compatibility

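Remove __align__(sizeof(T)) from the following three files. Run the commands from the repository root (the tensorflow directory entered above); sed -i '' is the in-place syntax of the BSD sed that ships with macOS:

$ sed -i '' -e "s/ __align__(sizeof(T))//g" tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc
$ sed -i '' -e "s/ __align__(sizeof(T))//g" tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc
$ sed -i '' -e "s/ __align__(sizeof(T))//g" tensorflow/core/kernels/split_lib_gpu.cu.cc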
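The same substitution can be applied and verified in one go. This sketch writes each result to a temporary file and moves it back, which sidesteps the difference between GNU and BSD sed -i syntax, then uses grep to confirm no __align__(sizeof(T)) remains. The helper name strip_align is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove " __align__(sizeof(T))" from each file given,
# then fail if any occurrence is left.
strip_align() {
  local f
  for f in "$@"; do
    sed -e 's/ __align__(sizeof(T))//g' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done
  # grep exits 0 when it finds a match, so a match means the patch missed something
  if grep -n '__align__(sizeof(T))' "$@"; then
    echo "__align__(sizeof(T)) still present" >&2
    return 1
  fi
}
```

From the repository root it would be called with the three kernel files: strip_align tensorflow/core/kernels/concat_lib_gpu_impl.cu.cc tensorflow/core/kernels/depthwise_conv_op_gpu.cu.cc tensorflow/core/kernels/split_lib_gpu.cu.cc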

Add the nccl.h header dependency (skip this step when building 1.7)

Download nccl.h and place it in the third_party/nccl folder.

Edit tensorflow/workspace.bzl

tf_http_archive(
    name = "protobuf_archive",
    urls = [
        "https://mirror.bazel.build/github.com/google/protobuf/archive/396336eb961b75f03b25824fe86cf6490fb75e3a.tar.gz",
        "https://github.com/google/protobuf/archive/396336eb961b75f03b25824fe86cf6490fb75e3a.tar.gz",
    ],
    sha256 = "846d907acf472ae233ec0882ef3a2d24edbbe834b80c305e867ac65a1f2c59e3",
    strip_prefix = "protobuf-396336eb961b75f03b25824fe86cf6490fb75e3a",
)

Find the block above and replace it with the following:

tf_http_archive(
    name = "protobuf_archive",
    urls = [
        "https://mirror.bazel.build/github.com/dtrebbien/protobuf/archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz",
        "https://github.com/dtrebbien/protobuf/archive/50f552646ba1de79e07562b41f3999fe036b4fd0.tar.gz",
    ],
    sha256 = "eb16b33431b91fe8cee479575cee8de202f3626aaf00d9bf1783c6e62b4ffbc7",
    strip_prefix = "protobuf-50f552646ba1de79e07562b41f3999fe036b4fd0",
)

Fix the -lgomp link error in third_party/gpus/cuda/BUILD.tpl

linkopts = ["-lgomp"],

Find the line above and comment it out:

# linkopts = ["-lgomp"],
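The same edit can be made non-interactively. This sketch keeps the line's original indentation and leaves an already-commented line alone, so running it twice is harmless. The helper name comment_lgomp is hypothetical, and the default path assumes it is run from the repository root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out linkopts = ["-lgomp"], in BUILD.tpl.
comment_lgomp() {
  local f=${1:-third_party/gpus/cuda/BUILD.tpl}
  # \( \) captures leading whitespace; \[ matches a literal "["
  sed -e 's/^\([[:space:]]*\)linkopts = \["-lgomp"],/\1# linkopts = ["-lgomp"],/' "$f" > "$f.tmp" \
    && mv "$f.tmp" "$f"
}
```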

Building

Configure the build

$ ./configure
You have bazel 0.13 installed.
Please specify the location of python. [Default is /Users/user/.pyenv/versions/tensorflow-gpu/bin/python]:


Found possible Python library paths:
/Users/user/.pyenv/versions/tensorflow-gpu/lib/python3.6/site-packages
Please input the desired Python library path to use. Default is [/Users/user/.pyenv/versions/tensorflow-gpu/lib/python3.6/site-packages]

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Apache Kafka Platform support? [y/N]: n
No Apache Kafka Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.1


Please specify the location where CUDA 9.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:


Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]:


Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:


Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]6.1


Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:


Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
Configuration finished
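When the build has to be repeated, ./configure can be pre-answered through environment variables instead of the prompts. The values below mirror the answers in the session above. The variable names are an assumption based on TensorFlow 1.x configure.py, so check them against the r1.8 configure.py before relying on this, and set TF_CUDA_COMPUTE_CAPABILITIES to your own GPU's value.

```shell
#!/usr/bin/env bash
# Sketch: answers for ./configure via environment variables (names assumed
# from TensorFlow 1.x configure.py; values mirror the interactive session).
export PYTHON_BIN_PATH="$(command -v python)"
export USE_DEFAULT_PYTHON_LIB_PATH=1
export TF_NEED_GCP=0
export TF_NEED_HDFS=0
export TF_NEED_S3=0
export TF_NEED_KAFKA=0
export TF_ENABLE_XLA=0
export TF_NEED_GDR=0
export TF_NEED_VERBS=0
export TF_NEED_OPENCL_SYCL=0
export TF_NEED_CUDA=1
export TF_CUDA_VERSION=9.1
export CUDA_TOOLKIT_PATH=/usr/local/cuda
export TF_CUDNN_VERSION=7
export CUDNN_INSTALL_PATH=/usr/local/cuda
export TF_CUDA_COMPUTE_CAPABILITIES=6.1   # depends on your GPU
export TF_CUDA_CLANG=0
export GCC_HOST_COMPILER_PATH=/usr/bin/gcc
export TF_NEED_MPI=0
export CC_OPT_FLAGS="-march=native"
export TF_SET_ANDROID_WORKSPACE=0
# then run ./configure from the repository root
```

If a variable name does not match what configure.py reads, configure simply falls back to asking that question interactively.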

Build

$ bazel clean --expunge
$ bazel build --config=cuda --config=opt --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --action_env PATH --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package

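For older CPUs, the following command can be used instead. It passes explicit instruction-set flags (SSE4.1, SSE4.2, AVX, AVX2, FMA), so check that your CPU supports each of them and remove any it lacks; otherwise the resulting binary may fail with an illegal-instruction error.
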
$ bazel build --config=cuda --config=opt --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" --action_env PATH --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH --copt=-march=native --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mavx2 --copt=-mfma //tensorflow/tools/pip_package:build_pip_package
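To pass only the instruction-set flags the CPU actually has, the feature list that macOS reports through sysctl can be turned into --copt flags. This is a sketch: it assumes the feature spellings macOS uses (SSE4.1, SSE4.2 and AVX1.0 in machdep.cpu.features, AVX2 in machdep.cpu.leaf7_features, FMA in machdep.cpu.features), and the helper name copts_for is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a CPU feature list to bazel --copt flags,
# emitting a flag only for instruction sets present in the list.
copts_for() {
  local f flags=""
  # $1 is deliberately unquoted so it splits into one word per feature
  for f in $1; do
    case "$f" in
      SSE4.1) flags="$flags --copt=-msse4.1" ;;
      SSE4.2) flags="$flags --copt=-msse4.2" ;;
      AVX1.0) flags="$flags --copt=-mavx" ;;
      AVX2)   flags="$flags --copt=-mavx2" ;;
      FMA)    flags="$flags --copt=-mfma" ;;
    esac
  done
  # drop the leading space left by the first append
  echo "${flags# }"
}

if [ "$(uname)" = Darwin ]; then
  features="$(sysctl -n machdep.cpu.features) $(sysctl -n machdep.cpu.leaf7_features 2>/dev/null)"
  echo "suggested flags: $(copts_for "$features")"
fi
```

The printed flags can then replace the fixed list, e.g. bazel build --config=cuda --config=opt ... $(copts_for "$features") //tensorflow/tools/pip_package:build_pip_package.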

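Create the wheel file and install it into the same Python environment that ./configure used (the package version in the file name is 1.8.0; the platform tag depends on your Python build):

$ bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
$ cd ~
$ pip install /tmp/tensorflow_pkg/tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl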
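Because the exact wheel name depends on the Python build, it can be easier to locate the file than to type it. The sketch below finds the wheel in the output directory and adds a smoke test that imports TensorFlow from outside the source tree, using tf.__version__ and tf.test.is_gpu_available() from the TensorFlow 1.x API. The helper names find_wheel and verify_tf are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for installing and checking the built wheel.

# find_wheel DIR: print the first tensorflow-*.whl in DIR
find_wheel() {
  local w
  for w in "$1"/tensorflow-*.whl; do
    # an unmatched glob stays literal, so check the file really exists
    if [ -e "$w" ]; then
      echo "$w"
      return 0
    fi
  done
  echo "no tensorflow-*.whl in $1" >&2
  return 1
}

# verify_tf [PYTHON]: import TensorFlow outside the source checkout
verify_tf() {
  (cd ~ && "${1:-python}" -c 'import tensorflow as tf; print(tf.__version__); print("GPU available:", tf.test.is_gpu_available())')
}
```

Usage: pip install "$(find_wheel /tmp/tensorflow_pkg)" && verify_tf. A working GPU build should report version 1.8.0 and GPU available: True.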