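Without further ado, here is the combination that finally worked: OS => CentOS 7, CUDA => 10.0, cuDNN => 7.5, NCCL => built from source, TensorFlow => latest version built from source.
1. CUDA: download the *.run package and run sh ./*.run, then follow the prompts. Choosing the default path /usr/local/cuda makes the later steps easier.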
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
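A quick way to confirm that both variables really contain the CUDA directories is a small check like the one below. This is only a sketch: `path_contains` and `check_cuda_env` are names made up for this post, and the default `/usr/local/cuda` assumes the install path chosen in step 1.

```python
import os
import posixpath


def path_contains(var_value, directory):
    """Return True if `directory` is an entry of a colon-separated path list."""
    wanted = posixpath.normpath(directory)
    entries = [e for e in var_value.split(":") if e]
    return any(posixpath.normpath(e) == wanted for e in entries)


def check_cuda_env(environ, cuda_home="/usr/local/cuda"):
    """Report whether PATH and LD_LIBRARY_PATH include the CUDA bin/lib64 dirs.

    `cuda_home` is an assumption: the default install location from step 1.
    """
    return {
        "PATH": path_contains(
            environ.get("PATH", ""), posixpath.join(cuda_home, "bin")
        ),
        "LD_LIBRARY_PATH": path_contains(
            environ.get("LD_LIBRARY_PATH", ""), posixpath.join(cuda_home, "lib64")
        ),
    }


if __name__ == "__main__":
    for name, ok in check_cuda_env(os.environ).items():
        print(name, "OK" if ok else "missing CUDA entry")
```

Paths are normalized before comparison, so a trailing slash does not matter, but a mistyped directory still shows up as missing.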
2. cuDNN: after extraction the folder is named cuda. Copy the header and library files into the corresponding CUDA directories:
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
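The link commands use relative names, so run them from /usr/local/cuda/lib64, and use the version number of the libcudnn file you actually copied:
sudo ln -sf libcudnn.so.7.0.5 libcudnn.so.7
sudo ln -sf libcudnn.so.7 libcudnn.so
sudo ldconfig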
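The links above form a chain: libcudnn.so points to libcudnn.so.7, which points to the real versioned file. A minimal sketch for checking that such a chain ends in a real file is shown below. The helper names are hypothetical, and `demo()` rebuilds the layout in a scratch directory so it can be tried without touching /usr/local/cuda.

```python
import os
import tempfile


def resolve_chain(path, max_hops=10):
    """Follow symlinks one hop at a time and return every path visited."""
    chain = [path]
    while os.path.islink(path) and len(chain) <= max_hops:
        target = os.readlink(path)
        # Relative targets are resolved against the directory of the link.
        path = os.path.normpath(os.path.join(os.path.dirname(path), target))
        chain.append(path)
    return chain


def chain_is_valid(path):
    """True if the chain starting at `path` ends in a regular file."""
    last = resolve_chain(path)[-1]
    return os.path.isfile(last) and not os.path.islink(last)


def demo():
    """Recreate the three-name layout in a temp dir and resolve it."""
    d = tempfile.mkdtemp()
    open(os.path.join(d, "libcudnn.so.7.0.5"), "w").close()
    os.symlink("libcudnn.so.7.0.5", os.path.join(d, "libcudnn.so.7"))
    os.symlink("libcudnn.so.7", os.path.join(d, "libcudnn.so"))
    top = os.path.join(d, "libcudnn.so")
    return [os.path.basename(p) for p in resolve_chain(top)], chain_is_valid(top)
```

On a real machine you would call `chain_is_valid("/usr/local/cuda/lib64/libcudnn.so")`. A dangling link, for example one that names a version that was never copied, makes it return False.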
nvcc --version
rpm -ivh nccl*.rpm
./configure
gmake
make install
export PATH
Please specify the location of python. [Default is /usr/bin/python]:
Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: n
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n
Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
Do you wish to build TensorFlow with GDR support? [y/N]: N
Do you wish to build TensorFlow with VERBS support? [y/N]: N
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N
Do you wish to build TensorFlow with CUDA support? [y/N]: Y
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 10.0]: 10.1
Please specify the location where CUDA 10.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]:
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-10.1]:
Do you wish to build TensorFlow with TensorRT support? [y/N]: N
Please specify the NCCL version you want to use. [Leave empty to default to NCCL 2]: 2.4.2
Please specify the location where NCCL 2 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-10.0]:
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1]
Do you want to use clang as CUDA compiler? [y/N]: N
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: /usr/bin/gcc
Do you wish to build TensorFlow with MPI support? [y/N]: N
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
Cuda Configuration Error: No library found under: /usr/local/cuda-10.1/lib64/libcublas.so.10.1, /usr/local/cuda-10.1/lib64/stubs/libcublas.so.10.1, /usr/local/cuda-10.1/lib/powerpc64le-linux-gnu/libcublas.so.10.1, /usr/local/cuda-10.1/lib/x86_64-linux-gnu/libcublas.so.10.1, /usr/local/cuda-10.1/lib/x64/libcublas.so.10.1, /usr/local/cuda-10.1/lib/libcublas.so.10.1, /usr/local/cuda-10.1/libcublas.so.10.1
sudo ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcublas.so. /usr/lib64/libcublas.so.10.0
Cuda Configuration Error: None of the libraries match their SONAME: /home/bernard/opt/cuda_test/cuda/lib64/libcublas.so.10.1
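The message above suggests that the configure step compares the SONAME recorded inside the library with the file name it was asked to find, so a symlink that merely carries the expected name is not enough. The sketch below reproduces that kind of comparison by hand. It assumes `objdump` from binutils is installed, and the function names are invented for illustration; this is not TensorFlow's own code.

```python
import os
import subprocess


def parse_soname(objdump_text):
    """Extract the SONAME value from `objdump -p` output, or None if absent."""
    for line in objdump_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "SONAME":
            return fields[1]
    return None


def read_soname(path):
    """Run `objdump -p` on a shared library and return its SONAME."""
    result = subprocess.run(
        ["objdump", "-p", path], capture_output=True, text=True, check=True
    )
    return parse_soname(result.stdout)


def name_matches_soname(path, soname):
    """True if the file name of `path` equals the library's SONAME."""
    return soname is not None and os.path.basename(path) == soname
```

For example, `name_matches_soname(p, read_soname(p))` on the path from the error message shows whether the file name and the embedded SONAME agree. If they do not, renaming or linking the file will not help; the CUDA version given to ./configure has to match the toolkit that is actually installed.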
fatal error: nccl.h: No such file or directory
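Searching showed that libnccl2, libnccl-dev and libnccl-static need to be installed, but the tutorials online are all for Ubuntu and install them with apt-get. CentOS only has yum; I tried it and got this error: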
No package "libnccl" available
cd nccl
make -j src.build
make src.build
yum install build-essential devscripts debhelper
make pkg.debian.build
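Since the earlier failure was a missing nccl.h, it is worth checking where the header ended up before running ./configure again. Below is a small sketch of such a lookup. `find_header` is a made-up helper, and the `nccl/build/include` location used in `demo()` is an assumption about where the source build puts its headers, so adjust the directory list to your machine.

```python
import os
import tempfile


def find_header(name, include_dirs):
    """Return the directories from `include_dirs` that contain `name`."""
    return [d for d in include_dirs if os.path.isfile(os.path.join(d, name))]


def demo():
    """Simulate one directory without the header and one with it."""
    root = tempfile.mkdtemp()
    # Assumed layout: the source build places headers under nccl/build/include.
    built = os.path.join(root, "nccl", "build", "include")
    empty = os.path.join(root, "usr", "local", "cuda", "include")
    os.makedirs(built)
    os.makedirs(empty)
    open(os.path.join(built, "nccl.h"), "w").close()
    hits = find_header("nccl.h", [empty, built])
    return [os.path.relpath(h, root) for h in hits]
```

On a real system the call would look like `find_header("nccl.h", ["/usr/local/cuda/include", "/usr/include", "nccl/build/include"])`. An empty result means the compiler will still not find the header.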
Please specify the location of python. [Default is /usr/bin/python]:
Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: n
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n
Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
Do you wish to build TensorFlow with GDR support? [y/N]: N
Do you wish to build TensorFlow with VERBS support? [y/N]: N
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: N
Do you wish to build TensorFlow with CUDA support? [y/N]: Y
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 10.0]:
Please specify the location where CUDA 10.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]:
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-10.1]:
Do you wish to build TensorFlow with TensorRT support? [y/N]: N
Please specify the NCCL version you want to use. [Leave empty to default to NCCL 2]:
Please specify the location where NCCL 2 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-10.0]:
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1]
Do you want to use clang as CUDA compiler? [y/N]: N
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: /usr/bin/gcc
Do you wish to build TensorFlow with MPI support? [y/N]: N
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: N
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/*.whl
import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
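At first I thought the tensorboard dependency had not been built, but reading the source showed that it does not need to be downloaded separately. In the end I checked the timestamps of the tensorboard files and found that an earlier installation had not been removed cleanly. After removing it with pip uninstall and reinstalling, everything worked.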
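Besides looking at file timestamps, leftovers like this can be spotted by listing what the Python environment thinks is installed. The sketch below uses the standard library's `importlib.metadata` (Python 3.8 or newer). `installed_versions` and `duplicates` are hypothetical helper names, not part of pip or TensorFlow.

```python
from importlib import metadata


def installed_versions(prefix):
    """Return sorted (name, version) pairs for distributions starting with `prefix`."""
    found = set()
    for dist in metadata.distributions():
        try:
            name = dist.metadata["Name"]
            version = dist.version or "unknown"
        except Exception:
            # Skip distributions whose metadata cannot be read at all.
            continue
        if name and name.lower().startswith(prefix.lower()):
            found.add((name, version))
    return sorted(found)


def duplicates(pairs):
    """Return names that appear with more than one version."""
    versions = {}
    for name, version in pairs:
        versions.setdefault(name.lower(), set()).add(version)
    return sorted(n for n, v in versions.items() if len(v) > 1)


if __name__ == "__main__":
    pairs = installed_versions("tensor")
    for name, version in pairs:
        print(name, version)
    print("installed more than once:", duplicates(pairs))
```

A package that shows up with two versions, or a tensorboard version that does not belong to the freshly built TensorFlow, is a hint that an old copy is still lying around and should be removed with pip uninstall before reinstalling.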
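In fact, once CUDA and cuDNN are installed you can simply run pip install tensorflow-gpu and skip the build entirely (which also means cmake and bazel are not needed). I originally assumed the latest version was not available as a package, so I built it myself; later I found that the prebuilt package is itself built against CUDA 10.0. Still, a build tailored to your own system is always nice, haha!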