Install CUDA 8.0, referring here.
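In case it helps, here is a rough sketch of a runfile-based install. The installer file name is an assumption (use the exact package you downloaded from NVIDIA), and the paths assume the default install location:

```
# Run the CUDA 8.0 installer downloaded from NVIDIA (file name depends on the exact release)
sudo sh cuda_8.0.44_linux.run
# When prompted about the bundled driver, note that a newer driver is installed in a later step anyway.
# Make CUDA visible to your shell and linker:
export PATH=/usr/local/cuda-8.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
nvcc --version   # sanity check
```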
As the default gcc in Ubuntu 16.04 is very new, you may see a compile error similar to `#error -- unsupported GNU version! gcc versions later than 5.3 are not supported!`
If so, try to comment out the `#error` line in /usr/local/cuda/include/host_config.h:
```
#if __GNUC__ > 5 || (__GNUC__ == 5 && __GNUC_MINOR__ > 3)
//#error -- unsupported GNU version! gcc versions later than 5.3 are not supported!
#endif /* __GNUC__ > 5 || (__GNUC__ == 5 && __GNUC_MINOR__ > 1) */
```
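If you prefer to make that edit from the command line, a sed one-liner like the following should work (a sketch, assuming the `#error` directive starts at the beginning of the line; `-i.bak` keeps a backup of the original header):

```
# Comment out the unsupported-gcc #error line in host_config.h, keeping a .bak copy
sudo sed -i.bak 's|^#error -- unsupported GNU version|//&|' /usr/local/cuda/include/host_config.h
```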
Update the NVIDIA driver
You may need to remove the old driver installed with CUDA 8.0 and install a newer one if you see an error similar to `modprobe: ERROR: could not insert 'nvidia_361_uvm': Invalid argument`.
Please refer here to remove the old driver, then install a new one:
```
sudo apt-get purge nvidia-*
dkms status
sudo dkms remove bbswitch/0.8 -k 4.4.0-31-generic  # do this based on your `dkms status`
sudo ./NVIDIA-Linux-x86_64-367.35.run  # download the compatible driver (GTX 1080 in my case) from the nvidia website and install it
sudo reboot  # reboot if the driver was not updated
./cuda8-0-samples/bin/x86_64/linux/release/deviceQuery  # compile the sample code and check if it works
```
Install cuDNN v5
cuDNN v4 is NOT supported on the GTX 1080. With v4 your code may compile and run, but training will not converge.
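A typical tarball install of cuDNN v5 looks roughly like the following. The archive name below is an assumption; use the exact file you downloaded from NVIDIA for CUDA 8.0:

```
# Unpack the cuDNN v5 archive (it extracts into a cuda/ directory)
tar -xzvf cudnn-8.0-linux-x64-v5.0-ga.tgz
# Copy the header and libraries into your CUDA install
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
```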
Install prerequisites of Caffe
Because the gcc shipped with Ubuntu 16.04 is very new, if any prerequisite installed from apt-get does not work, uninstall it, then compile and install it from source with the default gcc (5.4). The prerequisites that may have problems include protobuf and OpenCV. E.g. if you have a protobuf error similar to
```
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::io::CodedOutputStream::WriteVarint64ToArray(unsigned long long, unsigned char*)'
```
Try to uninstall the protobuf installed by apt-get; it might have been compiled with an older gcc, so its shared libraries may not be compatible with your default gcc:
sudo apt-get purge libprotobuf-dev protobuf-compiler
then compile protobuf-2.5.0 from source and install it, configuring it to use the default gcc (5.4 in my case):
```
./configure --prefix=/your/path/ CC=/usr/bin/gcc
make
make check
make install
```
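Because protobuf goes into a non-standard prefix here, the Caffe build needs to be able to find it. A sketch of the environment setup, assuming the same /your/path/ prefix as above:

```
# Point the toolchain at the custom protobuf install (adjust /your/path/ to the --prefix you used)
export PATH=/your/path/bin:$PATH
export LD_LIBRARY_PATH=/your/path/lib:$LD_LIBRARY_PATH
export PKG_CONFIG_PATH=/your/path/lib/pkgconfig:$PKG_CONFIG_PATH
protoc --version   # should now report libprotoc 2.5.0
```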
Compile and test Caffe here.
Please also refer here.
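For reference, the usual Makefile-based build-and-test sequence is sketched below; it assumes Makefile.config has already been edited for CUDA 8.0, cuDNN v5 and your Python/BLAS setup:

```
cd caffe                                     # your Caffe checkout
cp Makefile.config.example Makefile.config   # then edit it for your machine
make all -j"$(nproc)"
make test -j"$(nproc)"
make runtest      # run the unit tests (exercises the GPU)
make pycaffe      # optional: build the Python bindings
```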
If you use Anaconda and get an error like `awk: symbol lookup error: $HOME/anaconda2/lib/libreadline.so.6: undefined symbol: PC`, try to remove readline from the Anaconda lib so that the default system one is used:
conda remove --force readline
Compile and test TensorFlow here.
If there is an error like `ERROR: /mnt/tmp/tensorflow/tensorflow/core/BUILD:87:1: //tensorflow/core:protos_all_py: no such attribute 'imports' in 'py_library'`, a newer version of bazel (e.g. 0.2.3) can solve this problem.
```
git clone https://github.com/bazelbuild/bazel.git
cd bazel/
git tag -l
git checkout tags/0.2.3   # or 0.3.0 etc.
./compile.sh
```
“This will create a bazel binary in bazel-bin/src/bazel. This binary is self-contained, so it can be copied to a directory on the PATH (e.g., /usr/local/bin) or used in-place. ”
Use `which bazel` to make sure your bazel is updated.
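For example, copy the freshly compiled binary onto your PATH and confirm that it is the one being picked up (the destination directory below is just one option):

```
sudo cp bazel-bin/src/bazel /usr/local/bin/bazel
which bazel      # should print /usr/local/bin/bazel
bazel version    # should report the tag you checked out (e.g. 0.2.3)
```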
Then build and install:
```
./configure
bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
# test
bazel-bin/tensorflow/cc/tutorials_example_trainer --use_gpu
# build pip package with gpu support
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/tmp/tensorflow_pkg
# The name of the .whl file will depend on your platform.
sudo pip install ~/tmp/tensorflow_pkg/tensorflow-0.9.0-py2-none-any.whl
mkdir _python_build
cd _python_build
ln -s ../bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/* .
ln -s ../tensorflow/tools/pip_package/* .
python setup.py develop
cd tensorflow/models/image/mnist
python convolutional.py
```
Known issues
- TensorFlow compiling @ RHEL
```
ERROR: /home/wwen/github/tensorflow/tensorflow/core/kernels/BUILD:1529:1: undeclared inclusion(s) in rule '//tensorflow/core/kernels:depth_space_ops_gpu':
this rule is missing dependency declarations for the following files included by 'tensorflow/core/kernels/depthtospace_op_gpu.cu.cc':
  '/fdata/opt/cuda-7.5/include/cuda_runtime.h'
  '/fdata/opt/cuda-7.5/include/host_config.h'
  '/fdata/opt/cuda-7.5/include/builtin_types.h'
  '/fdata/opt/cuda-7.5/include/device_types.h'
  '/fdata/opt/cuda-7.5/include/host_defines.h'
  '/fdata/opt/cuda-7.5/include/driver_types.h'
  '/fdata/opt/cuda-7.5/include/surface_types.h'
  '/fdata/opt/cuda-7.5/include/texture_types.h'
  '/fdata/opt/cuda-7.5/include/vector_types.h'
```
Solution: add the CUDA include path to third_party/gpus/crosstool/CROSSTOOL by adding a line similar to `cxx_builtin_include_directory: "/usr/local/cuda-7.5/include"`.
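As a sketch, the added entry goes inside the toolchain block of the CROSSTOOL file, next to the existing cxx_builtin_include_directory lines (the CUDA path below is the one from the error above; use your own install location):

```
# See where the existing entries are, then add one for your CUDA headers:
grep -n cxx_builtin_include_directory third_party/gpus/crosstool/CROSSTOOL
# Inside the toolchain { ... } block, add a line such as:
#   cxx_builtin_include_directory: "/fdata/opt/cuda-7.5/include"
```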
- bazel compiling @ RHEL
If no JAVA_HOME is found when installing bazel from source, install the Java SDK and make sure the default java is the one you want:
```
sudo yum install java-1.8.0-openjdk-devel
/usr/sbin/alternatives --config java
```
Add the JAVA_HOME variable to ~/.bashrc and put it on your PATH:
```
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export PATH=$PATH:$JAVA_HOME/bin
```
- TensorFlow WORK_DIR issue
If you run `bazel-bin/inception/download_and_preprocess_imagenet "/fdata/imagenet-data"` and get the error `bazel-bin/inception/download_and_preprocess_imagenet: line 66: bazel-bin/inception/download_and_preprocess_imagenet.runfiles/inception/data/download_imagenet.sh: No such file or directory`, change WORK_DIR in ./inception/data/download_and_preprocess_imagenet.sh to `WORK_DIR="$0.runfiles/__main__/inception"`.