Monthly Archives: June 2018

Problems and solutions when building Tensorflow-1.8 with TensorRT 4.0

Problem:
When compiling Tensorflow-1.8 with CUDA-9.2, it reports:

bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Unn_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `cublasGemmEx@libcublas.so.9.0'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Unn_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `cublasZhpmv_v2@libcublas.so.9.0'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Unn_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `cufftExecD2Z@libcufft.so.9.0'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Unn_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `cublasSrotg_v2@libcublas.so.9.0'
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Unn_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `cufftExecR2C@libcufft.so.9.0'
...

Solution:
Add ‘/usr/local/cuda-9.2/lib64’ to ‘/etc/ld.so.conf’ and run ‘sudo ldconfig’ to make it work.
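To confirm that ldconfig actually picked up the new directory, you can inspect the dynamic linker cache printed by `ldconfig -p`. The sketch below only illustrates that check. It is not part of the original fix, and the helper names `read_ld_cache` and `find_in_ld_cache` are hypothetical. It assumes the usual glibc line format `libfoo.so.N (libc6,x86-64) => /path/to/libfoo.so.N`.

```python
import subprocess


def read_ld_cache(ldconfig="ldconfig"):
    """Return the text printed by `ldconfig -p` (use "/sbin/ldconfig" if it is not on PATH)."""
    return subprocess.check_output([ldconfig, "-p"], universal_newlines=True)


def find_in_ld_cache(ldconfig_output, prefix):
    """Return (soname, path) pairs from `ldconfig -p` output whose soname starts with `prefix`."""
    matches = []
    for line in ldconfig_output.splitlines():
        # Cache entries look like "libfoo.so.1 (libc6,x86-64) => /path/libfoo.so.1";
        # the header line ("N libs found in cache ...") has no "=>" and is skipped.
        if "=>" not in line:
            continue
        left, path = line.split("=>", 1)
        soname = left.split()[0]
        if soname.startswith(prefix):
            matches.append((soname, path.strip()))
    return matches
```

After the fix, `find_in_ld_cache(read_ld_cache(), "libcublas")` should list entries under /usr/local/cuda-9.2/lib64. An empty list suggests the directory is still missing from the cache.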

Problem:
When compiling Tensorflow-1.8, it reports:

./tensorflow/python/client/tf_session_helper.h:19:20: fatal error: Python.h: No such file or directory
...

Solution:
In the ‘.tf_configure.bazelrc’ file, use the real Python location instead of a symbolic link:

#don't use "/usr/bin/python"
build --action_env PYTHON_BIN_PATH="/usr/bin/python2.7"
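One way to find the value for PYTHON_BIN_PATH is to resolve the symbolic link from Python itself. The same script can check that the matching Python.h exists. This is a sketch, not something Tensorflow's configure script does, and `real_python_path` and `has_python_headers` are hypothetical names. Run it with the same interpreter you intend to build against.

```python
import os
import sys
import sysconfig


def real_python_path(path=sys.executable):
    """Follow every symbolic link, e.g. /usr/bin/python -> python2 -> python2.7."""
    return os.path.realpath(path)


def has_python_headers():
    """True if Python.h exists in this interpreter's include directory (usually from a python-devel/python-dev package)."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


if __name__ == "__main__":
    print('build --action_env PYTHON_BIN_PATH="%s"' % real_python_path())
    print("Python.h found: %s" % has_python_headers())
```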

Problem:
When running TensorRT, it reports:

ImportError: /usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /home/web_server/dlpy72/dlpy/lib/python2.7/site-packages/tensorrt/infer/_nv_infer_bindings.so)

Solution:
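Run TensorRT with LD_LIBRARY_PATH pointing to the libstdc++ of a newer GCC, so that it is found before the older copy in /usr/lib64 (GLIBCXX_3.4.21 was introduced with GCC 5):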

LD_LIBRARY_PATH=/usr/local/gcc-5.3/lib64:$LD_LIBRARY_PATH python run_tensorrt.py
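To check which libstdc++ actually provides GLIBCXX_3.4.21, you can scan the library file for its version tags. This is roughly what `strings libstdc++.so.6 | grep GLIBCXX` does. The sketch below is an illustration only, and its helper names are hypothetical.

```python
import re


def glibcxx_versions(lib_path):
    """Return the sorted GLIBCXX_x.y[.z] version tags found in a libstdc++ binary, as int tuples."""
    with open(lib_path, "rb") as f:
        data = f.read()
    # Only tags followed by digits count; names like GLIBCXX_FORCE_NEW are ignored.
    tags = set(re.findall(br"GLIBCXX_(\d+(?:\.\d+)+)", data))
    return sorted(tuple(int(part) for part in tag.decode("ascii").split(".")) for tag in tags)


def provides(lib_path, required="3.4.21"):
    """True if the library carries the given GLIBCXX version tag."""
    need = tuple(int(part) for part in required.split("."))
    return need in glibcxx_versions(lib_path)
```

On the machine above, `provides("/usr/lib64/libstdc++.so.6")` would be expected to return False. `provides("/usr/local/gcc-5.3/lib64/libstdc++.so.6")` should return True.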

Testing performance of Tensorflow’s fixed-point-quantization on x86_64 cpu

Google has published their quantization method in this paper. It uses int8 for the forward pass but float32 for back-propagation, since back-propagation needs more precision to accumulate gradients. A question came to me right after reading the paper: why were all the performance tests run on mobile phones (ARM architecture)? Running a model quantized with Google's method needs not only int8 additions and multiplications, but also bit-shift operations. My suspicion was that the AVX instruction set of Intel x86_64 can accelerate MAC (multiply-accumulate) operations, but cannot speed up the bit-shifts.
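To make that arithmetic concrete, below is a NumPy sketch of the integer-only matrix multiplication the paper describes. Each value is stored as an 8-bit integer with a scale and zero-point, r = S * (q - Z), and products are accumulated in int32. The result is then rescaled by the real multiplier M = S1 * S2 / S3, represented as a Q31 fixed-point integer followed by a right shift. This is my own simplified illustration, not Tensorflow's kernel. It uses uint8 as in the paper, and a round-half-up shift instead of the exact rounding in Google's gemmlowp library. Also note that tf.contrib.quantize itself only inserts fake-quantization ops into a float graph. Code like this is what a dedicated integer runtime would execute.

```python
import numpy as np


def quantize(r, scale, zero_point):
    """Map real values to uint8 using r = scale * (q - zero_point)."""
    q = np.round(r / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)


def quantize_multiplier(m):
    """Write 0 < m < 1 as m ~= m0_int * 2**-(31 + n), with m0 = m * 2**n in [0.5, 1) stored in Q31."""
    assert 0.0 < m < 1.0
    n = 0
    while m * 2 ** n < 0.5:
        n += 1
    m0_int = int(round(m * 2 ** n * (1 << 31)))
    if m0_int == 1 << 31:  # rounding pushed m0 up to exactly 1.0
        m0_int //= 2
        n -= 1
    return m0_int, n


def quantized_matmul(q1, z1, q2, z2, m0_int, n, z3):
    """uint8 x uint8 -> uint8 matmul using only integer multiply, add and shift."""
    # Multiply-accumulate in int32 after removing the zero-points.
    acc = (q1.astype(np.int32) - z1) @ (q2.astype(np.int32) - z2)
    # Requantize: integer multiply by m0_int, then a rounding right shift by 31 + n bits.
    shift = 31 + n
    prod = acc.astype(np.int64) * m0_int
    out = ((prod + (1 << (shift - 1))) >> shift) + z3
    return np.clip(out, 0, 255).astype(np.uint8)


# Small demo with made-up ranges: inputs in [-1, 1], outputs in [-8, 8].
rng = np.random.default_rng(0)
a = rng.uniform(-1, 1, size=(4, 8))
b = rng.uniform(-1, 1, size=(8, 3))
s1 = s2 = 2.0 / 255
z1 = z2 = 128
s3, z3 = 16.0 / 255, 128
m0_int, n = quantize_multiplier(s1 * s2 / s3)
q3 = quantized_matmul(quantize(a, s1, z1), z1, quantize(b, s2, z2), z2, m0_int, n, z3)
approx = s3 * (q3.astype(np.float64) - z3)
print("max abs error vs float matmul:", np.abs(approx - a @ b).max())
```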
To verify my suspicion, I wrote a ResNet-50 model (float32) to classify the CIFAR-100 dataset. After training for a few epochs, I measured inference speed with my ‘eval.py’ (see the Appendix). The result was:

Time: 5.58819s

Then I followed these steps to add tf.contrib.quantize.create_training_graph() and tf.contrib.quantize.create_eval_graph() to my code. This time, the inference time was:

Time: 6.23221s

A little disappointing: the quantized (int8) version of the model did not speed up inference on the x86 CPU. In fact, it was about 11.5% slower. Maybe we need to find another, more powerful quantization algorithm.
Appendix:

# eval.py
from input_data import Cifar100Data   # the author's own CIFAR-100 loader
import tensorflow as tf
import numpy as np
import resnet_v2                      # TF-Slim ResNet v2 definition
import argparse
import time
import sys

EVAL_SAMPLES = 10000
BATCH_SIZE = 10000                    # the whole CIFAR-100 test set in one batch
MODEL_PATH = './models/'
MODEL_NAME = 'cifar_resnet_50'


def cnn_part(images):
    print(images.shape)
    # resnet_v2_50 returns (logits, end_points); 100 = number of CIFAR-100 classes
    ivg, _ = resnet_v2.resnet_v2_50(images, 100)
    return ivg


def main(_):
    with tf.device('/cpu:0'):
        images = tf.placeholder(tf.float32, [BATCH_SIZE, 32, 32, 3])
        labels = tf.placeholder(tf.int64, [BATCH_SIZE])
    with tf.contrib.slim.arg_scope([tf.contrib.slim.conv2d],
                                   weights_initializer = tf.truncated_normal_initializer(mean = 0, stddev = 0.1)):
        image_vector = cnn_part(images)
    loss = tf.losses.sparse_softmax_cross_entropy(labels = labels, logits = image_vector)
    loss = tf.reduce_mean(loss)
    # The optimizer and train_op are never run during evaluation (kept from the training script)
    opt = tf.train.AdamOptimizer(1e-3)
    train_op = tf.contrib.slim.learning.create_train_op(loss, opt)
    correct_prediction = tf.equal(tf.argmax(image_vector, 1), labels)
    correct_prediction = tf.cast(correct_prediction, tf.float32)
    accuracy = tf.reduce_mean(correct_prediction)
    data = Cifar100Data('/disk3/cifar/cifar-100-python/test')
    saver = tf.train.Saver()
    with tf.Session() as sess:
        # A serialized GraphDef is binary protobuf, so open it in 'rb' mode
        with tf.gfile.FastGFile('./models/cifar_resnet_50_quant.pb', 'rb') as fl:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(fl.read())
        # Note: `accuracy` is computed from the graph built above, not from these imported nodes
        tf.import_graph_def(graph_def, name = '')
        saver.restore(sess, MODEL_PATH + MODEL_NAME + '-' + str(FLAGS.epoch))
        batch = data.next_batch(BATCH_SIZE)
        # Time several runs; the first one may include one-time setup cost
        for i in range(3):
            begin = time.time()
            res = sess.run(accuracy, feed_dict = {images: batch[0], labels: batch[1]})
            print("Time: %gs" % (time.time() - begin))
            print(res)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--epoch', type=str,
                        default='8',
                        help='Epoch of checkpoint for evaluation')
    FLAGS, unparsed = parser.parse_known_args()
    tf.app.run(main = main, argv = [sys.argv[0]] + unparsed)
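The loop in eval.py prints the time of each of three runs. Because the first sess.run can include one-time costs, a single number may not be representative. A minimal helper that discards warm-up calls and reports the median is sketched below. `benchmark` is a hypothetical name, and the helper assumes Python 3 because it uses time.perf_counter and the statistics module.

```python
import statistics
import time


def benchmark(fn, warmup=1, repeat=5):
    """Call fn() `warmup` times untimed, then `repeat` timed times; return (median seconds, all timings)."""
    for _ in range(warmup):
        fn()  # warm-up calls absorb one-time setup such as memory allocation
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings), timings
```

With the script above, it could be called as `benchmark(lambda: sess.run(accuracy, feed_dict={images: batch[0], labels: batch[1]}))`.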

Some tips about LaTeX

1. After running ‘bibtex paper’, it reported:

Too many commas in name 1 of "J.Chen, R.Monga, S.Bengio, R.Jozefowicz" for entry Revisit_SGD

This is because BibTeX separates multiple authors with ‘and’, not with commas. After changing the field:

# Change 'author = "J.Chen, R.Monga, S.Bengio, R.Jozefowicz"' to
author = "J.Chen and R.Monga and S.Bengio and R.Jozefowicz"

The errors disappeared.
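2. How do I increase the space between rows in a table?
Answer: enlarge \arraystretch, which scales the row height. \tabcolsep additionally widens the space between columns. Wrapping both in \begingroup ... \endgroup keeps the change local to this table: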

\begingroup
\setlength{\tabcolsep}{10pt} % Default value: 6pt
\renewcommand{\arraystretch}{1.5} % Default value: 1
\begin{tabular}{ c c c }
First Row & -6 & -5 \\
Second Row & 4 & 10\\
Third Row & 20 & 30\\
Fourth Row & 100 & -30\\
\end{tabular}
\endgroup

3. Problem: Can’t upload a .bib file to arXiv.org
Answer: arXiv does not run BibTeX, so upload the generated .bbl file instead. Run ‘pdflatex paper’ to generate paper.aux from paper.tex, then run ‘bibtex paper’ to turn paper.bib into paper.bbl. The .bbl file can then be uploaded to arXiv together with paper.tex.
4. Problem: When selecting ‘Tools’ -> ‘Check Spelling…’ in TeXstudio, it reports “No dictionary Available”.
Answer: Download the English dictionary from https://extensions.openoffice.org/en/download/1471, change its suffix from ‘oxt’ to ‘zip’ and unzip it. In TeXstudio’s ‘Preferences’, set the dictionary path to the unzipped directory. (ref)
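An .oxt extension is an ordinary ZIP archive, so the rename step can be skipped by extracting the file directly. A minimal Python sketch follows. The helper name `extract_dictionary` and the destination directory are hypothetical. It returns the Hunspell .dic files it finds, which shows which directory to point TeXstudio at.

```python
import os
import zipfile


def extract_dictionary(oxt_path, dest_dir):
    """Unpack an OpenOffice .oxt extension and return the paths of the .dic files it contains."""
    # An .oxt file is a ZIP archive, so zipfile can open it without renaming it to .zip first.
    with zipfile.ZipFile(oxt_path) as oxt:
        oxt.extractall(dest_dir)
    dic_files = []
    for root, _dirs, files in os.walk(dest_dir):
        for name in files:
            if name.endswith(".dic"):
                dic_files.append(os.path.join(root, name))
    return sorted(dic_files)
```

The directory that holds the returned .dic file (next to its .aff file) should be the one to enter as TeXstudio's dictionary path.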
After solving all these problems, I finally submitted my paper: https://arxiv.org/abs/1806.03925

Use pandas and matplotlib to draw line chart

I have two CSV files. Their content looks like:

3,7578.8374
4,4911.78
5,4922.014
6,3158.1414
7,2656.271
8,2520.162
9,1659.447
10,2295.329
...

The simplest way to load and draw them is by using pandas and matplotlib.

import matplotlib.pyplot as plt
import pandas as pd
one = pd.read_csv('./my1.csv', names = ['step', 'loss'])
two = pd.read_csv('./my2.csv', names = ['step', 'loss'])
plt.plot(one['step'], one['loss'], label = 'my1')
plt.plot(two['step'], two['loss'], label = 'my2')
plt.ylabel("Loss")
plt.xlabel("Training step")
plt.legend(prop = {'size': 10})      # Set size of legend to be smaller
plt.show()

The figure drawn by this snippet is shown below:

Hard training jobs in deep learning

This week I tried to train two deep-learning models. Unlike my previous training jobs, they were really hard to converge to a small ‘loss’.
The first model is for bird image classification. Previously we wrote a modified ResNet-50 model in MXNet that reached 78% evaluation accuracy. But after we rewrote the same model in Tensorflow, it only reached 50% evaluation accuracy, which seemed very weird. The first thing that came to mind was a regularization problem, so I randomly padded/cropped and rotated the training images:

  image = tf.image.resize_image_with_crop_or_pad(image, IMAGE_HEIGHT + 80, IMAGE_WIDTH + 80)  # zero-pad 40 px on every side
  image = tf.contrib.image.rotate(image, tf.random_uniform([1], minval = -math.pi / 3.0, maxval = math.pi / 3.0))  # random rotation within +/-60 degrees
  image = tf.random_crop(image, [IMAGE_HEIGHT, IMAGE_WIDTH, IMAGE_CHANNELS])  # random crop back to the original size

With this data augmentation, the evaluation accuracy rose to about 60%, but it was still far from the MXNet result.
Then I changed the optimizer from AdamOptimizer to GradientDescentOptimizer, since my colleague told me that AdamOptimizer is so aggressive that it tends to overfit. I also added ‘weight_decay’ to my ResNet-50 model. This time, the evaluation accuracy rose to 76%. The effect of ‘weight_decay’ is clearly positive.
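Weight decay adds an L2 penalty on the weights to the loss, so every gradient step also pulls the weights toward zero. The NumPy sketch below shows one plain gradient-descent step on loss + (weight_decay / 2) * ||w||^2, where `sgd_step` is a hypothetical name. In TF-Slim, this is typically done by passing a weight_decay value to the ResNet arg scope, which attaches l2_regularizer losses to the conv weights. Those losses only take effect if they are added to the training loss, for example through tf.losses.get_total_loss(). Treat this as a sketch of the idea, not the exact change made to the model above.

```python
import numpy as np


def sgd_step(w, grad, lr=0.1, weight_decay=1e-4):
    """One gradient-descent step on loss + (weight_decay / 2) * ||w||^2."""
    # The gradient of the L2 term is weight_decay * w, so each step also shrinks w toward zero.
    return w - lr * (grad + weight_decay * w)


# With a zero data gradient, each step just scales w by (1 - lr * weight_decay) = 0.95.
w = np.array([1.0, -2.0])
for _ in range(3):
    w = sgd_step(w, grad=np.zeros_like(w), lr=0.1, weight_decay=0.5)
print(w)
```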
The second model is for object detection. We just used the examples from Tensorflow’s model library, which includes many cutting-edge object-detection models. I wanted to try SSD (Single Shot MultiBox Detector) on MobileNetV2:

python object_detection/train.py \
  --logtostderr \
  --pipeline_config_path=/disk3/donghao/models/research/object_detection/samples/configs/ssd_mobilenet_v2_coco.config \
  --train_dir=/disk3/donghao/myckpt/ \
  --num_clones=2

The loss dropped rapidly from hundreds to about twelve, but then stayed at eleven for a very long time, as if it would stay there forever. So I began to adjust hyper-parameters, but after testing several learning rates and optimizers, the results did not change at all.
Eventually, I noticed that the loss doesn’t stay there forever: it WILL REDUCE AT LAST. For some tasks, such as classification, the loss converges noticeably. For others, such as object detection, the loss shrinks extremely slowly. Using AdamOptimizer with a small learning rate is a better choice for this type of task.
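A slowly shrinking loss is easy to mistake for a plateau. One way to tell them apart is to fit a straight line to the most recent part of the loss curve and look at the sign and size of its slope. This is my own illustrative sketch, not something from the training scripts above. `recent_slope` and its window size are hypothetical choices. The input could be the (step, loss) pairs loaded with pandas, as in the line-chart post above.

```python
import numpy as np


def recent_slope(steps, losses, window=1000):
    """Least-squares slope of loss versus step over the last `window` points.

    Clearly negative: still improving; close to zero: a real plateau (or too slow to see yet).
    """
    steps = np.asarray(steps, dtype=float)[-window:]
    losses = np.asarray(losses, dtype=float)[-window:]
    slope, _intercept = np.polyfit(steps, losses, 1)
    return slope
```

Multiplying the slope by, say, 10000 gives the loss change to expect over the next 10,000 steps if the recent trend holds.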

Robin on Linux
