Tag Archives: tensorflow

How does Tensorflow set the device for each Operation?

In Tensorflow, we only need the snippet below to assign a device to an Operation:

How is this implemented? Let’s take a look. Python has a mechanism called a ‘context manager’. For example, we can use it to wrap a few lines of code:
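Tensorflow’s real implementation is more involved, but the Python mechanism it builds on can be sketched on its own. The names below (device, current_device, the stack) are made up for illustration and are not Tensorflow’s internals:

```python
import contextlib

# A toy device stack, mimicking the idea behind tf.device();
# the names here are illustrative, not Tensorflow's real internals.
_device_stack = []

@contextlib.contextmanager
def device(name):
    _device_stack.append(name)   # runs when entering the 'with' block
    try:
        yield
    finally:
        _device_stack.pop()      # runs when leaving the 'with' block

def current_device():
    # A newly created Operation would read the top of the stack.
    return _device_stack[-1] if _device_stack else None

with device("/gpu:0"):
    print(current_device())      # -> /gpu:0
    with device("/cpu:0"):
        print(current_device())  # -> /cpu:0
print(current_device())          # -> None
```

Because the `with` block pushes on entry and pops on exit, nested `tf.device()` scopes compose naturally, and every Operation created inside a scope can pick up the innermost device name.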

The… Read more »

Some tips about using Google’s TPU (Cont.)

Sometimes I get this error from TPUEstimator:

After stopping and restarting the TPU in the GCP console, the error disappeared. A TPU doesn’t let users access it directly the way a GPU does: there is no device in the VM that looks like ‘/dev/tpu’ or anything similar. Google provides the TPU as an RPC… Read more »

Some modifications about SSD-Tensorflow

In the previous article, I introduced a new library for Object Detection. But yesterday, after I added slim.batch_norm() into ‘nets/ssd_vgg_512.py’ like this:

Although training could still run correctly, the evaluation reported errors:

For quite a while I wondered why adding a simple batch_norm would make the shapes incorrect. Finally… Read more »
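For context, the computation slim.batch_norm() performs at training time is itself simple. A minimal numpy sketch (an illustration, not the slim implementation):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch normalization over the batch axis: normalize each
    feature to zero mean / unit variance, then scale and shift."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # normalized activations
    return gamma * x_hat + beta              # learnable scale and shift

x = np.random.randn(8, 4) * 3.0 + 5.0
y = batch_norm(x)
print(y.mean(axis=0))  # close to 0 per feature
print(y.std(axis=0))   # close to 1 per feature
```

Note that the output shape always equals the input shape, which is exactly why a shape error appearing after adding batch_norm is so surprising.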

Choosing an Object Detection Framework written in Tensorflow

Recently I needed to train a DNN model for an object detection task. In the past, I used the object detection framework from Tensorflow’s ‘models’ project. But there are two reasons I couldn’t use it for my own project: first, it’s too big. There are more than… Read more »

Doing the tf.random_crop() operation on GPU

When I run code like:

it reports:

It looks like the operation tf.random_crop() doesn’t have a CUDA kernel implementation. Therefore I needed to write it myself. The solution is surprisingly simple: write a function that does a random crop on one image using tf.random_uniform() and tf.slice(), and then use tf.map_fn() to apply it… Read more »
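The structure of that workaround can be sketched in plain numpy; in the graph version, tf.random_uniform() produces the offsets, tf.slice() takes the crop, and tf.map_fn() replaces the Python loop. The helper names here are made up:

```python
import numpy as np

def random_crop_one(image, crop_h, crop_w, rng):
    """Crop one HxWxC image at a random offset.
    (In the graph version, the offsets come from tf.random_uniform()
    and the crop itself is tf.slice().)"""
    h, w, _ = image.shape
    top = rng.integers(0, h - crop_h + 1)
    left = rng.integers(0, w - crop_w + 1)
    return image[top:top + crop_h, left:left + crop_w, :]

def random_crop_batch(images, crop_h, crop_w, seed=0):
    """Apply the per-image crop across the batch,
    the role tf.map_fn() plays in the graph version."""
    rng = np.random.default_rng(seed)
    return np.stack([random_crop_one(img, crop_h, crop_w, rng)
                     for img in images])

batch = np.zeros((4, 32, 32, 3))
out = random_crop_batch(batch, 24, 24)
print(out.shape)  # (4, 24, 24, 3)
```

Since every op in the per-image function has a CUDA kernel, the mapped version can run entirely on the GPU.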

Regularization loss in ‘slim’ library of Tensorflow

My Python code using the slim library to train a classification model in Tensorflow:

It works fine. However, no matter what value ‘weight_decay’ takes, the training accuracy of the model easily reaches higher than 90%. It seems ‘weight_decay’ just doesn’t work. To find out the reason, I… Read more »
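For context, this is what ‘weight_decay’ is supposed to contribute: an L2 penalty on the weights added to the data loss. A minimal numpy sketch (the function name is made up; in slim the penalty lives in a separate regularization-losses collection, and it only has an effect if it is actually added to the loss being minimized):

```python
import numpy as np

def total_loss(data_loss, weights, weight_decay):
    """Data loss plus an L2 penalty on all weight tensors.
    With weight_decay = 0 the penalty vanishes entirely."""
    l2 = sum(np.sum(w ** 2) for w in weights)
    return data_loss + weight_decay * l2

weights = [np.ones((2, 2)), np.ones(3)]
print(total_loss(1.0, weights, 0.0))   # 1.0 (decay disabled)
print(total_loss(1.0, weights, 0.01))  # data loss plus 0.01 * 7
```

If training only minimizes the plain cross-entropy term, changing ‘weight_decay’ changes nothing, which matches the symptom above.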

Using multiple GPUs for training in a distributed Tensorflow environment

I am trying to write code for training on multiple GPUs. The code is mainly from the example in ‘Distributed Tensorflow’. I have changed the code slightly for running on GPUs:

But after launching the script below:

it reports:

It seems one MonitoredTrainingSession will occupy all the memory of… Read more »

Problems and solutions about building Tensorflow-1.8 with TensorRT 4.0

Problem: When compiling Tensorflow-1.8 with CUDA-9.2, it reports:

Solution: Add ‘/usr/local/cuda-9.2/lib64’ into ‘/etc/ld.so.conf’ and run ‘sudo ldconfig’ to make it work.

Problem: When compiling Tensorflow-1.8, it reports:
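For reference, the library-path fix looks like this (the CUDA install path is an assumption; adjust it to your system):

```shell
# Make the dynamic loader aware of the CUDA 9.2 libraries
# (path is an assumption; adjust to your install location).
echo '/usr/local/cuda-9.2/lib64' | sudo tee -a /etc/ld.so.conf
sudo ldconfig
# Verify the loader now resolves the CUDA runtime:
ldconfig -p | grep libcudart
```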

Solution: In the ‘.tf_configure.bazelrc’ file, use the real Python location instead of a soft link:

Problem: When running TensorRT, it reports:

Solution:… Read more »

Testing performance of Tensorflow’s fixed-point quantization on x86_64 CPU

Google has published their quantization method in this paper. It uses int8 for the feed-forward pass but float32 for back-propagation, since back-propagation needs more accuracy to accumulate gradients. A question came to me right after reading the paper: why are all the performance tests done on mobile-phone platforms (ARM architecture)? The… Read more »
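As a refresher on the scheme in the paper, quantization maps floats to int8 through an affine relation real = scale * (q - zero_point). A minimal numpy sketch of quantize/dequantize (an illustration of the idea, not Tensorflow’s actual kernels):

```python
import numpy as np

def quantize(x, num_bits=8):
    """Affine quantization of a float tensor to signed int8:
    real = scale * (q - zero_point)."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / (qmax - qmin) or 1.0  # guard constant inputs
    zero_point = int(round(qmin - lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return q.astype(np.int8), scale, zero_point

def dequantize(q, scale, zero_point):
    """Map int8 values back to approximate floats."""
    return scale * (q.astype(np.float32) - zero_point)

x = np.linspace(-1.0, 1.0, 5).astype(np.float32)
q, s, z = quantize(x)
x_hat = dequantize(q, s, z)
print(np.abs(x - x_hat).max())  # small quantization error
```

The feed-forward pass works on the int8 values, while gradients are accumulated against the dequantized float32 values, which is why training still needs float32.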