Problem with using slim.batch_norm() of TensorFlow (second episode)

In the previous article, I found out the reason. But how to resolve it in Multi-GPU training was still a question. Following the suggestion in this issue on GitHub, I tried two ways to fix the problem.

First, I rewrote my Averaging-Gradients training by following the approach of tf.slim.create_train_op():
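Roughly, the rewritten code looked like the sketch below (a minimal illustration, not my exact code; build_model() and num_gpus are placeholders). The point borrowed from create_train_op() is making the train step depend on the batch-norm update ops collected in tf.GraphKeys.UPDATE_OPS:

```python
import tensorflow as tf

def average_gradients(tower_grads):
    """Average a list of (grad, var) lists, one list per GPU tower."""
    averaged = []
    for grads_and_vars in zip(*tower_grads):
        grads = [g for g, _ in grads_and_vars]
        grad = tf.reduce_mean(tf.stack(grads, axis=0), axis=0)
        averaged.append((grad, grads_and_vars[0][1]))  # all towers share the variable
    return averaged

optimizer = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9)
tower_grads = []
for i in range(num_gpus):  # num_gpus: placeholder for the number of GPUs
    with tf.device('/gpu:%d' % i), \
         tf.variable_scope(tf.get_variable_scope(), reuse=(i > 0)):
        loss = build_model(i)  # build_model(): placeholder for the per-tower loss
        tower_grads.append(optimizer.compute_gradients(loss))

grads = average_gradients(tower_grads)

# The key step copied from create_train_op(): run the moving-mean/variance
# updates of slim.batch_norm() together with the averaged-gradient step.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = optimizer.apply_gradients(grads)
```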

But unfortunately, this didn't work at all: the inference result was still a mess.

Then I tried the other way: Asynchronous-Gradient training with tf.slim.create_train_op():
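A minimal sketch of this setup (again, build_model(), num_gpus and num_steps are placeholders, not the exact code): each GPU tower gets its own train op from slim.learning.create_train_op(), which already collects the UPDATE_OPS, and the towers apply their gradients independently instead of being averaged:

```python
import tensorflow as tf

slim = tf.contrib.slim

train_ops = []
for i in range(num_gpus):  # num_gpus: placeholder for the number of GPUs
    with tf.device('/gpu:%d' % i), \
         tf.variable_scope(tf.get_variable_scope(), reuse=(i > 0)):
        loss = build_model(i)  # build_model(): placeholder for the per-tower loss
        optimizer = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9)
        # create_train_op() automatically adds tf.GraphKeys.UPDATE_OPS,
        # so the batch-norm moving statistics are updated during training.
        train_ops.append(slim.learning.create_train_op(loss, optimizer))

# Each tower applies its own gradients to the shared variables without
# waiting for the other towers (asynchronous updates, no averaging).
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(num_steps):  # num_steps: placeholder
        sess.run(train_ops)
```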

Now the inference works very well! Training also became a little faster than Averaging-Gradients training, because the averaging operation has to wait for the gradients from all the GPUs.
