How to average gradients in Tensorflow

Sometimes, we need to average an array of gradients in deep learning model. Fortunately, Tensorflow divided models into fine-grained tensors and operations, therefore it’s not difficult to implement gradients average by using it.

Let’s see the code from github

We should keep in mind that these codes will only build a static graph (the ‘grads; are references rather than values).

First, we need to expand dimensions of tensor(gradient) and concatenate them. Then use reduce_mean() to do actually average operation (seems not intuitive).

Leave a Reply

Your email address will not be published. Required fields are marked *

*
*

This site uses Akismet to reduce spam. Learn how your comment data is processed.