Training DNNs with lower memory cost

The paper “Training Deep Nets with Sublinear Memory Cost” describes a practical method for training DNNs with far less memory. The mechanism is not difficult to understand: when training a deep network (a computation graph), we have to store the intermediate output of every node so that back-propagation can use it, and this takes up extra memory. Instead, we can discard most of these intermediate results once the forward pass has moved past them, and recompute them during back-propagation. It is a trade-off between computation time and memory.
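To make the idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation and not the MXNet or TensorFlow API: the "network" is assumed to be a simple chain of scalar functions, and the names `make_layers`, `forward_checkpointed`, `backward_checkpointed` and `segment` are made up for illustration. It keeps only one activation per segment during the forward pass and recomputes the rest segment by segment during the backward pass.

```python
import math


def make_layers(n):
    """Hypothetical toy network: n scalar layers as (function, derivative) pairs."""
    layers = []
    for i in range(n):
        if i % 2 == 0:
            layers.append((math.tanh, lambda x: 1.0 - math.tanh(x) ** 2))
        else:
            layers.append((lambda x: 1.5 * x + 0.1, lambda x: 1.5))
    return layers


def forward_full(x, layers):
    """Usual approach: keep the input of every layer (plus the final output)."""
    acts = [x]
    for f, _ in layers:
        acts.append(f(acts[-1]))
    return acts


def backward_full(acts, layers):
    """Chain rule using the stored activations; returns d(output)/d(input)."""
    grad = 1.0
    for i in reversed(range(len(layers))):
        _, df = layers[i]
        grad *= df(acts[i])
    return grad


def forward_checkpointed(x, layers, segment):
    """Keep only the input of every `segment`-th layer; drop everything else."""
    checkpoints = {0: x}
    h = x
    for i, (f, _) in enumerate(layers):
        h = f(h)
        if (i + 1) % segment == 0 and (i + 1) < len(layers):
            checkpoints[i + 1] = h
    return h, checkpoints


def backward_checkpointed(checkpoints, layers, segment):
    """Walk the segments from last to first, recomputing each one on demand."""
    grad = 1.0
    n = len(layers)
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment, n)
        # Recompute the inputs of layers start..end-1 from the checkpoint.
        acts = [checkpoints[start]]
        for f, _ in layers[start:end - 1]:
            acts.append(f(acts[-1]))
        # Back-propagate through this segment, then let `acts` be freed.
        for j in reversed(range(end - start)):
            _, df = layers[start + j]
            grad *= df(acts[j])
    return grad


layers = make_layers(16)
x0 = 0.3

acts = forward_full(x0, layers)
g_full = backward_full(acts, layers)

out, checkpoints = forward_checkpointed(x0, layers, segment=4)
g_ckpt = backward_checkpointed(checkpoints, layers, segment=4)

print("stored activations (full):        ", len(acts))
print("stored checkpoints (checkpointed):", len(checkpoints))
print("gradients match:", math.isclose(g_full, g_ckpt, rel_tol=1e-12))
```

In this sketch, a chain of n layers split into segments of length k holds roughly n/k checkpoints plus at most k recomputed activations at any time. That sum is smallest when k is close to sqrt(n), which is where the sublinear memory cost comes from; the price is roughly one extra forward pass.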

The authors give an example in MXNet. The memory savings look tremendous.

In versions above 1.3, TensorFlow also provides a similar module: the memory optimizer. We can use it like this:

We still need to add an op in ResNet:

With this method, we can now increase the batch size even for deep networks (ResNet-101, etc.).
