Why doesn't my model converge?

To run the CIFAR-100 dataset with ResNet-50, I wrote a program using TensorFlow. But when I ran it, the loss stayed at about 4.5–4.6 forever (suspiciously close to ln(100) ≈ 4.6, the cross-entropy loss of a uniform guess over 100 classes):

After changing models (from ResNet to a fully-connected net), optimizers (from AdamOptimizer to AdagradOptimizer), and even the learning rate (from 1e-3 all the way down to 1e-7), the phenomenon didn't change at all.
Finally, I checked the loss and the output vector step by step, and found that the problem was not in the model but in the dataset code:

Every batch of data had the same pictures and the same labels! That's why the model didn't converge. I should have used 'i' instead of 'self.pos' as the index to fetch data and labels.
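The bug can be sketched roughly like this (a hypothetical reconstruction: the class and method names are mine, not the original code):

```python
class Cifar100Batcher:
    """Minimal sketch of a batch fetcher with the bug described above."""

    def __init__(self, data, labels, batch_size):
        self.data = data
        self.labels = labels
        self.batch_size = batch_size
        self.pos = 0  # start position of the current batch

    def next_batch_buggy(self):
        # BUG: 'self.pos' is used as the element index inside the loop,
        # so every slot of the batch gets the same picture and label.
        batch = [self.data[self.pos] for _ in range(self.batch_size)]
        labels = [self.labels[self.pos] for _ in range(self.batch_size)]
        self.pos += self.batch_size
        return batch, labels

    def next_batch_fixed(self):
        # FIX: index with 'i', stepping through the dataset.
        idx = range(self.pos, self.pos + self.batch_size)
        batch = [self.data[i] for i in idx]
        labels = [self.labels[i] for i in idx]
        self.pos += self.batch_size
        return batch, labels
```

With the buggy version, gradient updates from each batch all point at a single example, so the loss never moves away from the uniform-guess level.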

So in deep learning, problems come not only from models and hyper-parameters, but also from the dataset, or simply from faulty code…

Fixing the ResNet-101 model in the MXNet SSD example

SSD (Single Shot MultiBox Detector) is one of the fastest methods for object detection (another detector, YOLO, is a little slower than SSD). The MXNet source code contains an example SSD implementation. I tested it with different base models (Inception-v3, ResNet-50, ResNet-101, etc.) and found a weird phenomenon: the .params file generated with ResNet-101 is smaller than the one generated with ResNet-50.

Model        .params file size
resnet-50    119 MB
resnet-101   69 MB

Since a deeper network has more parameters, it seems suspicious that ResNet-101 produces a smaller parameter file.

Reviewing the code of example/ssd/symbol/symbol_factory.py:

Why do resnet-50 and resnet-101 have the same 'from_layers'? Let's check the two models:

In ResNet-50, SSD uses two layers (shown by the red line) to extract features: one from the output of stage 3, the other from the output of stage 4. In ResNet-101 it should be the same (shown by the blue line), but the example incorrectly copies the config code of ResNet-50. The correct 'from_layers' for ResNet-101 is:

This looks like a bug, so I created a pull request to try to fix it.
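As a sanity check, the stage boundaries really do differ between the two depths. MXNet numbers the residual elementwise-add ops sequentially ('_plus0', '_plus1', …), so the names of the layers that close stage 3 and stage 4 follow from the per-stage unit counts. A small sketch (the naming convention here is my assumption; verify it against the actual symbol before relying on it):

```python
# Bottleneck units per stage for each depth (standard ResNet configs).
UNITS = {"resnet-50": [3, 4, 6, 3], "resnet-101": [3, 4, 23, 3]}

def stage_output_layers(units):
    """Names of the elementwise-add ('_plus') ops that close stage 3 and
    stage 4, assuming MXNet numbers them sequentially starting at 0."""
    end_of_stage3 = sum(units[:3]) - 1
    end_of_stage4 = sum(units) - 1
    return ["_plus%d" % end_of_stage3, "_plus%d" % end_of_stage4]

for model, units in UNITS.items():
    print(model, stage_output_layers(units))
# resnet-50  ['_plus12', '_plus15']
# resnet-101 ['_plus29', '_plus32']
```

So if both configs list the same two '_plus' names, the ResNet-101 detector is tapping much earlier layers of the network; that would also explain the smaller .params file, since layers after the last extraction point are pruned from the exported symbol.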

Training DNNs with less memory cost

The paper "Training Deep Nets with Sublinear Memory Cost" describes a practical method for training DNNs with far less memory. The mechanism behind it is not hard to understand: when training a deep network (a computation graph), we have to store intermediate results at every node, which occupies extra memory. Instead, we can discard these intermediate results after each node has been computed, and recompute them again during back-propagation. It's a trade-off between computation time and memory space.
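The time-for-memory trade can be illustrated with a toy sketch in plain Python (this is only the idea, not the paper's actual implementation): keep every k-th activation as a checkpoint during the forward pass, and recompute the missing ones on demand when the backward pass needs them.

```python
def run_chain(layers, x0, k):
    """Forward pass through a chain of layer functions, keeping only
    every k-th activation (plus the input) as a checkpoint."""
    checkpoints = {0: x0}
    x = x0
    for i, f in enumerate(layers, start=1):
        x = f(x)
        if i % k == 0:
            checkpoints[i] = x
    return x, checkpoints

def activation_after(layers, checkpoints, j, k):
    """Recover the activation after layer j by recomputing forward from
    the nearest checkpoint at or before j: extra compute, less memory."""
    start = (j // k) * k  # nearest stored checkpoint index
    x = checkpoints[start]
    for f in layers[start:j]:
        x = f(x)
    return x
```

With n layers this stores about n/k activations instead of n; choosing k ≈ √n gives the O(√n) memory of the paper's simplest scheme, at the cost of roughly one extra forward computation per segment.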

The author gives an example in MXNet. The memory reduction seems tremendous.

Since version 1.3, TensorFlow has also shipped a similar module: the memory optimizer. We can use it like this:
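For reference, enabling the optimizer looked roughly like this in the TF 1.x API (a sketch from memory; the enum values, such as HEURISTICS versus MANUAL, vary between releases, so check the RewriterConfig proto in your version):

```python
import tensorflow as tf
from tensorflow.core.protobuf import rewriter_config_pb2

# Ask Grappler's memory optimizer to trade recomputation for memory.
rewrite_options = rewriter_config_pb2.RewriterConfig(
    memory_optimization=rewriter_config_pb2.RewriterConfig.HEURISTICS)
graph_options = tf.GraphOptions(rewrite_options=rewrite_options)
config = tf.ConfigProto(graph_options=graph_options)

with tf.Session(config=config) as sess:
    ...  # build and train the model as usual
```

In MANUAL mode the optimizer only recomputes ops you have annotated yourself, which is why the model code also needs changes.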

We still need to add the corresponding ops in the ResNet code:

Using this method, we can now increase the batch size even for deep networks (ResNet-101, etc.).

Using MXNet to Classify Images of Birds (Fourth Episode)

More than half a year has passed since the previous article. In this period, Alan Mei (an old colleague of mine) collected more than 1 million pictures of Chinese birds. After trying AlexNet and VGG19, I finally chose ResNet-18 as the DNN model to classify the different kinds of Chinese birds. ResNet-18 has far fewer parameters than VGG19, but still enough representational capacity.

Collecting more than 1 million sample pictures of birds and labeling them (some by program, some by hand) is really tedious work. I truly appreciate Alan Mei for taking on such a hard job, although he says he is an avian fan :). I also need to thank him for giving me a personal computer with a GTX 970 GPU; without it, I could not have trained my models so quickly.

To improve the classification accuracy, I read the book "Deep Learning" and many other papers (not only the ResNet paper, of course). The machine learning and deep learning knowledge I gained was a rich reward. But most important of all: I enjoyed learning new technology again.

Today, we launch this simple website: http://en.dongniao.net/ . In Chinese, "dongniao" means "understanding birds". We hope avian fans and deep learning fans will love it.