Why my model doesn’t converge?

To use Resnet-50 to run CIFAR100 dataset, I wrote a program by using Tensorflow. But when running it, the loss seems keeping in about 4.5~4.6 forever:

After changed models (from Resnet to fully-connect-net), optimizers (from AdamOptimizer to AdagradOptimizer), and even learning rate (from 1e-3 to even 1e-7), the phenomena didn’t change at all.
Finally, I checked the loss and the output vector step by step, and found that the problem is not in model but dataset code:

Every batch of data have the same pictures and same labels! Than’t why the model didn’t converge. I should have used ‘i’ instead of ‘self.pos’ as index to fetch data and labels.

So in DeepLearning area, problems comes not only from models and hyper-parameters, but also dataset, or faulty codes…

Leave a Reply

Your email address will not be published. Required fields are marked *


This site uses Akismet to reduce spam. Learn how your comment data is processed.