Why doesn't my model converge?

To run the CIFAR-100 dataset with ResNet-50, I wrote a program using TensorFlow. But when I ran it, the loss stayed at about 4.5–4.6 forever (suspiciously close to ln(100) ≈ 4.6, the cross-entropy loss of a uniform guess over 100 classes):

After changing models (from ResNet to a fully-connected net), optimizers (from AdamOptimizer to AdagradOptimizer), and even the learning rate (from 1e-3 all the way down to 1e-7), the phenomenon didn't change at all.
Finally, I checked the loss and the output vector step by step, and found that the problem was not in the model but in the dataset code:

Every batch of data had the same pictures and the same labels! That's why the model didn't converge. I should have used 'i' instead of 'self.pos' as the index to fetch data and labels.
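The bug can be sketched roughly like this (a hypothetical reconstruction: the class and method names are mine, not the original code):

```python
class Cifar100Batcher:
    """Minimal sketch of a batch fetcher with the bug described above."""

    def __init__(self, data, labels, batch_size):
        self.data = data
        self.labels = labels
        self.batch_size = batch_size
        self.pos = 0  # start position of the current batch

    def next_batch_buggy(self):
        # BUG: 'self.pos' is used as the element index inside the loop,
        # so every slot of the batch gets the same picture and label.
        batch = [self.data[self.pos] for _ in range(self.batch_size)]
        labels = [self.labels[self.pos] for _ in range(self.batch_size)]
        self.pos += self.batch_size
        return batch, labels

    def next_batch_fixed(self):
        # FIX: index with 'i', stepping through the dataset.
        idx = range(self.pos, self.pos + self.batch_size)
        batch = [self.data[i] for i in idx]
        labels = [self.labels[i] for i in idx]
        self.pos += self.batch_size
        return batch, labels
```

With the buggy version, gradient updates from each batch all point at a single example, so the loss never moves away from the uniform-guess level.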

So in deep learning, problems come not only from models and hyper-parameters, but also from the dataset, or simply from faulty code…

Fixing the ResNet-101 model in the MXNet SSD example

SSD (Single Shot MultiBox Detector) is one of the fastest methods for object detection (another detector, YOLO, is a little slower than SSD). The MXNet source code contains an example SSD implementation. I tested it with different base models (Inception-v3, ResNet-50, ResNet-101, etc.) and found a weird phenomenon: the .params file generated with ResNet-101 is smaller than the one generated with ResNet-50.

Model        .params file size
resnet-50    119 MB
resnet-101   69 MB

Since a deeper network has more parameters, it seems suspicious that ResNet-101 produces a smaller parameter file.

Reviewing the code of example/ssd/symbol/symbol_factory.py:

Why do resnet-50 and resnet-101 have the same 'from_layers'? Let's check the two models:

In ResNet-50, SSD uses two layers (shown by the red line) to extract features: one from the output of stage 3, the other from the output of stage 4. In ResNet-101 it should be the same (shown by the blue line), but the example incorrectly copies the config code of ResNet-50. The correct 'from_layers' for ResNet-101 is:

This looks like a bug, so I created a pull request to try to fix it.
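As a sanity check, the stage boundaries really do differ between the two depths. MXNet numbers the residual elementwise-add ops sequentially ('_plus0', '_plus1', …), so the names of the layers that close stage 3 and stage 4 follow from the per-stage unit counts. A small sketch (the naming convention here is my assumption; verify it against the actual symbol before relying on it):

```python
# Bottleneck units per stage for each depth (standard ResNet configs).
UNITS = {"resnet-50": [3, 4, 6, 3], "resnet-101": [3, 4, 23, 3]}

def stage_output_layers(units):
    """Names of the elementwise-add ('_plus') ops that close stage 3 and
    stage 4, assuming MXNet numbers them sequentially starting at 0."""
    end_of_stage3 = sum(units[:3]) - 1
    end_of_stage4 = sum(units) - 1
    return ["_plus%d" % end_of_stage3, "_plus%d" % end_of_stage4]

for model, units in UNITS.items():
    print(model, stage_output_layers(units))
# resnet-50  ['_plus12', '_plus15']
# resnet-101 ['_plus29', '_plus32']
```

So if both configs list the same two '_plus' names, the ResNet-101 detector is tapping much earlier layers of the network; that would also explain the smaller .params file, since layers after the last extraction point are pruned from the exported symbol.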

Training DNNs with less memory cost

The paper "Training Deep Nets with Sublinear Memory Cost" describes a practical method for training DNNs with far less memory. The mechanism behind it is not hard to understand: when training a deep network (a computation graph), we have to store intermediate results at every node, which occupies extra memory. Instead, we can discard these intermediate results after each node has been computed, and recompute them again during back-propagation. It's a trade-off between computation time and memory space.
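The time-for-memory trade can be illustrated with a toy sketch in plain Python (this is only the idea, not the paper's actual implementation): keep every k-th activation as a checkpoint during the forward pass, and recompute the missing ones on demand when the backward pass needs them.

```python
def run_chain(layers, x0, k):
    """Forward pass through a chain of layer functions, keeping only
    every k-th activation (plus the input) as a checkpoint."""
    checkpoints = {0: x0}
    x = x0
    for i, f in enumerate(layers, start=1):
        x = f(x)
        if i % k == 0:
            checkpoints[i] = x
    return x, checkpoints

def activation_after(layers, checkpoints, j, k):
    """Recover the activation after layer j by recomputing forward from
    the nearest checkpoint at or before j: extra compute, less memory."""
    start = (j // k) * k  # nearest stored checkpoint index
    x = checkpoints[start]
    for f in layers[start:j]:
        x = f(x)
    return x
```

With n layers this stores about n/k activations instead of n; choosing k ≈ √n gives the O(√n) memory of the paper's simplest scheme, at the cost of roughly one extra forward computation per segment.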

The author gives an example in MXNet. The memory reduction seems tremendous.

Since version 1.3, TensorFlow has also shipped a similar module: the memory optimizer. We can use it like this:
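For reference, enabling the optimizer looked roughly like this in the TF 1.x API (a sketch from memory; the enum values, such as HEURISTICS versus MANUAL, vary between releases, so check the RewriterConfig proto in your version):

```python
import tensorflow as tf
from tensorflow.core.protobuf import rewriter_config_pb2

# Ask Grappler's memory optimizer to trade recomputation for memory.
rewrite_options = rewriter_config_pb2.RewriterConfig(
    memory_optimization=rewriter_config_pb2.RewriterConfig.HEURISTICS)
graph_options = tf.GraphOptions(rewrite_options=rewrite_options)
config = tf.ConfigProto(graph_options=graph_options)

with tf.Session(config=config) as sess:
    ...  # build and train the model as usual
```

In MANUAL mode the optimizer only recomputes ops you have annotated yourself, which is why the model code also needs changes.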

We still need to add the corresponding ops in the ResNet code:

Using this method, we can now increase the batch size even for deep networks (ResNet-101, etc.).

Using MXNet to Classify Images of Birds (Fourth Episode)

More than half a year has passed since the previous article. In this period, Alan Mei (an old colleague of mine) collected more than 1 million pictures of Chinese birds. After trying AlexNet and VGG19, I finally chose ResNet-18 as the DNN model to classify the different kinds of Chinese birds. ResNet-18 has far fewer parameters than VGG19, but still enough representational capacity.

Collecting more than 1 million sample pictures of birds and labeling them (some by program, some by hand) is really tedious work. I truly appreciate Alan Mei for taking on such a hard job, although he says he is an avian fan :). I also need to thank him for giving me a personal computer with a GTX 970 GPU; without it, I could not have trained my models so quickly.

To improve the classification accuracy, I read the book "Deep Learning" and many other papers (not only the ResNet paper, of course). The machine learning and deep learning knowledge I gained was a rich reward. But most important of all: I enjoyed learning new technology again.

Today, we launch this simple website: http://en.dongniao.net/ . In Chinese, "dongniao" means "understanding birds". We hope avian fans and deep learning fans will love it.