tensorflow

Some tips about Python, Pandas, and Tensorflow

There are some useful tips for using Keras and Tensorflow to build models.
1. Using applications.inception_v3.InceptionV3(include_top = False, weights = ‘Imagenet’) to get pretrained parameters for InceptionV3 model, the console reported:

Exception: URL fetch failure on https://github.com/fchollet/deep-learning-models/releases/download/v0.5/inception_v3_weights_tf_dim_ordering_tf_kernels_notop.h5: None -- unknown url type: https

The solution is here. Just install some packages:

Sudo yum install openssl openssl-devel -y

2. Could we use ‘add’ to merge two DataFrames of Pandas? Let’s try

import pandas as pd
a = pd.DataFrame([[1, 2], [3, 4]], columns = ['first', 'second'])
b = pd.DataFrame([], columns = ['first', 'second'])
print(a+b)

The result is:

  first second
0   NaN    NaN
1   NaN    NaN

The operator ‘+’ just works as ‘pandas.DataFrame.add‘. It try to add all values column by column, but the second DataFrame is empty, so the result of adding a number and a nonexistent value is ‘Nan’.
To merge two DataFrames, we should use ‘append’:

a.append(b)

3. Why Estimator of Tensorflow doesn’t print out log?

    logging_hook = tf.train.LoggingTensorHook({'step': global_step, 'loss': loss, 'precision': precision, 'recall': recall, 'f1': f1, 'auc': auc},
            every_n_iter = every_n_iter, formatter = formatter_log)
    ....
    if mode == tf.estimator.ModeKeys.EVAL:
        return tf.estimator.EstimatorSpec(mode, loss = loss, evaluation_hooks = [logging_hook])

But the logging_hook hasn’t been run. The solution is just adding one line before running Estimator:

tf.logging.set_verbosity(tf.logging.INFO)

Some errors in dataset pipeline of Tensorflow

To extend image datasets by using mixup，I use this snippet to mix two images:

major_image = cv2.imread('1.jpeg')
minor_image = cv2.imread('2.jpeg')
new_image = major_image * 0.9 + minor_image * 0.1

But after generating images by using this snippet, the training report errors:

Traceback (most recent call last):
  File "/home/tops/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/tops/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/tops/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]:
 [8388608], [batch]: [1048576]
         [[{{node IteratorGetNext}} = IteratorGetNext[output_shapes=[[?,?], [?,?], [?]], output_types=[DT_INT64, DT_UINT8, DT_STRING], _device="/job:l
ocalhost/replica:0/task:0/device:CPU:0"](IteratorV2)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "train.py", line 282, in 
    main()
  File "train.py", line 278, in main
    train(config)
  File "train.py", line 224, in train
    labels_r, images_r, ids_r = sess.run([labels, images, ids])
  File "/home/tops/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 887, in run
    run_metadata_ptr)
  File "/home/tops/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1110, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/tops/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1286, in _do_run
    run_metadata)
  File "/home/tops/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1308, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Cannot add tensor to the batch: number of elements does not match. Shapes are: [tensor]:
 [8388608], [batch]: [1048576]
         [[{{node IteratorGetNext}} = IteratorGetNext[output_shapes=[[?,?], [?,?], [?]], output_types=[DT_INT64, DT_UINT8, DT_STRING], _device="/job:l
ocalhost/replica:0/task:0/device:CPU:0"](IteratorV2)]]

The size of each image is 512x512x4 = 1048576 bytes. But I can’t understand why there is image has the size of 8388608 bytes.
Firstly my suspected point is the dataset flow of Tensorflow. But after changing the code of dataset pipeline, I find the problem is not in Tensorflow.
Again and again, I reviewed my code of generating new images and also adding some debug stub. Finally, I found out the problem: it’s not Tensorflow’s fault, but mine.
By using

new_image = major_image * 0.9 + minor_image * 0.1

The type of ‘new_image’ is ‘float64’, not ‘uint8’ for ‘major_image’ and ‘minor_image’! The ‘float64’ use 8 bytes to store one element, so this explains the ‘8388608’ in error information.
To correctly mixup images, the code should be:

major_image = cv2.imread('1.jpeg')
minor_image = cv2.imread('2.jpeg')
new_image = major_image * 0.9 + minor_image * 0.1
new_image = new_image.astype(np.uint8)

Books I read in year 2018

In the 2018 year, I continued to learn more knowledge about machine learning and deep Learning. “Deep Learning” is pretty suitable for me and “Hands-On Machine Learning with Scikit-Learn and TensorFlow” is also a wonderful supplement for programming practice. I also learned some basic knowledge about Reinforcement learning.
To teach my daughters programming, I read some books about Arduino. In the process of learning Arduino, I became more and more interested in electronics on myself! After reading more technical documents about electronics (diode, transistor, capacitor, relay, thyristor etc.), Microcontrollers (Atmega from Atmel, MSP430 from Texas Instruments, STM8 from ST and so on), I had opened my view to a new area.
History books are always my favorite type. The most astonishing history book I have read in 2018 is “The Last Panther”. This book tells us an extremely cruel but real story in WWII.
Kazuo Inamori is a famous entrepreneur in Japan. I read some books written by him at the end of this year. Surprisingly, his books definitely inspired me and even changed some parts of my mind. I really want to thank him for his teaching.

Compare implementation of tf.AdamOptimizer to its paper

When I reviewed the implementation of Adam optimizer in tensorflow yesterday, I noticed that it’s code is different from the formulas that I saw in Adam’s paper. In tensorflow’s formulas for Adam are:

But the algorithm in the paper is:

Then quickly I found these words in the document of tf.AdamOptimizer:

Note that since AdamOptimizer uses the formulation just before Section 2.1 of the Kingma and Ba paper rather than the formulation in Algorithm 1, the “epsilon” referred to here is “epsilon hat” in the paper.

And this time I did find the ‘Algo 2’ in the paper:

But how does ‘Algo 1’ tranform to ‘Algo 2’? Let me try to deduce them from ‘Algo 1’:

$\theta_t \gets \theta_{t-1} - \frac{\alpha \cdot \hat{m_t}}{(\sqrt{\hat{v_t}} + \epsilon)}$
$\implies \theta_t \gets \theta_{t-1} - \alpha \cdot \frac{m_t}{1 - \beta_1^t} \cdot \frac{1}{(\sqrt{\hat{v_t}} + \epsilon)} \quad \text{ (put } \hat{m_t} \text{ in) }$
$\implies \theta_t \gets \theta_{t-1} - \alpha \cdot \frac{m_t}{1 - \beta_1^t} \cdot \frac{\sqrt{1-\beta_2^t}}{\sqrt{v_t}} \quad \text{ (put } \hat{v_t} \text{ in and ignore } \epsilon \text{) }$
$\implies \theta_t \gets \theta_{t-1} - \alpha_t \cdot \frac{m_t}{\sqrt{v_t} + \hat{\epsilon}} \quad \text{add new } \hat{\epsilon} \text { to avoid zero-divide}$

The bug about using hooks and MirroredStrategy in tf.estimator.Estimator

When I was using MirroedStrategy in my tf.estimator.Estimator:

distribution = tf.contrib.distribute.MirroredStrategy(
      ["/device:GPU:0", "/device:GPU:1"])
config = tf.estimator.RunConfig(train_distribute=distribution,
                                  eval_distribute=distribution)
estimator = tf.estimator.Estimator(
      model_fn=build_model_fn_optimizer(), config=config)
estimator.train(input_fn=input_fn, steps=10)

and add hooks for training:

logging_hook = tf.train.LoggingTensorHook({'logits' : logits})
    return tf.estimator.EstimatorSpec(mode, loss=loss_fn(), train_op=train_op, training_hooks = [logging_hook])

The tensorflow report errors:

  File "/usr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 356, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1179, in _train_model
    return self._train_model_distributed(input_fn, hooks, saving_listeners)
  File "/usr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1309, in _train_model_distributed
    grouped_estimator_spec.training_hooks)
  File "/usr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1305, in get_hooks_from_the_first_device
    for per_device_hook in per_device_hooks
  File "/usr/lib/python3.6/site-packages/tensorflow/python/estimator/estimator.py", line 1305, in 
    for per_device_hook in per_device_hooks
AttributeError: 'Estimator' object has no attribute '_distribution'

Without finding any answers on google, I have to look into the code of ‘estimator.py’ in tensorflow. Fortunately, the code defect is obvious:

        scaffold = _combine_distributed_scaffold(
            grouped_estimator_spec.scaffold, self._train_distribution)
        # TODO(yuefengz): add a test for unwrapping per_device_hooks.
        def get_hooks_from_the_first_device(per_device_hooks):
          return [
              self._distribution.unwrap(per_device_hook)[0]
              for per_device_hook in per_device_hooks
          ]
        training_hooks = get_hooks_from_the_first_device(
            grouped_estimator_spec.training_hooks)

class Estimator havn’t any private argument named ‘_distribution’ but only have ‘_train_distribution’ and ‘_eval_distribution’. So the fix is just change ‘self._distribution.unwrap(per_device_hook)[0]’ to ‘self._train_distribution.unwrap(per_device_hook)[0]’.
I had submitted a request pull for tensorflow to fix this bug in branch 1.11

How Tensorflow set device for each Operation ?

In Tensorflow, we only need to use snippet below to assign a device to a Operation:

with tf.device('/GPU:0'):
  ...
  result = tf.matmul(a, b)

How dose it implement? Let’s take a look.
There is a mechanism called ‘context manager’ in Python. For example, we can use it to add a wrapper for a few codes:

from contextlib import contextmanager
@contextmanager
def tag(name):
  print("[%s]" % name)
  yield
  print("[/%s]" % name)
with tag("robin"):
  print("what")
  print("is")
  print("nature's")

The result of running this script is:

[robin]
what
is
nature's
[/robin]

Function ‘tag()’ works like a decorator. It will do something before and after those codes laying under its ‘context’.
Tensorflow uses the same principle.

@tf_export("device")
def device(device_name_or_function):
...
  if context.executing_eagerly():
    # TODO(agarwal): support device functions in EAGER mode.
    if callable(device_name_or_function):
      raise RuntimeError(
          "tf.device does not support functions when eager execution "
          "is enabled.")
    return context.device(device_name_or_function)
  else:
    return get_default_graph().device(device_name_or_function)

This will call class Graph’s function ‘device()’. Its implementation:

@tf_export("GraphKeys")
class GraphKeys(object):
...
  @tf_contextlib.contextmanager
  def device(self, device_name_or_function):
  ...
      self._add_device_to_stack(device_name_or_function, offset=2)
    try:
      yield
    finally:
      self._device_function_stack.pop_obj()

The key line is ‘self._add_device_to_stack()’. Context of ‘device’ will add device name into stack of python, and when developer create an Operation it will fetch device name from stack and set it to this Operation.
Let’s check the code routine of creating Operation:

@tf_export("GraphKeys")
class GraphKeys(object):
...
  def create_op(
      self,
      op_type,
      inputs,
      dtypes,  # pylint: disable=redefined-outer-name
      input_types=None,
      name=None,
      attrs=None,
      op_def=None,
      compute_shapes=True,
      compute_device=True):
  ...
    with self._mutation_lock():
      ret = Operation(
          node_def,
          self,
          inputs=inputs,
          output_types=dtypes,
          control_inputs=control_inputs,
          input_types=input_types,
          original_op=self._default_original_op,
          op_def=op_def)
      self._create_op_helper(ret, compute_device=compute_device)
    return ret

  def _create_op_helper(self, op, compute_device=True):
  ...
    if compute_device:
      self._apply_device_functions(op)

  def _apply_device_functions(self, op):
  ...
    for device_spec in self._device_function_stack.peek_objs():
      if device_spec.function is None:
        break
      op._set_device(device_spec.function(op))
    op._device_code_locations = self._snapshot_device_function_stack_metadata()

‘self._device_function_stack.peek_objs’ is where it peek the device name from stack.

Some tips about using google’s TPU (Cont.)

Sometimes I get this error from TPUEstimator:

...
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.DeadlineExceededError: Deadline Exceeded

And after stop and restart TPU in console of GCP, the error disappeared. TPU doesn’t allow users to use it directly like GPU. You can’t see the device in VM looks like ‘/dev/tpu’ or something like this. Google provides TPU as RPC service, so you can only run DNN training through this service. I think this RPC service is not stable enough so sometimes it can’t work and lead to the error ‘Deadline Exceeded’.
When I get this type of error from TPU:

2018-09-29 01:57:12.779430: W tensorflow/core/distributed_runtime/rpc/grpc_session.cc:349] GrpcSession::ListDevices will initialize the session with an empty graph and other defaults because the session has not yet been created.

The only solution is to create a new TPU instance and delete the old one in GCP console. Seems Google need to improve the robust of their TPU RPC service.
Running 10000 steps and get ‘loss’ for every turn:

INFO:tensorflow:Loss for final step: 3.2015076.
INFO:tensorflow:Loss for final step: 2.5733204.
INFO:tensorflow:Loss for final step: 1.8888541.
INFO:tensorflow:Loss for final step: 2.3713436.
INFO:tensorflow:Loss for final step: 2.9957836.
INFO:tensorflow:Loss for final step: 1.3974692.
INFO:tensorflow:Loss for final step: 1.3933656.
INFO:tensorflow:Loss for final step: 2.3544135.
INFO:tensorflow:Loss for final step: 1.9383199.
INFO:tensorflow:Loss for final step: 2.0213509.
INFO:tensorflow:Loss for final step: 1.8641331.
INFO:tensorflow:Loss for final step: 1.6767861.
INFO:tensorflow:Loss for final step: 2.63849.
INFO:tensorflow:Loss for final step: 2.19468.
INFO:tensorflow:Loss for final step: 1.9854712.
INFO:tensorflow:Loss for final step: 1.9380764.
INFO:tensorflow:Loss for final step: 0.97299415.
INFO:tensorflow:Loss for final step: 2.089243.
INFO:tensorflow:Loss for final step: 2.1150723.
INFO:tensorflow:Loss for final step: 1.8242038.
INFO:tensorflow:Loss for final step: 2.8426473.

It’s quite strange that the ‘loss’ can’t go low enough. I still need to do more experiments.
Previously, I run MobileNet_v2 in a machine with Geforce GTX 960 and it could process 100 samples per second. And by using 8 TPUs of Version 2, it can process about 500 samples per second. Firstly, I am so disappointed about the performance-boosting of TPUv2, for it only has about 1.4 TFLOPS for each. But then I noticed that may be the bottleneck is not the performance of TPU, since IO is usually the limit for training speed. Besides, my model is MobileNet_v2, which is too simple and light so it can’t excavate all the capability of TPU.
Therefore I set ‘depth_multiplier=4’ for MobileNet_v2. Under this model, GTX 960 could process 21 samples per second, and TPUv2-8 could process 275 samples per second. This time, I can estimate each TPUv2 has about 4 TFLOPS. I know this metric seems too low from Google’s official 45 TFLOPS. But considering the possible bottlenecks of storage IO and network bandwidth, it becomes understandable. And also, there is another possibility: Google’s 45 TFLOPS means the half-precision operation performance 🙂

Google has just release Tensorflow 1.11 for TPU clusters. At first, I think I can use hooks in TPUEstimatorSpec now, but after adding
def model_fn():
    ...
    logging_hook = tf.train.LoggingTensorHook({'loss': loss}, every_n_iter = 100)
    return tf.contrib.tpu.TPUEstimatorSpec(mode, loss = loss, training_hooks = [logging_hook], train_op = train_op)
it reports
INFO:tensorflow:Error recorded from training_loop: Operation u'total_loss' has been marked as not fetchable.
Certainly, the TPU is much harder to use and debug than GPU/CPU.

Some tips about using google’s TPU

About one month ago, I submit a request to Google Research Cloud for using TPU for free. Fortunately, I received the approvement yesterday. The approvement let me use 5 regular Cloud TPUs and 100 preemptible Cloud TPUs for free for 30 days with only submitting my GCP project name to it.
Then I have to change my previous Tensorflow program to let it run on TPUs. I can’t just change tf.device(‘/gpu:0’) to ‘tf.device(‘/tpu:0’) in code to run training on Google TPU. Actually, there are many documents about how to modify the code for this, such as TPUEstimator, Using TPUs etc.
Here are some tips about porting code for TPUs:
1. We can only use TPUEstimator for training

        classifier = tf.contrib.tpu.TPUEstimator(
                model_fn = model_wrapper,
                config = run_config,
                use_tpu = FLAGS.use_tpu,
                train_batch_size = 64,
                batch_axis = [0, 0],
                params = {'optimizer': opt})

Pay attention to the ‘batch_axis’. It tells TPU pod to split data by ‘0’ dimension for data and labels, for I use ‘NHWC’ data format.
2. model_fn and data_input_fn in TPUEstimator has arguments more than regular tf.estimator.Estimator. We need to fetch some arguments (‘batch_size’) from params.

def data_input_fn(params):
    batch = params['batch_size']
...
def model_fn(features, labels, mode, config, params):
...

3. TPU doesn’t support the operation like

images = tf.contrib.image.rotate(images, tf.random_uniform([1], minval = -math.pi / 4.0, maxval = math.pi / 4.0))

So try to avoid using them
4. Carefully use tf.dataset or else it will report data shape error. The code below could run correctly so far

  dataset = files.apply(tf.contrib.data.parallel_interleave(tf.data.TFRecordDataset, sloppy = True, cycle_length = buff_size))
  dataset = dataset.map(_parse_function)
  dataset = dataset.repeat()
  dataset = dataset.apply(tf.contrib.data.batch_and_drop_remainder(batch_size))
  dataset = dataset.shuffle(batch_size * buff_size)
  iterator = dataset.make_initializable_iterator()

5. Because using TPUEstimator, we can’t init iterator of tf.dataset in ‘session.run()’, so a little trick should be used:

def data_input_fn():
    ...
    tf.add_to_collection(tf.GraphKeys.TABLE_INITIALIZERS, it.initializer)
    ...

6. The Tensorflow in GCP VM instance only supports loading datasets from and storing model into GCP storage.

    run_config = tf.contrib.tpu.RunConfig(
            master = master,
            evaluation_master = master,
            model_dir = 'gs://my-project/models/',
            session_config = tf.ConfigProto(
                allow_soft_placement = True, log_device_placement = True),
            tpu_config = tf.contrib.tpu.TPUConfig(
                FLAGS.iterations, FLAGS.num_shards)
        )

7. There aren’t any hooks for TPUEstimator currently in Tensorflow-1.9. So I can’t see any report from console after launching a TPU program. Hope Google could improve it as soon as possible.

Some modifications about SSD-Tensorflow

In the previous article, I introduced a new library for Object Detection. But yesterday, after I added slim.batch_norm() into ‘nets/ssd_vgg_512.py’ like this:

def ssd_arg_scope(weight_decay=0.0005, data_format='NHWC', is_training=False):
...
    with slim.arg_scope([slim.conv2d, slim.fully_connected],
                        activation_fn=tf.nn.relu,
                        weights_regularizer=slim.l2_regularizer(weight_decay),
                        weights_initializer=tf.contrib.layers.xavier_initializer(),
                        biases_initializer=tf.zeros_initializer(),
                        normalizer_fn=slim.batch_norm):
        with slim.arg_scope([slim.conv2d, slim.max_pool2d],
                            padding='SAME',
                            data_format=data_format):
            with slim.arg_scope([custom_layers.pad2d,
                                 custom_layers.l2_normalization,
                                 custom_layers.channel_to_last],
                                 data_format=data_format):
                with slim.arg_scope([slim.batch_norm, slim.dropout], is_training=is_training) as sc:
                    return sc

Although training could still run correctly, the evaluation reported errors:

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [128] rhs shape= [256]
         [[Node: save/Assign_112 = Assign[T=DT_FLOAT, _class=["loc:@ssd_512_vgg/conv2/conv2_2/BatchNorm/moving_variance"], use_locking=true, validate_
shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](ssd_512_vgg/conv2/conv2_2/BatchNorm/moving_variance, save/RestoreV2/_283)]]

I wondered why adding some simple batch_norm will make shape incorrect for quite a while. Finally I find this page from google. It said this type of error is usually made by incorrect data_format setting. Then I check the code of ‘train_ssd_network.py’ and ‘eval_ssd_network.py’, and got the answer: the training code use ‘NCHW’ but evaluating code use ‘NHWC’!
After changing data_format to ‘NCHW’ in ‘eval_ssd_network.py’, the evaluation script runs successfully.

Choosing a Object Detection Framework written by Tensorflow

Recently I need to train a DNN model for object detection task. In the past, I am using the object detection framework from tensorflows’s subject — models. But there are two reasons that I couldn’t use it to do my own project:
First, it’s too big. There are more than two hundred python files in this subproject. It contains different types of models such as Resnet, Mobilenet and different type of detection algorithms such as RCNN and SSD (Single Shot Detection). Enormous codes usually mean hard to understand and customize. It might cost a lot of times to build a new project by referencing a big old project.
Second, it uses too much configuration files hence even trying to understanding these configs would be a tedious work.
Then I find another better project in GitHub: SSD-Tensorflow. It’s simple. It only includes less than fifty Python files. You can begin training only by using one command:

DATASET_DIR=./train_tfrecords
TRAIN_DIR=./logs/
CHECKPOINT_PATH=./logs/
CUDA_VISIBLE_DEVICES=0,1 python train_ssd_network.py \
  --train_dir=${TRAIN_DIR} \
  --dataset_dir=${DATASET_DIR} \
  --dataset_name=pascalvoc_2007 \
  --dataset_split_name=train \
  --model_name=ssd_512_vgg \
  --checkpoint_path=${CHECKPOINT_PATH} \
  --save_summaries_secs=60 \
  --save_interval_secs=600 \
  --weight_decay=0.0005 \
  --optimizer=rmsprop \
  --learning_rate=0.001 \
  --end_learning_rate=0.000001 \
  --learning_rate_decay_factor=0.96 \
  --num_epochs_per_decay=5.0 \
  --batch_size=8 \
  --gpu_memory_fraction=1.0 \
  --num_clones=2

No configuration files, no ProtoBuf formats.
Although simple and easy to use, this project still has the capability to be remolded. We can add new dataset by changing the code in directory ‘/datasets’ and add new networks in directory ‘/nets’. Let’s see the basic implementation of its VGG for SSD:

# nets/ssd_vgg_512.py
    with tf.variable_scope(scope, 'ssd_512_vgg', [inputs], reuse=reuse):
        # Original VGG-16 blocks.
        net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
        end_points['block1'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool1')
        # Block 2.
        net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
        end_points['block2'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool2')
        # Block 3.
        net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
        end_points['block3'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool3')
        # Block 4.
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
        end_points['block4'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool4')
        # Block 5.
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
        end_points['block5'] = net
        net = slim.max_pool2d(net, [3, 3], 1, scope='pool5')
...

Quit straight, isn’t it?