Deep Learning

A stupid mistake in the new deep learning experiment

My old colleague JianMei prepared about 1 TB of bird sound recordings. Each MP3 file is converted into an image using a spectrogram and split into chunks of 2.5 seconds, so in the end every file becomes a 1250×78 multi-dimensional array. I then started training with almost the same code I had used for bird image classification.
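For readers who want to try a similar pipeline, here is a minimal NumPy sketch of the spectrogram-and-chunk step. It assumes the MP3 has already been decoded into a mono waveform (decoding is not shown). The sample rate, FFT size and hop length are my own assumptions, not the values JianMei used, so the chunk shape will not come out as exactly 1250×78. Tune them to match your data.

```python
import numpy as np

# Assumed parameters: the post does not say which values produced the
# 1250x78 arrays, so these are placeholders to tune.
SAMPLE_RATE = 22050   # samples per second of the decoded audio (assumed)
N_FFT = 512           # STFT window length in samples (assumed)
HOP = 256             # step between successive windows in samples (assumed)
CHUNK_SECONDS = 2.5   # chunk length used in the post


def spectrogram(signal: np.ndarray) -> np.ndarray:
    """Magnitude STFT of a mono waveform, shape (num_frames, N_FFT // 2 + 1)."""
    if len(signal) < N_FFT:
        return np.empty((0, N_FFT // 2 + 1))
    window = np.hanning(N_FFT)
    num_frames = 1 + (len(signal) - N_FFT) // HOP
    frames = np.stack([signal[i * HOP:i * HOP + N_FFT] * window
                       for i in range(num_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def split_into_chunks(spec: np.ndarray) -> list:
    """Cut a spectrogram along the time axis into 2.5 s chunks (tail dropped)."""
    frames_per_chunk = int(CHUNK_SECONDS * SAMPLE_RATE / HOP)
    n_chunks = spec.shape[0] // frames_per_chunk
    return [spec[i * frames_per_chunk:(i + 1) * frames_per_chunk]
            for i in range(n_chunks)]


if __name__ == "__main__":
    t = np.arange(10 * SAMPLE_RATE) / SAMPLE_RATE   # 10 s time axis
    tone = np.sin(2 * np.pi * 440.0 * t)            # stand-in for a decoded MP3
    chunks = split_into_chunks(spectrogram(tone))
    print(len(chunks), chunks[0].shape)
```

Each chunk has time frames along axis 0 and frequency bins along axis 1. Transpose the chunks if your model expects the opposite layout.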

The train-accuracy rose very slowly, so I added a line of code to normalize every input sample:

image = (image - image.mean()) / image.std()

After that, the train-accuracy rose faster, but the eval-accuracy was still quite low.

To find the root of the problem, I started training on only two classes: “Black-capped Donacobius” and “Blue-eared Barbet”.

Eval accuracy:0.965278 | Train accuracy:1.000000

The result seemed pretty good, so I increased the number of classes to 20:

Eval accuracy:0.211102 | Train accuracy:0.840278
Eval accuracy:0.203834 | Train accuracy:0.885417
Eval accuracy:0.191514 | Train accuracy:0.904247
Eval accuracy:0.193245 | Train accuracy:0.916667
Eval accuracy:0.210894 | Train accuracy:0.916667
Eval accuracy:0.190269 | Train accuracy:0.932292

Are some types of bird hard for a deep learning model to generalize? I began to think about how to find these “hard to train” bird types. Maybe I could start from 2 classes, increase the number of classes step by step, and then plot some curves of train-accuracy and eval-accuracy…

Suddenly I realized that I had applied normalization only to the training samples, not to the evaluation samples!

What a stupid mistake. It cost me a whole stuffy afternoon for nothing. I really should remember this lesson: whatever you do to the training samples, do to the evaluation samples too, except dropout.
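Here is a minimal sketch of that lesson, assuming NumPy arrays as input. The idea is to put the normalization in one `preprocess()` function that both the training and the evaluation loaders call, and to let only dropout look at a `training` flag. The helper names and the small epsilon are hypothetical additions for illustration, not code from my actual experiment.

```python
import numpy as np

EPS = 1e-8  # my addition: avoids dividing by zero on a silent (constant) chunk


def preprocess(sample: np.ndarray) -> np.ndarray:
    """Per-sample standardization from the post; used by BOTH train and eval."""
    sample = sample.astype(np.float32)
    return (sample - sample.mean()) / (sample.std() + EPS)


def dropout(x: np.ndarray, rate: float, training: bool,
            rng: np.random.Generator) -> np.ndarray:
    """Inverted dropout: zero units at random while training, identity at eval."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate       # keep each unit with prob 1 - rate
    return x * keep / (1.0 - rate)           # rescale so the expected value matches


def make_batch(raw_samples) -> np.ndarray:
    """Hypothetical loader helper: every split goes through the same preprocess()."""
    return np.stack([preprocess(s) for s in raw_samples])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    raw_train = [rng.normal(50.0, 10.0, (1250, 78)) for _ in range(4)]
    raw_eval = [rng.normal(50.0, 10.0, (1250, 78)) for _ in range(4)]
    train_x, eval_x = make_batch(raw_train), make_batch(raw_eval)
    # Both splits now share the same scale. Skipping preprocess() on raw_eval
    # would feed the model values around 50 although it was trained around 0.
    print(train_x.mean(), eval_x.mean())
```

Keeping the transform in a single shared function makes it hard to apply it to one split and forget the other, which is exactly the bug described above.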

Technical Meeting with Nvidia Corporation

Last week I went with my colleagues to Nvidia Corporation in Santa Clara, California, for a technical meeting about cutting-edge Deep Learning hardware and software.

NVIDIA’s new office building

On the first day, team leaders from Nvidia introduced their development plans for new hardware and software. The new hardware includes Tesla V100, NVLink, and HGX (the next generation of DGX), and the software includes CUDA-9.2, NCCL-2.0 and TensorRT-3.0.
Here are some notes from their presentations:

The next generation of the Tesla P4 GPU will have Tensor Cores, 16GB of memory, and an H264 decoder (with performance similar to the Tesla P100) for better inference performance, especially in image/video processing.
Software support for Tensor Cores (mainly found in the Tesla V100 GPU) has been integrated into TensorFlow 1.5.
TensorRT can fuse three Deep Learning layers (Conv layer, Bias layer, ReLU layer) into a single CBR layer and eliminate concatenation layers, which accelerates inference.
The tool ‘nvidia-smi’ shows the ‘util’ of a GPU. But ‘80%’ utilization only means the GPU was running a task (no matter how many CUDA cores were used) for 0.8 seconds of a one-second period. Therefore it is not an accurate metric of real GPU load. NVPROF is a much more powerful and accurate tool for GPU profiling.
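To see why Conv, Bias and ReLU can be merged safely, here is a toy NumPy sketch. It is not TensorRT’s implementation; the function names are hypothetical. The fused version adds the bias and applies ReLU right after each output value is computed. The separate version writes two extra intermediate tensors instead. On a GPU, fusion saves kernel launches and memory round-trips, while the output stays numerically the same.

```python
import numpy as np


def conv2d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Naive 'valid' convolution (cross-correlation, as DL frameworks use).
    x: (C_in, H, W), w: (C_out, C_in, K, K) -> (C_out, H-K+1, W-K+1)."""
    c_out, _, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.zeros((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + k, j:j + k]                      # (C_in, K, K)
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def conv_bias_relu_separate(x, w, b):
    """Three 'layers', each producing a full intermediate tensor."""
    y = conv2d(x, w)              # Conv layer
    y = y + b[:, None, None]      # Bias layer
    return np.maximum(y, 0.0)     # ReLU layer


def conv_bias_relu_fused(x, w, b):
    """One 'CBR' pass: bias and ReLU applied as soon as each output is computed."""
    c_out, _, k, _ = w.shape
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.empty((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            acc = np.tensordot(w, x[:, i:i + k, j:j + k], axes=([1, 2, 3], [0, 1, 2]))
            out[:, i, j] = np.maximum(acc + b, 0.0)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.standard_normal((3, 8, 8))
    w = rng.standard_normal((4, 3, 3, 3))
    b = rng.standard_normal(4)
    print(np.allclose(conv_bias_relu_separate(x, w, b), conv_bias_relu_fused(x, w, b)))
```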

The TITAN V GPU

On the second day, many teams from Alibaba (my company) asked Nvidia different questions. Here are some of the questions and answers:

Q: Some Deep Learning compilers, such as XLA (from Google) and TVM (from AWS), can compile Python code directly into a GPU intermediate representation. How will Nvidia work with these application-oriented compilers?
A: The Google XLA team will be shut down and will move to optimizing TPU performance only. Nvidia will keep focusing on libraries such as CUDA/cuDNN/TensorRT and will not build frameworks like TensorFlow or MXNet.

Q: Many new types of hardware have been launched for Deep Learning, such as Google’s TPU and ASICs developed by other companies. How will Nvidia keep its cost-performance advantage over these new competitors?
A: ASICs are not programmable. If Deep Learning models change, the ASIC ends up in the trash. For example, the TPU has ReLU/Conv instructions, but if a new type of activation function comes along, it will no longer work. Furthermore, customers can only use TPUs on Google’s cloud, which means they have no choice but to put their data on the cloud.

The DGX server

We also visited Nvidia’s demo room of state-of-the-art hardware for autonomous driving and deep learning. It was a productive meeting, and we learned a lot.

The car used as the autonomous-driving test platform

I am standing in front of the NVIDIA logo

Use Mxnet To Classify Images Of Birds (Fourth Episode)

More than half a year has passed since the previous article. In this period, Alan Mei (my former colleague) collected more than 1 million pictures of Chinese avians. After trying AlexNet and VGG19, I finally chose ResNet-18 as my DNN model to classify different kinds of Chinese birds. The ResNet-18 model has far fewer parameters than VGG19, but it still has enough representational capacity.
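To put a number on “far fewer parameters”, here is a small pure-Python sketch that counts trainable weights from the standard ResNet-18 and VGG19 layer layouts. It assumes 224×224 input and a 1000-class output layer (ImageNet settings, not my bird classes). For BatchNorm it counts only the learnable scale and shift, not the running statistics. The helper names are hypothetical.

```python
def conv(c_in: int, c_out: int, k: int, bias: bool) -> int:
    """Weights of a k x k convolution, plus one bias per output channel if used."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def bn(c: int) -> int:
    """BatchNorm learnable parameters: scale (gamma) and shift (beta)."""
    return 2 * c


def linear(n_in: int, n_out: int) -> int:
    """Fully connected layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


def resnet18_params(num_classes: int = 1000) -> int:
    total = conv(3, 64, 7, bias=False) + bn(64)           # 7x7 stem
    c_in = 64
    for c_out, stride in [(64, 1), (128, 2), (256, 2), (512, 2)]:
        for block in range(2):                            # two BasicBlocks per stage
            s = stride if block == 0 else 1
            total += conv(c_in, c_out, 3, False) + bn(c_out)
            total += conv(c_out, c_out, 3, False) + bn(c_out)
            if s != 1 or c_in != c_out:                   # 1x1 projection shortcut
                total += conv(c_in, c_out, 1, False) + bn(c_out)
            c_in = c_out
    return total + linear(512, num_classes)               # after global avg pooling


def vgg19_params(num_classes: int = 1000) -> int:
    cfg = [64, 64, 128, 128] + [256] * 4 + [512] * 8      # 16 conv layers, 3x3
    total, c_in = 0, 3
    for c_out in cfg:
        total += conv(c_in, c_out, 3, bias=True)
        c_in = c_out
    # Three fully connected layers on the 7x7x512 feature map.
    return (total + linear(512 * 7 * 7, 4096) + linear(4096, 4096)
            + linear(4096, num_classes))


if __name__ == "__main__":
    r, v = resnet18_params(), vgg19_params()
    print(f"ResNet-18: {r:,}  VGG19: {v:,}  ratio: {v / r:.1f}x")
```

This gives about 11.7 million parameters for ResNet-18 and about 143.7 million for VGG19, roughly 12 times more. Most of VGG19’s weights (about 102.8 million) sit in its first fully connected layer.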
Collecting more than 1 million sample pictures of birds and labeling them (some by program, some by hand) is a really tedious job. I really appreciate Alan Mei taking on such a hard job, although he says he is an avian fan :). I also need to thank him for giving me a personal computer with a GTX970 GPU. Without this GPU, I could not have trained my model so fast.
To improve the classification accuracy, I read the book “Deep Learning” and many other papers (not only the ResNet paper, of course). I gained a great deal of knowledge about machine learning and deep learning. But most important of all, I enjoyed learning a new technology again.
Today, we are launching this simple website: http://en.dongniao.net/ . In Chinese, “dongniao” means “Understanding Avians”. We hope avian fans and Deep Learning fans will love it.

Robin on Linux
