Prediction of Red Wine Quality

In Kaggle platform, there is an example dataset about

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
# Read dataset
wine = pd.read_csv('~/Downloads/winequality-red.csv', sep = ';')
attrs = wine.drop(['quality'], axis = 1)
header = list(attrs)
attrs = attrs.values
# Use scaler to normalize data
scaler = StandardScaler()
scaled_attrs = scaler.fit_transform(attrs)
quality = wine['quality'].values
# SVM classifier
svr = SVC(kernel = 'rbf', max_iter = -1), quality)
# Randomized decison trees classifier
dt = ExtraTreesClassifier(), quality)
ls = list(zip(dt.feature_importances_, header))
ls.sort(key = lambda x: x[1])
for importance, name in ls:
    print(name, importance)
# Cross validation on this two classifiers
for reg in [svr, dt]:
    scores = cross_val_score(reg, attrs, quality, scoring = 'neg_mean_squared_error', cv = 10)
    rmse = -scores
    print(rmse.mean(), rmse.std())

The results reported by snippet above:

Looks the most important feature to predict quality of red wine is ‘alcohol’. Intuitively, right?

Use PCA (Principal Component Analysis) to blur color image

I wrote an example of blurring color picture by using PCA from scikit-learn:

But it reports

The correct solution is transforming image to 2 dimensions shape, and inverse transform it after PCA:

It works very well now. Let’s see the original image and blurring image:

Original Image

Blurring Image

Do tf.random_crop() operation on GPU

When I run code like:

it reports:

Looks operation tf.random_crop() doen’t have CUDA kernel implementation. Therefore I need to write it myself. The solution is surprisingly simple: write a function to do random_crop on one image by using tf.random_uniform() and tf.slice(), and then use tf.map_fn() to apply it on multi-images.

It can run on GPU now.

Regularization loss in ‘slim’ library of Tensorflow

My python code using slim library to train classification model in Tensorflow:

It works fine. However, no matter what value the ‘weight_decay’ is, the training accuracy of the model could reach higher than 90% easily. It seems ‘weight_decay’ just doesn’t work.
In order to find out the reason, I reviewed the code of Tensorflow for ‘tf.losses.sparse_softmax_cross_entropy()’:

The ‘losses.sparse_softmax_cross_entropy()’ simply call ‘tf.nn.sparse_softmax_cross_entropy()’. Then let’s look into the implementation of ‘compute_weighted_loss()’:

The losses of ‘losses.sparse_softmax_cross_entropy()’ will be added into collection of ‘GraphKeys.LOSSES’. Then where dose the weight of parameters go ? Will they be added into same collection ? Let’s check. All the layer written by library of ‘tf.layers’ or ‘tf.contrib.slim’ are inherited from ‘class Layer’ and will call ‘add_loss()’ when this layer call ‘add_variable()’. Let’s check ‘add_loss()’ of base class ‘Layer’:

It’s weird. The loss from weight of variable has not been added into ‘GraphKeys.LOSSES’, but ‘GraphKeys.REGULARIZATION_LOSSES’. Then how could we get all the losses at training stage ? After grep ‘REGULARIZATION_LOSSES’ in whole codes of Tensorflow, it comes up with the ‘get_total_loss()’:

That is the secret of losses in ‘tf.layers’ and ‘tf.contrib.slim’: we should use ‘get_total_loss()’ to fetch model loss and regularization loss together!
After changing my code:

The ‘weight_decay’ works well now (which means training accuracy could not reach high value easily)

Using multi-GPUs for training in distributed environment of Tensorflow

I am trying to write code for training on multi-GPUs. The code is mainly from the example of ‘Distributed Tensorflow‘. I have changed the code slightly for runing on GPU:

But after launch the script below:

it reports:

Seems one MonitoredTrainingSession will occupy all the memory of GPUs. After search on google, I finally get a solution: ‘CUDA_VISIBLE_DEVICES’.
Firstly, change ‘replica_device_setter’:

and then use this shell script to launch training processes:

The ‘ps’ will only use GPU0, ‘worker0’ will only use GPU1, ‘worker1’ will only use GPU2 etc.

Reinforcement Learning example for tree search

I have been learning Reinforcement Learning for about two weeks. Although haven’t go through all the course of Arthur Juliani, I had been able to write a small example of Q-learning now.
This example is about using DNN for Q-value table to solve a path-finding-problem. Actually, the path is more looks like a tree:

The start point is ‘0’, and the destination (or ‘goal’) is ’12’.

The code framework of my example is mainly from Manuel Amunategui’s tutorial but replacing Q-value table with a one-layer-neural-network.

The rewards curve in training steps:

And this example will finally report:

which is the correct answer.

Problems and solutions about building Tensorflow-1.8 with TensorRT 4.0

When compiling Tensorflow-1.8 with CUDA-9.2, it reports:

Add ‘/usr/local/cuda-9.2/lib64’ into ‘/etc/’ and run ‘sudo ldconfig’ to make it works.

When compiling Tensorflow-1.8, it reports:

In ‘.tf_configure.bazelrc’ file, use real python location instead of soft link:

When running TensorRT, it reports:

Run TensorRT with LD_LIBRARY_PATH:

Testing performance of Tensorflow’s fixed-point-quantization on x86_64 cpu

Google has published their quantization method on this paper. It use int8 to run feed-forward but float32 for back-propagation, since back-propagation need more accurate to accumulate gradients. I got a question right after reading the paper: why all the performance test works are on platform of mobile-phone (ARM architecture)? The quantization consequences of model in google’s method doesn’t only need addition and multiplication of int8 numbers, but also bit-shift operations. The AVX instruments set in Intel x86_64 architecture could accelerate MAC (Multiplication, Addition and aCcumulation), but couldn’t boost bit-shift operations.

To verify my suspicion, I wrote a model with ResNet-50 (float32) to classify CIFAR-100 dataset. After running a few epochs, I evaluate the speed of inference by using my ‘’. The result is:

Then, I follow these steps to add tf.contrib.quantize.create_training_graph() and tf.contrib.quantize.create_eval_graph() into my code. This time, the speed of inference is:

A little bit of disappointment. Using quantized (int8) version of model could not accelerate processing speed of x86 CPU. May be we need to find other more powerful quantization algorithm.


Some tips about LaTeX

1. After running ‘bibtex paper’, it reports

This is because we need to use ‘and’ to replace commas. After changing them

The errors disappeared.

2. How to extend space between two rows in a table?

3. Problem: Can’t upload .bib file in
Answer: run ‘pdflatex paper’ to generate paper.aux from paper.tex, and then run ‘bibtex paper’ to convert paper.bib to paper.bbl. Now we could upload .bbl file to arXiv.

4. Problem: When select ‘Tools’–>’Check Spelling…’ in texStudio, it report “No dictionary Available”.
Answer: Download english dictionary from, change suffix from ‘oxt’ to ‘zip’ and unzip it. In ‘preferences’ of texStudio, set dictionary path to the unzip directory. (ref)

After solved all these problems, I eventually submit my paper here:

Use pandas and matplotlib to draw line chart

I have two CSV files. Their content looks like:

The simplest way to load and draw them is by using pandas and matplotlib.

The figure draw out by this snippet is shown below: