Monthly Archives: August 2018

Finding core-dump file

On a new server, my program crashed with ‘core dumped’, but I couldn’t find the core-dump file in the current directory as usual.
First I checked the ‘ulimit’ configuration:

core file size          (blocks, -c) unlimited

Seems OK: the system will generate a core-dump file when a program crashes. But where is it?
Eventually, I found the answer: the core-dump file is named according to the pattern written in /proc/sys/kernel/core_pattern.

cat /proc/sys/kernel/core_pattern
/var/coredump/core-%u-%e-%p-%t

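Therefore all the core-dump files are located in the /var/coredump/ directory. The format specifiers of ‘core_pattern’ are explained in the core(5) manual page: here %u is the user ID, %e the executable name, %p the process ID, and %t the time of the dump (in seconds since the epoch).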
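
To make the pattern concrete, here is a minimal Python sketch of how such a template turns into a file name. The function name expand_core_pattern and the sample values (uid 1000, program ‘myprog’, and so on) are made up for illustration; the kernel does the real expansion and supports more specifiers than the four handled here.

```python
import re

def expand_core_pattern(pattern, uid, exe, pid, timestamp):
    """Expand the %u, %e, %p, %t and %% specifiers of a core_pattern template.

    Hypothetical helper for illustration; other specifiers are left untouched.
    """
    values = {
        "u": str(uid),        # user ID of the dumped process
        "e": exe,             # executable name
        "p": str(pid),        # process ID
        "t": str(timestamp),  # time of dump, seconds since the epoch
        "%": "%",             # literal percent sign
    }
    return re.sub(r"%(.)", lambda m: values.get(m.group(1), m.group(0)), pattern)

# Sample values are assumptions, not taken from a real crash.
path = expand_core_pattern("/var/coredump/core-%u-%e-%p-%t",
                           uid=1000, exe="myprog", pid=4242, timestamp=1533081600)
print(path)  # /var/coredump/core-1000-myprog-4242-1533081600
```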

Migrate blog to AWS’s ec2

My blog had been hosted on Linost since 2013. But recently the support staff from Linost notified me that my site had driven the CPU usage of the host machine to 100%, so the hosting system automatically ‘limited’ my resources, which actually means my site was shut down completely.
The first thing I wanted to do was log in to my host machine using SSH. Unfortunately, Linost doesn’t support SSH login. Without SSH and the usual Linux commands, how could I find the cause of the high load and resolve it?
Finally, I chose AWS EC2 for my new hosting machine. To reduce the cost, I picked ‘t2.nano’, the cheapest instance type. Although it has only 512MB of memory, that is adequate for running a basic WordPress blog. Additionally, I bought a reserved instance by paying upfront for a whole year, which decreased the cost further (about a 50% discount).
Using EC2 has another advantage: I don’t need to install MySQL/Apache/PHP/WordPress myself. With Jetware’s AMI (Amazon Machine Image), a basic WordPress blog can be launched with a few clicks. Jetware’s AMI uses LEMP (Linux/nginx web Engine/MySQL/PHP) as its basic software stack, and also includes phpMyAdmin for managing MySQL. This AMI is totally free. The only small defect is that the MySQL ‘root’ account is set up with an empty password. But we can fix that simply:

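# Log in to the MySQL command line
mysql -uroot
# Set a password for the root user on 'localhost'
# (PASSWORD() was removed in MySQL 8.0; there, use ALTER USER ... IDENTIFIED BY instead)
SET PASSWORD FOR root@localhost = PASSWORD('yourpassword');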

By opening ‘https://donghao.org/phpmyadmin/’ in the browser, I can manage MySQL easily.

That’s awesome! Thanks to Jetware.

Source code analysis for Autograd

Autograd is a convenient tool for automatically differentiating native Python and NumPy code.
Let’s look at an example first:

import autograd.numpy as np
from autograd import grad
def f(x):
  return x * x + 1
grad_f = grad(f)
print(grad_f(1.6))

The result is 3.2.
f(x) = square(x) + 1 and its derivative is 2*x, so the result is correct.
Function grad() actually returns a ‘function object’, which is ‘grad_f’. When we call grad_f(1.6), it ‘traces’ f(x) by calling trace(), whose ‘fun’ argument is our f(x) function.

In ‘trace()’, it actually calls f() not with ‘x’ but with an ArrayBox object. The ArrayBox object has two purposes:
1. Go through all the operations in f() along with ‘x’, so it can get the real result of f(x)
2. Get all the corresponding gradients of the operations in f()
The ArrayBox class overrides all the basic arithmetic operations, such as add/subtract/multiply/divide/square. Therefore it can catch all the operations in f(x).
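
The idea of passing a ‘box’ through f() and then applying the chain rule can be sketched in a few lines of plain Python. This is only a toy: the names Box and toy_grad are hypothetical, only + and * are supported, and Autograd’s real ArrayBox and trace machinery is considerably more general.

```python
# Toy illustration only; not Autograd's actual implementation.

class Box:
    """Wraps a number and records how it was computed."""

    def __init__(self, value, parents=()):
        self.value = value
        # Each entry is (parent_box, local_gradient of this node w.r.t. that parent).
        self.parents = parents

    def __add__(self, other):
        other = other if isinstance(other, Box) else Box(other)
        return Box(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Box) else Box(other)
        return Box(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __radd__ = __add__
    __rmul__ = __mul__


def toy_grad(fun):
    """Return a function computing the derivative of fun at a scalar x."""
    def grad_fun(x):
        start = Box(float(x))
        end = fun(start)  # run fun on the box instead of the raw number
        total = 0.0
        # Walk back from the output; multiply local gradients along each path
        # (chain rule) and sum the contributions that reach the input box.
        stack = [(end, 1.0)] if isinstance(end, Box) else []
        while stack:
            node, upstream = stack.pop()
            if node is start:
                total += upstream
            for parent, local in node.parents:
                stack.append((parent, upstream * local))
        return total
    return grad_fun


def f(x):
    return x * x + 1

print(toy_grad(f)(1.6))  # 3.2
```

Summing over every path is fine for a toy but grows quickly on larger graphs, which is one reason a real implementation visits each node only once.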

After catching all the operations, ArrayBox can look up the gradients table to get the corresponding gradient for each operation, and then use the chain rule to get the final gradient.

Besides that, Autograd has other tricks to complete its work. Take the function wrapper ‘@primitive’ as an example: this decorator lets users add new custom-defined operations to Autograd.
The source code of Autograd is nice and neat. Its examples include a fully-connected network, a CNN, and even an RNN. The implementation of its Adam optimizer is a good place to get a feel for its concise code style.
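
For readers who have not seen Adam before, the update rule itself is short. The following is a plain-Python sketch for a single scalar parameter, not the Autograd source; the function name adam_scalar, the defaults, and the test function (x - 3)^2 are assumptions chosen for illustration.

```python
import math

def adam_scalar(grad, x, num_iters=500, step_size=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Minimize a scalar function with Adam, given its gradient function."""
    m, v = 0.0, 0.0                          # first and second moment estimates
    for i in range(num_iters):
        g = grad(x)
        m = b1 * m + (1 - b1) * g            # running mean of gradients
        v = b2 * v + (1 - b2) * g * g        # running mean of squared gradients
        m_hat = m / (1 - b1 ** (i + 1))      # bias correction for the zero start
        v_hat = v / (1 - b2 ** (i + 1))
        x = x - step_size * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Gradient of (x - 3)^2 is 2 * (x - 3); the minimum is at x = 3.
x_min = adam_scalar(lambda x: 2 * (x - 3), x=0.0)
print(x_min)
```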

Prediction of Red Wine Quality

On the Kaggle platform, there is an example dataset about the quality of red wine. I wrote some code for it using scikit-learn and pandas:

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
# Read dataset
wine = pd.read_csv('~/Downloads/winequality-red.csv', sep = ';')
attrs = wine.drop(['quality'], axis = 1)
header = list(attrs)
attrs = attrs.values
# Use scaler to normalize data
scaler = StandardScaler()
scaled_attrs = scaler.fit_transform(attrs)
quality = wine['quality'].values
# SVM classifier
# Note: both models are fitted on the raw 'attrs'; pass 'scaled_attrs' instead to use the normalized data
svr = SVC(kernel = 'rbf', max_iter = -1)
svr.fit(attrs, quality)
# Randomized decision trees classifier
dt = ExtraTreesClassifier()
dt.fit(attrs, quality)
ls = list(zip(dt.feature_importances_, header))
ls.sort(key = lambda x: x[1])
for importance, name in ls:
    print(name, importance)
print('\n\n')
# Cross validation on these two classifiers
for clf in [svr, dt]:
    scores = cross_val_score(clf, attrs, quality, scoring = 'neg_mean_squared_error', cv = 10)
    # The scorer returns negated MSE values, so flip the sign (this is MSE, not RMSE)
    mse = -scores
    print(clf)
    print(mse.mean(), mse.std())
    print('\n')

The results reported by the snippet above:

alcohol 0.1438906634767823
chlorides 0.07953780339531004
citric acid 0.07979101058207233
density 0.0846765183778148
fixed acidity 0.07686725880938272
free sulfur dioxide 0.07178658192019563
pH 0.07797509374376276
residual sugar 0.0796105749270121
sulphates 0.11872569296381115
total sulfur dioxide 0.0993798893196299
volatile acidity 0.08775891248422625
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
0.6983420378445301 0.04803296683789781
ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',
           max_depth=None, max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)

It looks like the most important feature for predicting the quality of red wine is ‘alcohol’. Intuitive, right?
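
The output above is sorted by feature name, so the ranking is easier to see after sorting by importance instead. A small sketch using the numbers from the output (rounded to four decimals); since ExtraTreesClassifier is randomized and random_state was left as None, another run may give a somewhat different ranking.

```python
# Importances copied from the reported output, rounded to 4 decimals.
importances = {
    'alcohol': 0.1439,
    'chlorides': 0.0795,
    'citric acid': 0.0798,
    'density': 0.0847,
    'fixed acidity': 0.0769,
    'free sulfur dioxide': 0.0718,
    'pH': 0.0780,
    'residual sugar': 0.0796,
    'sulphates': 0.1187,
    'total sulfur dioxide': 0.0994,
    'volatile acidity': 0.0878,
}

# Sort by importance, largest first.
ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
for name, importance in ranked:
    print(f'{name:22s} {importance:.4f}')
```

In this run ‘alcohol’ comes first, followed by ‘sulphates’ and ‘total sulfur dioxide’.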

Use PCA (Principal Component Analysis) to blur color image

I wrote an example of blurring a color picture using PCA from scikit-learn:

import cv2
import numpy as np
from sklearn.decomposition import PCA
pca = PCA(n_components = 0.96)
img = cv2.imread("input.jpg")
reduced = pca.fit_transform(img)
res = pca.inverse_transform(reduced)
cv2.imwrite('output.jpg', res.reshape(shape))

But it reports:

ValueError: Found array with dim 3. Estimator expected <= 2.

The correct solution is to reshape the image to two dimensions before PCA, and to reshape it back after the inverse transform:

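img = cv2.imread('input.jpg')
shape = img.shape
# Flatten (height, width, channels) to 2-D: one sample per image row
img_r = img.reshape((shape[0], shape[1] * shape[2]))
reduced = pca.fit_transform(img_r)
# Map back from the reduced space and restore the original 3-D shape
res = pca.inverse_transform(reduced)
cv2.imwrite('output.jpg', res.reshape(shape))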

It works very well now. Let's see the original image and the blurred image:

Original Image

Blurred Image
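
To see what the reshape, transform, and inverse-transform steps do numerically, here is a rough sketch that swaps scikit-learn's PCA for NumPy's SVD and uses a random array as a stand-in for input.jpg. The function name pca_blur and the array sizes are assumptions for illustration. Keeping only a few components loses detail (the blur), while keeping all of them reproduces the input.

```python
import numpy as np

def pca_blur(img, n_components):
    """Project each image row onto the top principal components and back."""
    h, w, c = img.shape
    flat = img.reshape(h, w * c).astype(np.float64)  # 3-D -> 2-D, one sample per row
    mean = flat.mean(axis=0)
    centered = flat - mean
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]
    reduced = centered @ basis.T       # plays the role of pca.fit_transform
    restored = reduced @ basis + mean  # plays the role of pca.inverse_transform
    # Round and clip to valid pixel values before restoring the 3-D shape.
    return np.rint(np.clip(restored, 0, 255)).astype(np.uint8).reshape(h, w, c)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 48, 3), dtype=np.uint8)  # synthetic image

blurred = pca_blur(img, n_components=5)   # lossy: only 5 components kept
full = pca_blur(img, n_components=32)     # all components: reconstruction is exact
print(blurred.shape, np.array_equal(full, img))
```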

Robin on Linux
