Prediction of Red Wine Quality

In Kaggle platform, there is an example dataset about

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
import pandas as pd
 
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
 
# Read dataset
wine = pd.read_csv('~/Downloads/winequality-red.csv', sep = ';')
attrs = wine.drop(['quality'], axis = 1)
header = list(attrs)
attrs = attrs.values
 
# Use scaler to normalize data
scaler = StandardScaler()
scaled_attrs = scaler.fit_transform(attrs)
 
quality = wine['quality'].values
 
# SVM classifier
svr = SVC(kernel = 'rbf', max_iter = -1)
svr.fit(attrs, quality)
 
# Randomized decison trees classifier
dt = ExtraTreesClassifier()
dt.fit(attrs, quality)
 
ls = list(zip(dt.feature_importances_, header))
ls.sort(key = lambda x: x[1])
for importance, name in ls:
    print(name, importance)
 
print('\n\n')
 
# Cross validation on this two classifiers
for reg in [svr, dt]:
    scores = cross_val_score(reg, attrs, quality, scoring = 'neg_mean_squared_error', cv = 10)
    rmse = -scores
    print(reg)
    print(rmse.mean(), rmse.std())
    print('\n')

The results reported by snippet above:

Looks the most important feature to predict quality of red wine is ‘alcohol’. Intuitively, right?

Use PCA (Principal Component Analysis) to blur color image

I wrote an example of blurring color picture by using PCA from scikit-learn:

But it reports

The correct solution is transforming image to 2 dimensions shape, and inverse transform it after PCA:

It works very well now. Let’s see the original image and blurring image:



Original Image



Blurring Image