May 2017 – Robin on Linux

Read paper “Large-Scale Machine Learning with Stochastic Gradient Descent”

Paper reference: Large-Scale Machine Learning with Stochastic Gradient Descent

This is GD (Gradient Descent), which is used for computing the weights of a neural network (and of other machine learning algorithms). z_i represents example i, that is, the pair (x_i, y_i). Each step evaluates every example and then averages the derivatives of the loss with respect to the weights. Evaluating all the examples is slow, so we can imagine that GD is not efficient enough on large datasets.
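To make the cost of one GD step concrete, here is a minimal sketch on a toy problem. The 1-D model y ≈ w * x, the squared loss, the data, and the names `example_gradient` and `gd_step` are my own assumptions for illustration; they are not taken from the paper.

```python
# Batch gradient descent on a toy 1-D least-squares problem (illustrative only).
# Assumed model: y ≈ w * x, with per-example loss 0.5 * (w * x_i - y_i) ** 2.

def example_gradient(w, z):
    """Gradient of the per-example loss with respect to w for z = (x, y)."""
    x, y = z
    return (w * x - y) * x


def gd_step(w, examples, learning_rate):
    """One GD step: visit every example, average the gradients, then update w."""
    total = 0.0
    for z in examples:
        total += example_gradient(w, z)
    average_gradient = total / len(examples)
    return w - learning_rate * average_gradient


examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy data generated from y = 2x
w = 0.0
for _ in range(200):
    w = gd_step(w, examples, learning_rate=0.1)
print(w)  # close to 2.0 on this toy data
```

Note that every call to `gd_step` loops over the whole dataset, which is exactly the part that becomes slow when the number of examples is large.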

SGD

Here comes SGD (Stochastic Gradient Descent), which uses only one example to compute the gradient at each step. Each update is simpler and much cheaper than a full GD step.
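A minimal sketch of the single-example update, under the same assumptions as a toy 1-D least-squares model (the model, the data, the constant learning rate, and the names `example_gradient` and `sgd_step` are illustrative choices of mine, not from the paper):

```python
import random

# Stochastic gradient descent on a toy 1-D least-squares problem (illustrative only).
# Assumed model: y ≈ w * x, with per-example loss 0.5 * (w * x_i - y_i) ** 2.

def example_gradient(w, z):
    """Gradient of the per-example loss with respect to w for z = (x, y)."""
    x, y = z
    return (w * x - y) * x


def sgd_step(w, z, learning_rate):
    """One SGD step: update w from a single example, no pass over the dataset."""
    return w - learning_rate * example_gradient(w, z)


random.seed(0)
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy data generated from y = 2x
w = 0.0
for _ in range(1000):
    z = random.choice(examples)  # pick one example at random
    w = sgd_step(w, z, learning_rate=0.05)
print(w)  # close to 2.0 on this noise-free toy data
```

Each step costs the same no matter how many examples exist. On noisy data the iterates would jitter around the optimum instead of settling exactly, which is why a decreasing learning rate is commonly used in practice.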

k-means

Using SGD in the k-means clustering algorithm seemed counterintuitive to me at first glance. But the idea is: “Sample z_i belongs to the cluster of w_k, so don’t wait for all the samples, just update w_k with z_i”. Seen this way, it makes sense.
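The per-sample update can be sketched as follows. This is a minimal 1-D version, assuming each centroid moves toward the new sample with step size 1/n_k, where n_k counts the samples assigned to that centroid so far; the function name `online_kmeans_update` and the data are hypothetical.

```python
# Online (SGD-style) k-means on 1-D points (illustrative only).

def online_kmeans_update(centroids, counts, z):
    """Assign z to its nearest centroid and move that centroid toward z."""
    k = min(range(len(centroids)), key=lambda j: (z - centroids[j]) ** 2)
    counts[k] += 1
    # Step size 1 / n_k keeps centroid k equal to the mean of its samples so far.
    centroids[k] += (z - centroids[k]) / counts[k]
    return k


centroids = [0.0, 10.0]  # initial guesses, assumed for this example
counts = [0, 0]
for z in [1.0, 9.0, 2.0, 11.0, 0.0, 10.0]:
    online_kmeans_update(centroids, counts, z)
print(centroids, counts)  # [1.0, 10.0] [3, 3]
```

Because the counts start at zero, the first sample assigned to a centroid replaces its initial guess, and afterwards the centroid is the running mean of the samples it has received.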

ASGD

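In the paper, ASGD (Averaged SGD) performs the normal SGD update and also keeps a running average of the weights. The average can be updated from any example at any time (no ordering constraint), so it looks suitable for a distributed machine learning environment.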
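A minimal sketch of ASGD as the paper describes it: the ordinary SGD update plus a recursively computed average of the weights. The toy 1-D least-squares model, the constant learning rate, and the function name `asgd` are assumptions made for illustration only.

```python
import random

# Averaged SGD on a toy 1-D least-squares problem (illustrative only).
# Assumed model: y ≈ w * x, with per-example loss 0.5 * (w * x_i - y_i) ** 2.

def asgd(examples, learning_rate, steps, seed=0):
    """Run SGD and also return the running average of the weight iterates."""
    rng = random.Random(seed)
    w = 0.0
    w_avg = 0.0
    for t in range(steps):
        x, y = rng.choice(examples)
        w -= learning_rate * (w * x - y) * x  # normal SGD update
        w_avg += (w - w_avg) / (t + 1)        # running mean of the iterates so far
    return w, w_avg


examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # toy data generated from y = 2x
w, w_avg = asgd(examples, learning_rate=0.05, steps=1000)
print(w, w_avg)
```

The average includes the early, inaccurate iterates, so on this noise-free toy data it trails the last iterate slightly. The benefit of averaging shows up on noisy data, where it smooths out the jitter of the individual SGD steps.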

Data Preprocessing in Tableau

In a previous article, I created two tables in my Redshift cluster. Now I want to find out the relation between the salary of every employee and their working age. Tableau is a good choice for visual data analysis (SAS is too expensive and has no trial version for learning).
First, we connect to Redshift in Tableau and double-click “New Custom SQL”. In the popup window, type in the SQL that queries the first-year salary of every employee:

Now we have the table “custom sql query”. Drag in the table “salary” and choose “inner join” on employee_id and start_date:

Click into “Sheet 1”. Drag “salary” to “Rows”, “min_start_date” to “Columns”, and “employee_id” to “Color” in the “Marks” panel.

Now we can see the “expensive employees” (those with the highest first-year salary for the same start year) at the top of the graph:

Instead of adding custom SQL in the Tableau data source panel, we can also create views in Redshift and let Tableau show them under “Tables”.

CREATE VIEW ts
AS SELECT employee_id, MIN(start_date) AS min_start_date
FROM salary
GROUP BY employee_id;
CREATE VIEW first_year_salary
AS SELECT ts.employee_id, s.salary, ts.min_start_date
FROM ts
JOIN salary AS s
ON ts.employee_id = s.employee_id AND ts.min_start_date = s.start_date;

Or use a “WITH” clause:

WITH ts
AS (SELECT employee_id, MIN(start_date) AS min_start_date
FROM salary
GROUP BY employee_id)
SELECT ts.employee_id, s.salary, ts.min_start_date
FROM ts
JOIN salary AS s
ON ts.employee_id = s.employee_id AND ts.min_start_date = s.start_date;
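
To sanity-check the query without a Redshift cluster, it can be run against a small in-memory SQLite table from Python. This is only a sketch: the column types, the sample rows, and the added ORDER BY are my assumptions, and SQLite is standing in for Redshift here.

```python
import sqlite3

# Run the first-year-salary query on toy data (illustrative only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE salary (employee_id INTEGER, salary INTEGER, start_date TEXT)"
)
conn.executemany(
    "INSERT INTO salary VALUES (?, ?, ?)",
    [
        (1, 1000, "2015-01-01"),  # employee 1, first salary record
        (1, 1200, "2016-01-01"),  # employee 1, later raise
        (2, 1500, "2016-03-01"),  # employee 2, first salary record
        (2, 1800, "2017-03-01"),  # employee 2, later raise
    ],
)

query = """
WITH ts AS (
    SELECT employee_id, MIN(start_date) AS min_start_date
    FROM salary
    GROUP BY employee_id
)
SELECT ts.employee_id, s.salary, ts.min_start_date
FROM ts
JOIN salary AS s
  ON ts.employee_id = s.employee_id AND ts.min_start_date = s.start_date
ORDER BY ts.employee_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 1000, '2015-01-01'), (2, 1500, '2016-03-01')]
conn.close()
```

Storing the dates as ISO-formatted text keeps MIN(start_date) correct in SQLite, since those strings sort in chronological order.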
