Read paper “Large-Scale Machine Learning with Stochastic Gradient Descent”

Paper reference: Large-Scale Machine Learning with Stochastic Gradient Descent

This GD(Gradient Descent), which is used for computing weight of NN (also used for other Machine Learning Algorithm). z_i represents the example ‘i’, also as (x_i, y_i). After calculate all examples, we need to compute the average for all differentials by weight. Calculating all examples is a slow progress, so we can image GD is not adequate efficient.

SGD

Here comes the SGD, which use only one example to compute gradient. It is simpler, and more efficient.

k-mean

Using SGD in K-mean clustering algorithm seems counterintuitive for me at first glance. But after thinking about “Sample z_i belongs to cluster of w_k, then don’t wait for all samples, just update w_k by z_i“, it becomes conceivable.

ASGD

ASGD is suitable for distributed machine learning environment, since it could get averaged gradient from any example of data at any time (no order restrain).

Robin on Linux

Read paper “Large-Scale Machine Learning with Stochastic Gradient Descent”

Leave a Reply Cancel reply

Robin on Linux

Related Posts

Leave a Reply Cancel reply