Monthly Archives: June 2017

Using Spark SQL to convert a CSV file to Parquet

After downloading data from the “Food and Agriculture Organization of the United Nations”, I got many CSV files. One of the files is named “Trade_Crops_Livestock_E_All_Data_(Normalized).csv” and it looks like:

To load this CSV file into Spark and dump it to Parquet format, I wrote this code:
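A minimal sketch of such a conversion, assuming Spark 2.x and its `SparkSession` API. The object name, the output path, the `local[*]` master, and the `header`/`inferSchema` options are assumptions for illustration, not the original listing. The renaming step is there because Spark 2.x rejects Parquet column names that contain characters such as spaces, which CSV headers often do.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical job name; not taken from the original post.
object CsvToParquet {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation; on a cluster the master is
    // normally supplied by spark-submit instead.
    val spark = SparkSession.builder()
      .appName("CsvToParquet")
      .master("local[*]")
      .getOrCreate()

    // Assumes the first line of the file is a header row.
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("Trade_Crops_Livestock_E_All_Data_(Normalized).csv")

    // Replace characters that Parquet column names may not contain.
    val cleaned = raw.toDF(raw.columns.map(_.replaceAll("[ ,;{}()\\n\\t=]", "_")): _*)

    // Output path is an assumption; "overwrite" replaces any earlier run.
    cleaned.write.mode("overwrite").parquet("trade_crops_livestock.parquet")

    spark.stop()
  }
}
```

With `inferSchema` enabled Spark scans the file an extra time to guess column types; supplying an explicit schema avoids that pass on large files.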

The build.sbt is


Importance of a function’s return type in Scala

The Scala code is below:

But the output of this section of code is:

Nothing, just a pair of parentheses. What happened? Why doesn’t Scala use the ‘call’ function in the subclass ‘Student’? The answer is: we forgot to define the return type of the function ‘call’, so its default…
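The effect can be reproduced with a small sketch. It assumes the method was written in Scala 2 procedure syntax (`def call() { ... }`), which gives it the result type `Unit`; the example spells that type out explicitly so it also compiles on newer compilers. The `Person` base class, the `Typed*` classes, and the strings are hypothetical names chosen for illustration.

```scala
object ReturnTypeDemo {
  // Hypothetical base class. `def call(): Unit = ...` spells out what the
  // Scala 2 procedure syntax `def call() { ... }` means: the result is Unit.
  class Person {
    def call(): Unit = { "Person is calling" }
  }

  class Student extends Person {
    // The override does run, but the string is discarded because the
    // result type is Unit.
    override def call(): Unit = { "Student is calling" }
  }

  // Same hierarchy with the result type declared as String.
  class TypedPerson {
    def call(): String = "Person is calling"
  }

  class TypedStudent extends TypedPerson {
    override def call(): String = "Student is calling"
  }

  def main(args: Array[String]): Unit = {
    val untyped: Person = new Student
    println(untyped.call()) // prints ()

    val typed: TypedPerson = new TypedStudent
    println(typed.call())   // prints Student is calling
  }
}
```

Declaring the result type on every public method makes this kind of mistake a compile-time type mismatch instead of a silent `()` at runtime.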

Some tips about “Amazon Redshift Database Developer Guide”

Show diststyle of tables

Details about distribution styles: http://docs.aws.amazon.com/redshift/latest/dg/viewing-distribution-styles.html

How to COPY multiple files into Redshift from S3: http://docs.aws.amazon.com/redshift/latest/dg/t_loading-tables-from-s3.html

You can “Group” (or “Order”) by column number instead of column name

COPY with automatic compression

To apply automatic compression to an empty table, regardless of its current compression encodings, run the…