Example datasets for learning Hive

I found two datasets, employee and salary, that work well for learning and practicing Hive. After putting the two files into HDFS, we just need to create the tables.
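A minimal sketch of what the table definitions could look like. The column names, types, comma delimiter, and HDFS paths are all assumptions made for illustration, so adjust them to match the actual files:

```sql
-- Assumed schema: column names and types are illustrative, not taken from the dataset.
-- DATE columns in a text file must be stored as yyyy-MM-dd for Hive to parse them.
CREATE TABLE employee (
  employee_id INT,
  birthday    DATE,
  first_name  STRING,
  family_name STRING,
  gender      STRING,
  work_day    DATE      -- assumed: the day the employee joined
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

CREATE TABLE salary (
  employee_id INT,
  salary      INT,
  start_date  DATE,
  end_date    DATE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Hypothetical HDFS paths. LOAD DATA INPATH moves the files into the table's directory.
LOAD DATA INPATH '/tmp/employee.csv' INTO TABLE employee;
LOAD DATA INPATH '/tmp/salary.csv' INTO TABLE salary;
```

If the files should stay where they are in HDFS, an external table with a LOCATION clause is the usual alternative to LOAD DATA.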

Now we can analyze the data.

Find the 10 oldest employees.
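One way to write this query, assuming an employee table with a DATE column named birthday (the column names here are assumptions, not taken from the dataset):

```sql
-- The earliest birthdays belong to the oldest employees, so sort ascending.
SELECT employee_id, first_name, family_name, birthday
FROM employee
ORDER BY birthday ASC
LIMIT 10;
```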

Find all the employees who joined the corporation in January 1990.
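A sketch of the filter, assuming the join date is stored in a DATE column named work_day (an assumed name):

```sql
-- year() and month() are built-in Hive date functions.
SELECT employee_id, first_name, family_name, work_day
FROM employee
WHERE year(work_day) = 1990
  AND month(work_day) = 1;
```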

Find the 10 employees with the highest average salary. Notice we use ‘order by’ here because ‘sort by’ only sorts rows within each reducer, so it does not guarantee a global order.
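A possible form of this query, assuming both tables share an employee_id column and the salary table has a numeric salary column (all names are assumptions):

```sql
-- Average each employee's salary records first, then join to get the names.
SELECT e.employee_id, e.first_name, e.family_name, s.avg_salary AS avg_salary
FROM employee e
JOIN (
  SELECT employee_id, AVG(salary) AS avg_salary
  FROM salary
  GROUP BY employee_id
) s ON e.employee_id = s.employee_id
ORDER BY avg_salary DESC   -- global order, executed by a single reducer
LIMIT 10;
```

Replacing ORDER BY with SORT BY here would sort each reducer's output separately, so with more than one reducer the 10 rows returned are not guaranteed to be the overall top 10.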

Let’s find out whether this corporation pays men and women differently.
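A simple way to check is to compare the average salary per gender. This sketch assumes a gender column in the employee table and the same employee_id join key as before (assumed names):

```sql
-- Average over all salary records, grouped by the employee's gender.
SELECT e.gender, AVG(s.salary) AS avg_salary
FROM employee e
JOIN salary s ON e.employee_id = s.employee_id
GROUP BY e.gender;
```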

The result is:

Looks good 🙂
