Example datasets for learning Hive

I found two datasets, employee and salary, for learning and practicing Hive. After putting the two files into HDFS, we just need to create the tables:
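The table definitions could look roughly like the sketch below. The column names, types, comma delimiter and HDFS locations are all assumptions made for illustration (the actual files may be laid out differently), so adjust them to match your data.

```sql
-- Assumed layout: comma-separated text files with dates in yyyy-MM-dd format.
-- Column names and the LOCATION paths are hypothetical.
CREATE EXTERNAL TABLE IF NOT EXISTS employee (
  emp_no     INT,
  birth_date DATE,
  first_name STRING,
  last_name  STRING,
  gender     STRING,
  hire_date  DATE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hive/data/employee';

CREATE EXTERNAL TABLE IF NOT EXISTS salary (
  emp_no    INT,
  salary    INT,
  from_date DATE,
  to_date   DATE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hive/data/salary';
```

External tables are used here so that dropping a table leaves the files in HDFS untouched, which is convenient while experimenting.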

Now we can analyze the data.

Find the 10 oldest employees.
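A minimal sketch, assuming the employee table has a birth_date column (a hypothetical name) stored as a DATE: the oldest employees are the ones with the earliest birth dates.

```sql
-- Earliest birth dates first; column names are assumed, not taken from the dataset.
SELECT emp_no, first_name, last_name, birth_date
FROM employee
ORDER BY birth_date ASC
LIMIT 10;
```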

Find all the employees who joined the corporation in January 1990.
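One way to write this, assuming a hire_date column (a hypothetical name) of type DATE, is to filter with Hive's built-in year() and month() functions:

```sql
-- hire_date is an assumed column name.
SELECT emp_no, first_name, last_name, hire_date
FROM employee
WHERE year(hire_date) = 1990
  AND month(hire_date) = 1;
```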

Find the 10 employees with the highest average salary. Notice that we use ‘order by’ here because ‘sort by’ only orders rows within each reducer, so the combined output is not globally sorted.
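A possible version of this query, assuming both tables share an emp_no key and the salary table has a numeric salary column (both names are assumptions):

```sql
-- Average every salary record per employee, then keep the 10 highest averages.
-- ORDER BY gives a total order across the whole result, unlike SORT BY.
SELECT e.emp_no,
       e.first_name,
       e.last_name,
       avg(s.salary) AS avg_salary
FROM employee e
JOIN salary s ON e.emp_no = s.emp_no
GROUP BY e.emp_no, e.first_name, e.last_name
ORDER BY avg_salary DESC
LIMIT 10;
```

Replacing ORDER BY with SORT BY here would sort each reducer's output separately, so with more than one reducer the LIMIT 10 rows would not necessarily be the global top 10.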

Let’s check whether this corporation shows signs of sex discrimination in pay:
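A simple comparison, sketched under the assumption that the employee table has a gender column and the salary table has a salary column (hypothetical names), is the average salary per gender:

```sql
-- Averages over all salary records, so employees with more records weigh more.
SELECT e.gender,
       count(DISTINCT e.emp_no) AS employees,
       avg(s.salary)            AS avg_salary
FROM employee e
JOIN salary s ON e.emp_no = s.emp_no
GROUP BY e.gender;
```

This is only a rough check: it does not control for job title, seniority or department, so similar averages do not prove equal treatment.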

The result is:

Looks good 🙂

