Monthly Archives: August 2016

Terasort for Spark (part 1 / 2)

We can use Spark to sort the data generated by Hadoop's Teragen tool.

TerasortApp.scala

build.sbt

After building the jar file, we can submit it to Spark (I run Spark in yarn-cluster mode):

It takes 17 minutes to complete the task, but the “terasort” tool… Read more »

Deploy Hive on Spark

The MapReduce framework is too slow for real-time analytic queries, so we need to change Hive's execution engine from “mr” to “spark” (link):

1. Set the environment for Spark:

2. Copy the configuration XML file for Hive:

and change these configuration items:

Note: remember to replace every “${system:java.io.tmpdir}/${system:user.name}” in… Read more »