Monthly Archives: November 2016

Using “sysbench” to test memory performance

Sysbench is a powerful testing tool for CPU / Memory / Mysql etc. Three years ago, I used to test performance of MYSQL by using it.
Yesterday, I used Sysbench to test memory bandwidth of my server.
By using command:

sysbench --test=memory --memory-block-size=1M --memory-total-size=100G --num-threads=1 run

It reported the memory bandwidth could reach 8.4GB/s, which did make sense for me.
But after decrease the block size (Change 1M to 1K):

sysbench --test=memory --memory-block-size=1K --memory-total-size=100G --num-threads=1 run

The memory bandwidth reported by Sysbench became only 2GB/s
This regression of memory performance really confuse me. Maybe the memory of modern machines has some kind of “Max limited frequency” so we can’t access memory with too high frequency?
After checked the code of Sysbench, I found out its logic about memory test is just like this program (I wrote it myself):

/* mytest.c */
#include 
#include 
#include 
const long DATA = (100 * 1024 * 1048576LL); /* 100G data */
int main(int argc, char *argv[]) {
    volatile int tmp = 0;
    int *buffer, *end, *begin;
    long i, loop, block_size;
    struct timeval before, after;
    if (argc < 2) {
        return -1;
    }
    block_size = atoi(argv[1]);
    buffer = (int *)malloc(block_size);
    end = (int*)(((char *)buffer) + block_size);
    loop = (long)DATA / block_size;
    gettimeofday(&before, NULL);
    for (i = 0; i < loop; i++) {
        for (begin = buffer; begin < end; begin++) {
            *begin = tmp;
        }
    }
    gettimeofday(&after, NULL);
    printf("time: %lu\n", (after.tv_sec * 1000000 + after.tv_usec)
        - (before.tv_sec * 1000000 + before.tv_usec));
    free(buffer);
}

But this test program cost only 14 seconds (Sysbench cost 49 seconds). To find out the root cause, we need to use a more powerful tool -- perf:

# perf stat -e cache-misses,faults,branch-misses ./mytest 1048576
Performance counter stats for './my 1048576':
            90,395 cache-misses
               400 faults
           178,554 branch-misses
      14.825497139 seconds time elapsed
# perf stat -e cache-misses,faults,branch-misses sysbench --test=memory --memory-block-size=1K --memory-total-size=100G --num-threads=1 run
Performance counter stats for 'sysbench --test=memory --memory-block-size=1K --memory-total-size=100G --num-threads=1 run':
           739,223 cache-misses
               825 faults
           531,908 branch-misses
      49.264963322 seconds time elapsed

They have totally different CPU cache-misses. The root cause is because Sysbench use a complicate framework to support different test targets (Mysql/Memory ...), which need to pass a structure named "request" and many other arguments in and out of execution_request() function many times in one request (accessing 1K memory, in our scenario), this overload becomes big when block size is too small.
The conclusion is: don't use Sysbench to test memory performance by using too small block size, better bigger than 1MB.
Ref: by Coly Li 's teaching, memory do have "top limit access frequency" (link). Take DDR4-1866 for example: it's data rate is 1866MT/s （MT = Mega Transfer) and every transfer takes 8 bytes, so we can access memory more than 1 billion times per second, theoretically.

Install CDH(Cloudera Distribution Hadoop) by Cloudera Manager

These days I was trying to install Cloudera-5.8.3 on my centos-7 machines, and here are some steps for operation and tips for trouble shooting:
0. If you are not in USA, the speed of network for accessing Cloudera Repository of RPMS(or Parcels) is desperately slow, thus we need to move CM (Cloudera Manager) Repo and CDH Repo to local.
Create local CM Repo
Create local CDH Repo
1. Install Cloudera Manager (steps)
2. Start Cloudera Manager

sudo cmf-server start

But it report:

org.springframework.beans.factory.support.FactoryBeanRegistrySupport.doGetObjectFromFactoryBean(FactoryBeanRegistrySupport.java:142)
... 22 more
Caused by: org.hibernate.service.classloading.spi.ClassLoadingException: HHH010003: JDBC Driver class not found: com.mysql.jdbc.Driver
at org.hibernate.service.jdbc.connections.internal.C3P0ConnectionProvider.configure(C3P0ConnectionProvider.java:142)
at org.hibernate.service.internal.StandardServiceRegistryImpl.configureService(StandardServiceRegistryImpl.java:75)

In centos-7, the solution is:

# Install Mysql Driver for Java
sudo yum install mysql-connector-java -y
# Set jar to CLASSPATH
export CMF_JDBC_DRIVER_JAR=/usr/share/java/mysql-connector-java.jar
# Start Cloudera Manager again
sudo cmf-server start

Also need to run “sudo ./cloudera-manager-installer.bin –skip_repo_package=1” to create “db.properties”.
3. Login to the Cloudera Manager(port: 7180) and follow the steps of Wizard to create a new cluster. (Choose the local repository for installation will bring favorable fast speed 🙂
Make sure the hostname of every node is correct. And by using “Host Inspector”, we can reveal many potential problems in these machines.
After tried many times to setup cluster, I found this error in logs of some nodes:

Error, CM server guid updated, expected 85587073-270d-43d9-a44a-e213d9f7e45b, received 4c1402a5-8364-4598-a382-0c760710e897

The solution is simple:

#For the error node
sudo rm -rf /var/lib/cloudera-scm-agent/cm_guid

and restart Cloudera Manager Agent on these nodes.
I also confronted a problem that installation progress has hanged on this message:

Acquiring installation lock...

There isn’t any process of “yum” running in the node, so why it still acquire installation lock? The answer is:

sudo rm -rf /tmp/.scm_prepare_node.lock

4. After many fails and retry, I eventually setup the Hadoop Ecosystem of CDH:

When upgrading or downgrading a Cloudera Cluster, your may see this problem:

The solution is (if in ‘single user mode’):

sudo chown cloudera-scm:cloudera-scm /run/cloudera-scm-agent/ -R
sudo chown cloudera-scm:cloudera-scm /var/lib/cloudera-scm-agent/ -R

and try it again.
When staring ResourceManager, it failed and report:

2017-06-05 16:31:58,812 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Update thread interrupted. Exiting.
2017-06-05 16:31:58,813 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Continuous scheduling thread interrupted. Exiting.
java.lang.InterruptedException: sleep interrupted
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler$ContinuousSchedulingThread.run(FairScheduler.java:319)
2017-06-05 16:31:58,814 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Interrupted while waiting to reload alloc configuration
2017-06-05 16:31:58,814 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: AMLivelinessMonitor thread interrupted
2017-06-05 16:31:58,814 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2017-06-05 16:31:58,814 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.ContainerAllocationExpirer thread interrupted
2017-06-05 16:31:58,814 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: AMLivelinessMonitor thread interrupted
2017-06-05 16:31:58,815 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to standby state
2017-06-05 16:31:58,816 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.hadoop.yarn.webapp.WebAppException: Error starting http server
    at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:278)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startWepApp(ResourceManager.java:990)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1090)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1222)
Caused by: java.io.IOException: Problem in starting http server. Server handlers failed
    at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:912)
    at org.apache.hadoop.yarn.webapp.WebApps$Builder.start(WebApps.java:273)
    ... 4 more
2017-06-05 16:31:58,818 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG:

The reason of this error is: there is a Non-Cloudera version of zookeeper installed on the host. Remove it and reinstall zookeeper from CDH, the yarn-resource-manager will be launched successfully.
If meet “Deploy Client Configuration failed” when create new service, just add sudo nopassword to cloudera-scm user.

cloudera-scm    ALL=(ALL)       NOPASSWD: ALL

Using Pig to join two tables and sort it

Having two tables: salary and employee，we can use Pig to find the most high-salary employees:

salary = LOAD '/user/robin/salaries/salaries.csv' USING PigStorage(',') AS (uid:int, salary:int, begin:chararray, end:chararray);
employee = LOAD '/user/robin/employees/employees.csv' USING PigStorage(',') AS (uid:int, birth:chararray, givenname:chararray, familyname:chararray, gender:chararray, work:chararray);
jo = JOIN employee BY uid, salary BY uid;
res = ORDER (
             FOREACH (
                      GROUP jo BY (employee::uid, employee::birth, employee::givenname, employee::familyname, employee::gender, employee::work)
             )
             GENERATE group.employee::uid, group.employee::givenname, group.employee::familyname, AVG(jo.salary::salary) AS avg_salary
      ) BY avg_salary DESC;
fs -rmr /user/sanbai/join_result;
STORE res INTO '/user/robin/join_result' USING PigStorage(',');

The result is:

109334,'Tsutomu','Alameldin',141835.33333333334
205000,'Charmane','Griswold',141064.63636363635
43624,'Tokuyasu','Pesch',138492.94444444444
493158,'Lidong','Meriste',138312.875
37558,'Juichirou','Thambidurai',138215.85714285713
276633,'Shin','Birdsall',136711.73333333334
238117,'Mitsuyuki','Stanfel',136026.2
46439,'Ibibia','Junet',135747.73333333334
254466,'Honesty','Mukaidono',135541.0625
253939,'Sanjai','Luders',135042.25
....

Robin on Linux

Monthly Archives: November 2016