Monthly Archives: September 2016

How to show the memory details of a process in Linux

Recently we are evaluating the memory consumption of redis process when it is rewriting of append-only-file. In this situation, redis will firstly fork(), and then the child process will write all the data from memory to new AOF file and the parent process will still provide service to custom. The new written key/value will make the page dirty and hence consume a lot of new memory (a small key/value pair may cause a whole 4K page be allocated).
We write a simple shell to monitor the detail memory consumption of this process (which is referred from http://superuser.com/questions/102005/how-can-i-display-the-memory-usage-of-each-process-if-i-do-a-ps-ef)

#!/bin/bash
pid=$1
if [ -f /proc/$pid/smaps ]; then
    echo "* Mem usage for PID $pid"
    rss=$(awk 'BEGIN {i=0} /^Rss/ {i = i + $2} END {print i}' /proc/$pid/smaps)
    pss=$(awk 'BEGIN {i=0} /^Pss/ {i = i + $2 + 0.5} END {print i}' /proc/$pid/smaps)
    sc=$(awk 'BEGIN {i=0} /^Shared_Clean/ {i = i + $2} END {print i}' /proc/$pid/smaps)
    sd=$(awk 'BEGIN {i=0} /^Shared_Dirty/ {i = i + $2} END {print i}' /proc/$pid/smaps)
    pc=$(awk 'BEGIN {i=0} /^Private_Clean/ {i = i + $2} END {print i}' /proc/$pid/smaps)
    pd=$(awk 'BEGIN {i=0} /^Private_Dirty/ {i = i + $2} END {print i}' /proc/$pid/smaps)
    echo "-- Rss: $rss kB"
    echo "-- Pss: $pss kB"
    echo "Shared Clean $sc kB"
    echo "Shared Dirty $sd kB"
    echo "Private $(($pd + $pc)) kB"
fi

The result looks like this:

* Mem usage for PID 13579
-- Rss: 63241 kB
-- Pss: 21085 kB
Shared Clean 63233 kB
Shared Dirty 0 kB
Private 7 kB

Divide Shared Memory by PSS (Proportional Set Size) is about 3, which means there are 3 processes using this 63MB share-memory.

“database is locked” in Hue

After launching a long-time HiveQL in SQL Editors of Hue, a small exceptional tip appears under the editor “database is locked”. The solution is to make Hue use Mysql instead of sqlite3. But I am using Hue directly got from github, not Cloudera Release version. So the correct steps should be:

Stop Hue server
Install Mysql and create database ‘hue’

Edit desktop/conf/pseudo-distributed.ini, add these in “[[database]]” section:

    engine=mysql
    host=127.0.0.1
    port=3306
    user=
    password=
    name=hue

Run “make apps” (This is the most important step, as it will install Mysql connector/packages automatically and create meta tables in ‘hue’ database)

Start Hue server

build/env/bin/hue runserver 0.0.0.0:8000

Now we can run long-time query and there will be no error.

“java.io.Exception: failed to uncompress the chunk” in Apache Spark

After I run spark-submit in my YARN cluster with Spark-1.6.2:

./bin/spark-submit --class TerasortApp \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 4G \
  --executor-memory 12G \
  --executor-cores 4 \
  --num-executors 16 \
  --conf spark.yarn.executor.memoryOverhead=4000 \
  --conf spark.memory.useLegacyMode=true \
  --conf spark.shuffle.memoryFraction=0.6 \
  --conf "spark.executor.extraJavaOptions=-XX:+UseG1GC -XX:ArrayAllocationWarningSize=2048M" \
  --queue spark \
  /home/sanbai/myspark/target/scala-2.10/test_2.10-1.0.jar

The job fail, and the log report:

com.esotericsoftware.kryo.KryoException: java.io.IOException: failed to uncompress the chunk: PARSING_ERROR(2)
Serialization trace:
bytes (org.apache.hadoop.io.Text)
  at com.esotericsoftware.kryo.io.Input.fill(Input.java:142)
  at com.esotericsoftware.kryo.io.Input.require(Input.java:169)
  at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:317)
  at com.esotericsoftware.kryo.io.Input.readBytes(Input.java:297)
  at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:35)
  at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.read(DefaultArraySerializers.java:18)
  at com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:699)
  at com.esotericsoftware.kryo.serializers.FieldSerializer$ObjectField.read(FieldSerializer.java:611)
  at com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:221)
  at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:729)
  at org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:228)
  at org.apache.spark.serializer.DeserializationStream.readValue(Serializer.scala:171)
  at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:201)
  at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:198)
  at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
  at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
  at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)

Somebody in the internet say may be this is caused by the compatibility problem between Spark-1.6.2 and Snappy. Therefore I add

--conf spark.io.compression.codec=lz4

to my spark-submit shell script to change compress algorithm from Snappy to lz4. And this time everything goes ok.

Finding the lost memory

We find out a strange phenomenon in a product server. By using “free” command, it shows there is no free memory in this server. But when we add all processes’s memory allocation:

ps aux|awk '{s+=$6} END{print s}'

it show all processes cost only 60GB memory (The whole physical memory of this server is 126GB).
Where is the lost memory?
Firstly we umount the tmpfs but it does not make any change. Then we use:

cat /proc/meminfo

and soon notice that the “Slab” cost more than 10GB memory. Slab is a linux kernel component for managing memory. If the “Slab” is too high in “/proc/meminfo”, it means kernel may allocate too much resource for user-space program. But, what type of resource? Finally, by using the command:

ss | awk '{if ($2>0) print $2;}' | awk '{sum+=$1} END {print "Sum = ", sum}'

it shows the all TCP connections’s Recv-Q cost more than 20GB memory. Now we uncover the root cause: too much TCP connections (more than 1 hundred thousand) been created. The solution could be:

Reduce the size of TCP Recv-Q

sudo bash -c 'echo “4096 4096 4096” > /proc/sys/net/ipv4/tcp_rmem'

Modify user program to reduce the number of TCP connections

Robin on Linux

Monthly Archives: September 2016