The CPU usage of soft-irqs is counted against a process

Redis usually runs as a single-process daemon, a perfect example of the UNIX philosophy: one process does one thing. But modern servers have many CPUs (cores), so we need to launch many Redis processes to provide service. After running multiple redis-server processes on a server, I found that one redis-server daemon always cost more CPU than the others in the "top" command:

I used perf to collect function samples from the different processes and noticed that some "softirq" functions were being called. Then I remembered that I hadn't balanced the network card's soft-irqs across the CPU cores. After running my script to balance the soft-irqs:
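My original script isn't reproduced in this copy, but such a balancing script can be sketched roughly as below. This is a minimal sketch, not the actual script: the `IRQ_ROOT` and `NCPU` parameters are illustrative knobs for testing, and on a real machine you would run it as root against /proc/irq, restricting it to the network card's IRQ numbers taken from /proc/interrupts:

```shell
# balance_irqs: spread the IRQs found under $1 across $2 CPU cores by
# writing a rotating one-bit CPU mask into each smp_affinity file.
balance_irqs() {
    irq_root=$1
    ncpu=$2
    cpu=0
    for f in "$irq_root"/*/smp_affinity; do
        [ -w "$f" ] || continue
        # hex bitmask with only bit $cpu set, e.g. cpu 3 -> mask 8
        printf '%x\n' $((1 << cpu)) > "$f"
        cpu=$(( (cpu + 1) % ncpu ))
    done
}

# usage (as root): balance_irqs /proc/irq "$(getconf _NPROCESSORS_ONLN)"
```

In practice, instead of a hand-rolled script you can also run the irqbalance daemon, which distributes interrupts across cores automatically.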

The view in the "top" command looks much better now:

But I still had a question: why does the "top" command count the CPU usage of system soft-irqs against an innocent process? The answer is here: soft-irqs run in process context, so the kernel needs to find a "scapegoat" process to charge the CPU usage to.

Avoid "page allocation failure" in the Linux kernel on big-memory servers

After putting pressure on a key-value cluster, I found many errors in dmesg:

It's hard to understand the "page allocation failure" error because the memory capacity of our servers is very big. Looking at the output of the "free" command, I noticed that a large amount of memory was used to cache files. Maybe the "free" memory was too small, so the kernel could not get enough pages when it needed many at once.
But how do we reserve more "free" memory in the Linux kernel? According to this article, we can modify "/proc/sys/vm/min_free_kbytes" to adjust the watermarks of Linux memory management, and the kernel will try hard to reserve that much "free" memory:


After changing "/proc/sys/vm/min_free_kbytes" to 1 GB, the errors became rare but still existed. Then I changed it to 4 GB, and this time there weren't any errors in dmesg.
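For reference, the change can be made persistent across reboots in /etc/sysctl.conf; the value below is 4 GB expressed in kilobytes (tune it to your own workload):

```
# /etc/sysctl.conf — keep ~4 GB free for bursts of page allocations
vm.min_free_kbytes = 4194304
```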
In conclusion, the default value of "min_free_kbytes" in the kernel is too small; we had better turn up "min_free_kbytes" on machines with a lot of memory.

Too many "ext4-dio-unwrit" processes in the system

After putting pressure on an application that writes tremendous amounts of data into an ext4 file system, we saw many "ext4-dio-unwrit" kernel threads on the "top" screen. Many people say this is a normal phenomenon, so I checked the ext4 source code in the 2.6.32 Linux kernel.

Writing a file in the kernel begins with the write-back kernel thread, which calls generic_writepages() and then ext4_writepage():

Let’s look at ext4_add_complete_io():

It puts the "io_end" onto the work queue "dio_unwritten_wq", which lives in ext4_sb_info. But where does "dio_unwritten_wq" come from? In fs/ext4/super.c:

Oh, it is the "ext4-dio-unwritten" kernel thread! So the problem is solved: the application dirties pages in the file-system cache; the write-back kernel thread writes these dirty pages into the specific file system (ext4 in this article); and finally ext4 puts the io_end onto the "ext4-dio-unwritten" work queue, which waits to convert unwritten extents into written extents.
Therefore, if we don't create unwritten extents in ext4 (for example, when just using the write() system call to append to a normal file), the "ext4-dio-unwritten" kernel threads will exist but won't use any CPU.

Why you should update your gcc (and C++ library)

Consider the code below:

It compiles on CentOS 5 (gcc-4.1.2), but core dumps at runtime.

The gdb backtrace shows the crash is in string_hashfunc::operator():

Let’s see the source code of “ext/hash_map” in /usr/include/c++/4.1.2/ext/hashtable.h:

And in the implementation of _M_bkt_num():

It uses _M_hash() to compute the bucket number of the key, and _M_hash() is actually string_hashfunc::operator(). The reason is clear now: when the iterator is incremented, it calls operator++() –> _M_bkt_num() –> _M_bkt_num_key() –> _M_hash() –> string_hashfunc::operator(), which can't hash the key because it has already been freed by "delete it->first".

How about a newer g++ and C++ library? Let's write the same program on CentOS 7 (gcc-4.8.5), changing "ext/hash_map" to "unordered_map" (from the C++11 standard):

Then build it:

Everything runs normally, because the new implementation of the C++ library uses "_M_nxt" to point to the next hash node instead of calling the hash function (you can see this in /usr/include/c++/4.8.5/bits/hashtable_policy.h).

Why you should update your gcc

Consider this C++ code:

I compiled it first on CentOS 5, where the gcc version is 4.1.2, and it reported:

Can anyone spot the problem at first glance through this mess? Error reports from C++ templates have been terribly difficult to understand ever since I first used templates 9 years ago.
Then I tried to compile the source on CentOS 6 with gcc-4.4.6:

It looks almost the same. How about CentOS 7 with gcc-4.8.5?

Aha, much better: it tells us the exact position of the problem: "vec.begin()" returns a "const_iterator", which cannot be converted to an "iterator".
To save yourself time debugging C++ template code and to enjoy life, please update your gcc.