After putting load on a key-value cluster, I found many errors in dmesg:
[551336.912108] [] ? dequeue_task+0x8e/0xb0
[551336.912114] [ ] ? ext4_get_block+0x0/0x120 [ext4]
[551336.912118] [ ] ? __do_fault+0xd0/0x530
[551336.912122] [ ] ? copy_user_generic+0xe/0x20
[551336.912124] [ ] ? handle_pte_fault+0x9c/0xba0
[551336.912131] [ ] ? rwsem_down_failed_common+0x95/0x1e0
[551336.912134] [ ] ? rwsem_down_read_failed+0x26/0x30
[551336.912137] [ ] ? handle_mm_fault+0x23a/0x310
[551336.912142] [ ] ? call_rwsem_down_read_failed+0x14/0x30
[551336.912145] [ ] ? __do_page_fault+0x139/0x480
[551336.912149] [ ] ? finish_task_switch+0x4f/0xe0
[551336.912152] [ ] ? do_page_fault+0x3e/0xb0
[551336.912156] [ ] ? page_fault+0x25/0x30

[552116.858565] swapper: page allocation failure. order:1, mode:0x20
[552116.858569] Pid: 0, comm: swapper Tainted: G --------------- H #1
[552116.858571] Call Trace:
[552116.858573] [ ] ? __alloc_pages_nodemask+0x76a/0x8f0
[552116.858588] [ ] ? dev_hard_start_xmit+0x303/0x570
[552116.858593] [ ] ? kmem_getpages+0x62/0x170
[552116.858596] [ ] ? fallback_alloc+0x1be/0x270
[552116.858599] [ ] ? cache_grow+0x2d1/0x320
[552116.858602] [ ] ? ____cache_alloc_node+0x99/0x160
[552116.858605] [ ] ? kmem_cache_alloc+0x11b/0x1b0
[552116.858610] [ ] ? sk_prot_alloc+0x48/0x1d0
[552116.858615] [ ] ? sk_clone+0x22/0x2c0
[552116.858619] [ ] ? inet_csk_clone+0x16/0xd0
[552116.858624] [ ] ? tcp_create_openreq_child+0x60/0x490
[552116.858627] [ ] ? tcp_v4_syn_recv_sock+0x6a/0x310
[552116.858630] [ ] ? tcp_check_req+0x249/0x4d0
[552116.858633] [ ] ? tcp_v4_do_rcv+0x398/0x470
[552116.858636] [ ] ? tcp_v4_rcv+0x52a/0x8d0
[552116.858644] [ ] ? bond_start_xmit+0xbb/0x5d0 [bonding]
[552116.858648] [ ] ? ip_local_deliver_finish+0xdd/0x2d0
[552116.858651] [ ] ? ip_local_deliver+0x98/0xa0
[552116.858653] [ ] ? ip_rcv_finish+0x12d/0x440
[552116.858656] [ ] ? ip_rcv+0x285/0x370
[552116.858659] [ ] ? __netif_receive_skb+0x4bb/0x780
[552116.858662] [ ] ? tcp4_gro_receive+0x5a/0xd0
......
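At first the “page allocation failure” error is hard to understand, because our servers have a very large amount of memory. Looking at the output of the “free” command, though, I noticed that a large amount of memory was being used to cache files. The trace shows an order:1 request (two physically contiguous pages) made from the TCP receive path, where the kernel cannot wait for cached pages to be reclaimed. So the likely cause is that the truly “free” memory was too small, and the kernel could not get enough pages when it suddenly needed many.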
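As a rough way to check the same thing without reading the “free” output by eye, the sketch below parses /proc/meminfo and reports how much of the total memory is really free and how much is page cache. The function names parse_meminfo and summarize are made up for this illustration; only the MemTotal, MemFree and Cached fields of /proc/meminfo are assumed.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of field name -> integer value.

    Most fields are reported in kB; a few counters (e.g. HugePages_Total)
    have no unit and are stored as plain numbers.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not "Name: value"
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def summarize(fields):
    """Return free and cached memory as a percentage of MemTotal."""
    total = fields["MemTotal"]
    return {
        "free_pct": 100.0 * fields["MemFree"] / total,
        "cached_pct": 100.0 * fields["Cached"] / total,
    }


if __name__ == "__main__":
    # Only meaningful on Linux, where /proc/meminfo exists.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            print(summarize(parse_meminfo(f.read())))
```

A small free_pct next to a large cached_pct is the pattern described above: plenty of memory overall, but little of it immediately available.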
But how do we reserve more “free” memory in the Linux kernel? According to this article, we can modify “/proc/sys/vm/min_free_kbytes” to raise the watermarks used by Linux memory management, and the kernel will then try hard to keep that much memory free.
After changing “/proc/sys/vm/min_free_kbytes” to 1G, the errors became rare but still occurred. I then raised it to 4G, and this time there were no more errors in dmesg.
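Since min_free_kbytes is expressed in kilobytes, “1G” and “4G” have to be converted before they are written. The following is a minimal sketch of that step, not the exact commands I ran: the helper names gib_to_kb, read_min_free_kbytes and write_min_free_kbytes are invented for the example, and the write needs root.

```python
import os

MIN_FREE_PATH = "/proc/sys/vm/min_free_kbytes"


def gib_to_kb(gib):
    """Convert GiB to the kB unit that min_free_kbytes expects."""
    return int(gib * 1024 * 1024)


def read_min_free_kbytes(path=MIN_FREE_PATH):
    """Return the current setting in kB."""
    with open(path) as f:
        return int(f.read().strip())


def write_min_free_kbytes(kb, path=MIN_FREE_PATH):
    """Write a new setting in kB. Requires root for the real /proc file."""
    with open(path, "w") as f:
        f.write(f"{kb}\n")


if __name__ == "__main__":
    if os.path.exists(MIN_FREE_PATH):
        print("current:", read_min_free_kbytes(), "kB")
    print("1G =", gib_to_kb(1), "kB")
    print("4G =", gib_to_kb(4), "kB")
    # Uncomment to apply the 4G setting (run as root):
    # write_min_free_kbytes(gib_to_kb(4))
```

A value written to /proc this way is lost on reboot; to keep it, set vm.min_free_kbytes in /etc/sysctl.conf as well.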
In conclusion, the kernel’s default value of “min_free_kbytes” is too small for this kind of load, so we had better raise “min_free_kbytes” on machines with a lot of memory.