Finding the lost memory

We found a strange phenomenon on a production server. The “free” command showed that there was no free memory left on this server. But when we added up the memory allocated by all processes:

it showed that all processes together used only 60GB of memory (the total physical memory of this server is 126GB).
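One way to do this kind of summation, as a rough Python sketch rather than the command we actually ran: add up the “VmRSS” field of every “/proc/<pid>/status” file. The function names here are my own, and the sum counts shared pages once per process, so it can only overestimate what processes really hold.

```python
import re
from pathlib import Path


def rss_kb(status_text):
    """Return the VmRSS value (in kB) found in /proc/<pid>/status text, or 0."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else 0


def total_rss_kb(proc_root="/proc"):
    """Sum VmRSS over every numeric directory under proc_root."""
    total = 0
    for status in Path(proc_root).glob("[0-9]*/status"):
        try:
            total += rss_kb(status.read_text())
        except OSError:
            continue  # the process exited, or we may not read it
    return total
```

Calling `total_rss_kb() / 1024 / 1024` gives the total in GB, which can then be compared with the “used” column of “free”.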

Where is the lost memory?

First we unmounted the tmpfs, but it made no difference. Then we used:

and soon noticed that “Slab” accounted for more than 10GB of memory. Slab is the Linux kernel’s allocator for its own in-kernel objects. If “Slab” is too high in “/proc/meminfo”, it means the kernel may have allocated too many resources on behalf of user-space programs. But what type of resource? Finally, using the command:
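To make the “Slab” check repeatable, here is a small Python sketch that parses the text of “/proc/meminfo” and reports how much of total memory the slab uses. The helper names are hypothetical, and the numbers in the usage note are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field name: number} (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def slab_share(fields):
    """Return the fraction of MemTotal that is accounted to Slab."""
    return fields["Slab"] / fields["MemTotal"]
```

On a real machine the input would come from `open("/proc/meminfo").read()`. The “SReclaimable” and “SUnreclaim” fields in the same file show how much of the slab the kernel can give back under memory pressure.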

it showed that the Recv-Q of all TCP connections together used more than 20GB of memory. Now we had uncovered the root cause: too many TCP connections (more than one hundred thousand) had been created. The solution could be:

  • Reduce the size of TCP Recv-Q
  • Modify user program to reduce the number of TCP connections
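To watch whether either fix helps, the Recv-Q numbers can be added up over time. The sketch below is in Python and assumes the input is the text printed by `ss -tn` (State, Recv-Q, Send-Q, local address, peer address); this is not necessarily the command we used. For established sockets Recv-Q is the number of bytes the application has not read yet, so the kernel memory really held can be larger.

```python
def total_recv_q(ss_output):
    """Return (connection count, summed Recv-Q bytes) from `ss -tn` style text."""
    count = 0
    total = 0
    for line in ss_output.splitlines():
        cols = line.split()
        # Skip the header line and anything that is not a socket row.
        if len(cols) < 5 or not cols[1].isdigit():
            continue
        count += 1
        total += int(cols[1])
    return count, total
```

The text can be produced with `subprocess.run(["ss", "-tn"], capture_output=True, text=True).stdout` on a machine where `ss` is installed.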

Run Docker on CentOS 6

Docker uses device mapper thin provisioning as its default storage, so if we want to run Docker on CentOS 6, we should upgrade the kernel first. I used Linux kernel 4.11 and noticed that these kernel options should be set:

After building the kernel and rebooting into it, I still couldn’t launch the Docker service, and finally found the solution:

Puppet 3 certificate problem on CentOS 7

I configured the Puppet master and agent by following these steps. But when I ran “puppet agent -t”, it reported an error:

My OS version is “CentOS 7” and the Puppet version is “3.7.5”. I tried the approach suggested in the answer on this page, but the problem still existed. Therefore, I am writing down the operations that finally worked here for future reference:

How to set the value of “$releasever” permanently for yum

On a test server I typed “sudo yum update”, and it reported errors like:

Then I found this page through Google, which explains how to get the value of “$releasever”, but it does not say how to set “$releasever” permanently. Therefore, I had to search for the word “releasever” in the source code of ‘yum’:

Finally, this source code in ‘/usr/lib/python2.6/site-packages/yum/__init__.py’ turned up:

But where does ‘self.conf.yumvar’ get its values? The answer is ‘/etc/yum/vars/’. After writing the value I wanted into the file ‘/etc/yum/vars/releasever’, my ‘sudo yum update’ works correctly now.
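For future reference, the same step can be sketched in Python. This assumes yum takes the value of a variable from the first line of a file with the variable’s name under ‘/etc/yum/vars/’; the function name is my own, and the value “6” in the usage note is only an example, not necessarily what this server needed.

```python
from pathlib import Path


def set_yum_var(name, value, vars_dir="/etc/yum/vars"):
    """Write `value` into <vars_dir>/<name> so yum can expand $<name>."""
    directory = Path(vars_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / name
    path.write_text(f"{value}\n")
    return path
```

Running `set_yum_var("releasever", "6")` as root would create ‘/etc/yum/vars/releasever’ containing the single line “6”.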

A fight to save an old server ….

Coly, who is my old colleague, bought a server computer in 2010. This server, which was worth about 30,000 Yuan (RMB) 4 years ago, consists of an ASUS KGPE-D16 motherboard, two AMD-6172 CPUs, and a 670W power supply. Recently we needed many servers to build some software, so I begged Coly to bring his 4-year-old server to the small computer room in our office, and he agreed.
After installing Linux on this server, I found that only one CPU could boot up and run; the other one was not recognized by the operating system. I asked Coly why, and he answered that maybe one CPU was broken.
I thought that running only one CPU on a motherboard with two sockets was a great waste, so I bought a new CPU and installed it on the motherboard. But after that, the server couldn’t boot up at all. Having tried many things that all failed, I turned to Coly to rescue his own old server.
At last, Coly came over, opened the lid of the server case, and was astonished right away.

Coly: What on earth have you done to my server! Have you killed a man who sells silicone grease? You pasted too much silicone grease on the CPU; even the edges of the sockets are contaminated!
Me: Sorry, I just wanted to make sure the CPUs would not get too hot.
Coly: Yeah, they will never be hot now, because they can’t even run.

Then Coly knelt on the floor and began to clean all the silicone grease off the motherboard, which was really hard work.

But after the cleaning, the server still could not boot up.

Coly: Maybe the motherboard is broken, or maybe the power supply got a short circuit. How much is a KGPE-D16 motherboard on TAOBAO (a very famous e-commerce website in China)? It cost me 8,000 Yuan 4 years ago.
Me: About 800 Yuan.
Coly: ….Motherboards get cheap so fast. Then how much is a 670W power supply on TAOBAO? It was 800 Yuan before.
Me: Hmm, about 600 Yuan.
Coly: Ah-ha, looks like a power supply is a hedging tool for preserving monetary value. I reckon I will buy a large number of power supplies instead of stocks or gold to preserve my money.

In my opinion, a power supply is more reliable than a motherboard, so I bought a new motherboard and luckily booted up the server with two CPUs (24 cores). In the end, no servers or machines were wasted on my watch 🙂

Problems with using ZooKeeper

Problem 1:

The ZooKeeper cluster had been running well for half a year. But today, after I reconfigured it and ran the command

it failed to start up and reported

The key is the last item, “Invalid config” (the log4j message is just a warning), so I reviewed zoo.cfg many times but found no mistake at all.
After checking all the configuration, I eventually found the problem: the “myid” file was missing. After I added the “myid” file, ZooKeeper started up correctly.

It seems ZooKeeper’s error log is misleading: it says the config file is invalid, but the real cause is a missing config file.
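Since the error message does not point at “myid”, a small pre-start check can save the time. The Python sketch below reads “dataDir” from the text of zoo.cfg and verifies that a “myid” file holding a numeric server id exists there, which is where ZooKeeper expects it. The function names are my own, and the parsing only handles simple “key=value” lines.

```python
from pathlib import Path


def read_data_dir(cfg_text):
    """Return the dataDir value from zoo.cfg text, or None if it is not set."""
    for line in cfg_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == "dataDir":
            return value.strip()
    return None


def myid_problem(cfg_text):
    """Return a description of the myid problem, or None if it looks fine."""
    data_dir = read_data_dir(cfg_text)
    if data_dir is None:
        return "dataDir is not set in zoo.cfg"
    myid = Path(data_dir) / "myid"
    if not myid.is_file():
        return f"{myid} is missing"
    if not myid.read_text().strip().isdigit():
        return f"{myid} does not contain a numeric server id"
    return None
```

Passing the contents of zoo.cfg to `myid_problem` before starting the server would have reported the missing file directly.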

Problem 2:

To tolerate the failure of up to four servers, we assumed that a five-server ZooKeeper cluster would be enough. After studying Paxos for a while, a question occurred to me: the majority of a five-server cluster is three servers, so how could ZooKeeper elect a new leader if more than two servers are down? So I ran the test and found that ZooKeeper does fail to work if more than two servers are shut down.
The correct size for a ZooKeeper cluster that can tolerate the failure of four servers is nine, because after four servers shut down, the five survivors are still a majority of the nine-server cluster.
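The arithmetic behind this is simple enough to write down. As a minimal Python sketch (the function names are my own): a cluster of n servers needs a strict majority alive, so it survives at most (n - 1) // 2 failures, and tolerating f failures needs 2f + 1 servers.

```python
def majority(n):
    """Smallest number of servers that forms a strict majority of n."""
    return n // 2 + 1


def max_failures(n):
    """How many servers can fail while a majority of n is still alive."""
    return (n - 1) // 2


def servers_needed(f):
    """Smallest cluster size that still has a majority after f failures."""
    return 2 * f + 1
```

For the numbers above, `max_failures(5)` is 2 and `servers_needed(4)` is 9. An even-sized cluster gains nothing: `max_failures(6)` is still 2.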