DCTC 2016 conference

Yesterday I attended DCTC (Data Center Technology Conference) 2016 in Beijing. Although it is called “Data Center Technology”, most topics were about storage, because the conference was held by Memblaze, a well-known flash-storage startup in China.
Xuebing Yin, the CEO of Memblaze, gave the first talk:

2016 is an important year for flash storage because SSD revenue exceeded hard-disk revenue for the first time. The data center looks set to become all-silicon in the near future (the hard disk is the only non-silicon component left in servers). As SSDs and their interfaces (from SATA to PCIe) get faster and faster, much older software becomes the performance bottleneck: MySQL 5.6 can’t fully exploit a high-speed SSD, but MySQL 5.7 can.

Janene Ellefson from the NVMe organization explained why we need a standard for high-speed data transfer.

With support for up to 64K queues of up to 64K commands each, NVMe is definitely the most powerful protocol for modern (and future) I/O devices.
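For a rough sense of scale, taking the round figures from the talk (64K queues, each holding up to 64K commands), the number of commands that could be in flight at once works out as:

```latex
\underbrace{2^{16}}_{\text{queues}} \times \underbrace{2^{16}}_{\text{commands per queue}} = 2^{32} = 4{,}294{,}967{,}296 \approx 4.3 \times 10^{9}
```

For comparison, AHCI/SATA offers a single queue of 32 commands. This is an upper bound built from the round figures; real devices typically expose far fewer queues.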

Xin Wu from GBase described the problems they face when using SSDs for databases.

GBase is a series of database products for OLTP/OLAP and global data storage. SSDs benefit OLTP applications, but for OLAP applications SSDs are too expensive, because a hard-disk array can provide the same bandwidth. Maybe that’s why AWS released a new type of EBS volume a few months ago.

Coly Li (yes, my old friend ^_^) from SUSE Labs showed us the improvements to Linux software RAID in recent years. Years ago, Linux soft RAID was used only with slow hard disks, so the cost of an inefficient software implementation was not significant. Recently, however, the widespread use of SSDs has exposed many bottlenecks in soft RAID, and open-source developers have committed many patches to improve performance. Many of these patches came from Shaohua Li, a seasoned kernel developer (he worked first for Intel, then Fusion-io, and now Facebook).

During the tea break, I visited the Memblaze exhibition.

This is a 1U server built by SuperMicro, with eight NVMe SSDs in SFF-8639 bays at the front. Even with MySQL running at full speed (4600% CPU usage), the air coming out of the back was not hot. SuperMicro’s server looks very efficient, and cool 🙂

China Linux Storage & Filesystem 2014 workshop (first day)

CLSF (China Linux Storage & File System Workshop) is an effort to bring local Linux kernel hackers together to share and exchange ideas. CLSF is an invitation-only workshop: to keep communication effective, only a small group of people is invited. Most invitees are active upstream Linux kernel developers from China who focus on the I/O and storage subsystems.

CLSF 2014 was held in the office of XiaoMi, a well-known consumer electronics company in China. Participants were mainly from Huawei, Fujitsu, Intel, Alibaba and other companies.

The first topic, led by Jiufei Xue from Huawei, was about ocfs2. Huawei has been building its private cloud product on ocfs2, so over the past two years Huawei’s kernel developers have contributed many bug fixes and new features to the ocfs2 community. This year they added range locks to ocfs2, so users can lock not only a whole file but also a specific range within a file, which improves performance in a cluster when many clients read and write the same files at the same time.
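The idea behind range locks can be sketched in a few lines. This is a minimal, single-process Python illustration assuming the usual shared/exclusive semantics of byte-range locks; the names (`RangeLock`, `RangeLockTable`, `try_lock`) are hypothetical, and this is not ocfs2’s actual cluster-wide implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    SHARED = "shared"        # readers: overlapping shared locks may coexist
    EXCLUSIVE = "exclusive"  # writers: conflict with any overlapping lock


@dataclass(frozen=True)
class RangeLock:
    owner: str
    start: int  # first byte covered (inclusive)
    end: int    # end of the range (exclusive)
    mode: Mode


def overlaps(a: RangeLock, b: RangeLock) -> bool:
    # Half-open intervals [start, end) overlap iff each starts before the other ends.
    return a.start < b.end and b.start < a.end


def conflicts(a: RangeLock, b: RangeLock) -> bool:
    # In this sketch, locks held by the same owner never block each other.
    if a.owner == b.owner:
        return False
    if a.mode is Mode.SHARED and b.mode is Mode.SHARED:
        return False
    return overlaps(a, b)


class RangeLockTable:
    """Hypothetical per-file table of granted range locks."""

    def __init__(self) -> None:
        self.granted: list[RangeLock] = []

    def try_lock(self, req: RangeLock) -> bool:
        # Grant only if no already-granted lock conflicts with the request.
        if any(conflicts(req, held) for held in self.granted):
            return False
        self.granted.append(req)
        return True

    def unlock(self, req: RangeLock) -> None:
        self.granted.remove(req)


if __name__ == "__main__":
    table = RangeLockTable()
    # Two writers on disjoint ranges of the same file can proceed concurrently.
    print(table.try_lock(RangeLock("node1", 0, 4096, Mode.EXCLUSIVE)))
    print(table.try_lock(RangeLock("node2", 4096, 8192, Mode.EXCLUSIVE)))
    # A third writer overlapping node1's range has to wait.
    print(table.try_lock(RangeLock("node3", 1024, 2048, Mode.EXCLUSIVE)))
```

Two writers on disjoint ranges of the same file are both granted, which is exactly the concurrency a whole-file lock would forbid.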

ocfs2

F2FS is a new filesystem in the kernel, designed for flash devices such as SD cards. It stores metadata at the beginning of the device (random read/write performance at the beginning of an SD card is very good) and uses an indirect-block layout just like ext2. The reason for not using extents (as ext4 does) is that per-block mapping makes garbage collection of NAND blocks more convenient. F2FS also merges many ‘sync’ operations into one, mainly for speed (the ‘sync’ operation on a mobile phone’s SD card is very slow).
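The garbage-collection argument can be illustrated with a toy model. This Python sketch is a simplification under stated assumptions, not F2FS code: when GC relocates one live block, a per-block (ext2-style) map needs only one entry rewritten, whereas an extent map may have to split one extent into three. The function names and the `Extent` type are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    logical: int   # first logical block of the file covered by this extent
    physical: int  # first physical block on the device
    length: int    # number of contiguous blocks


def move_block_in_block_map(block_map: list[int], logical: int, new_phys: int) -> int:
    """Per-block mapping: relocating one block rewrites exactly one entry.
    Returns the number of mapping entries touched."""
    block_map[logical] = new_phys
    return 1


def move_block_in_extents(extents: list[Extent], logical: int, new_phys: int) -> list[Extent]:
    """Extent mapping: relocating one block may split an extent into up to three."""
    result: list[Extent] = []
    for ext in extents:
        if not (ext.logical <= logical < ext.logical + ext.length):
            result.append(ext)  # extent not affected by the move
            continue
        offset = logical - ext.logical
        if offset > 0:  # part of the extent before the moved block
            result.append(Extent(ext.logical, ext.physical, offset))
        result.append(Extent(logical, new_phys, 1))  # the relocated block itself
        tail = ext.length - offset - 1
        if tail > 0:  # part of the extent after the moved block
            result.append(Extent(logical + 1, ext.physical + offset + 1, tail))
    return result


if __name__ == "__main__":
    # A hypothetical 8-block file stored contiguously at physical blocks 100..107.
    block_map = list(range(100, 108))
    print(move_block_in_block_map(block_map, 3, 500))
    for ext in move_block_in_extents([Extent(0, 100, 8)], 3, 500):
        print(ext)
```

Moving logical block 3 touches one map entry in the first case, but turns the single extent into three in the second; repeated over many GC passes, that fragmentation is what the per-block layout avoids.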

Zeifan Li (from Huawei): If one day SD cards add an FTL layer to their controllers and work just like enterprise SSDs, will the advantages of F2FS disappear?
Ying Huang (from Intel): Let’s look at it another way. If F2FS does the whole job of the SSD firmware, we could run a kernel filesystem on raw NAND flash directly, which would save a lot of money 🙂

f2fs

Bo Liu from Oracle led the topic on btrfs. Fewer new features went into btrfs this year; the main work was fixing bugs. Using the standard kernel worker threads in btrfs caused a serious bug that could lose user data. Bo Liu spent a long time reproducing the bug and, fortunately, fixed it at last.

btrfs

Coly (from Alibaba): I attended the Linux Plumbers Conference this year. In a presentation about Docker, the speaker complained that btrfs is the most unstable filesystem in the Linux kernel. The btrfs developers may have to spend more time fixing bugs.

The last topic of the first day was led by an engineer from Memblaze (a Chinese counterpart of Fusion-io). They face many problems in building an All-Flash Array (AFA). In an AFA, the Linux system inside the box has become the bottleneck of the whole I/O path: too many interrupts cost too much CPU time; the socket and TCP/IP stack implementation in Linux is too inefficient; process context switches make IOPS unstable; even the filesystem itself spends too much time looking up files.

memblaze

Besides these problems, they also put forward a new viewpoint: SSDs will become the ideal device for storing cold data, as NVMe devices and PCM become cheap very quickly. This started a long and heated discussion. If PCM comes to market, the block layer and filesystems in the Linux kernel will become obsolete, and almost everyone sitting in the meeting room will be out of work :). So maybe the time for a big change in storage is drawing near.

clsf 2014