Usage limitations of HDFS’s C API

I had to change a program written in C so that it writes to HDFS instead of local files. After studying the C API example in libhdfs, I replaced open()/write()/read() with hdfsOpenFile()/hdfsWrite()/hdfsRead() and so on. But when I ran the new program, many problems occurred. The first: after fork(), I couldn't open files on HDFS anymore. This problem seems to be common in the community, and there is no solution for it yet.
So I had to try the hdfs-fuse tool. Following the steps in this article, I successfully built and ran hdfs-fuse.

But something weird happened:

After fsync(), the “ls” command on the mountpoint “/data” still showed the size of the file “my.db” as zero! This caused the program to report an error and stop processing.
The reason is that fuse-dfs doesn’t implement the fuse_fsync() interface. After I added an implementation of fuse_fsync() based on hdfsHSync(), it worked. But the performance was poor: only about 10~20MB/s over the network.
Consequently, I decided to use GlusterFS instead of HDFS, because it needs no modification to the user program at all and has supported erasure coding since version 3.6 (which dramatically reduces storage space usage).
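The check that tripped my program can be sketched in plain POSIX C. This is only a minimal sketch, not the real program: the helper name size_after_fsync() is made up for illustration, and the path is whatever file you want to test on the mountpoint. It writes a buffer, calls fsync(), and returns the size that stat() reports afterwards.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not part of libhdfs or fuse-dfs):
 * write len bytes from buf to path, fsync() the descriptor, and return
 * the file size reported by stat(), or -1 on any error. */
long long size_after_fsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() may write fewer bytes than requested, so loop. */
    const char *p = buf;
    size_t left = len;
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }

    /* Ask the file system to flush the data before checking the size. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }

    struct stat st;
    if (stat(path, &st) != 0) {
        close(fd);
        return -1;
    }

    close(fd);
    return (long long)st.st_size;
}
```

On a local file system, size_after_fsync("my.db", "hello", 5) should return 5. On the unpatched fuse-dfs mount, a check like this is where I would expect to see 0, matching what “ls” showed.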

Vault Conference 2015, in Boston

Vault Conference 2015


First talk, and the sponsors

John Spray and his talk about development updates of Ceph

Coly and I also gave a presentation about our Cold Storage project. The slides are here: Lambert: Achieve High Durability, Low Cost & Flexibility at Same Time — open source cold storage engine for ExaBytes data in Alibaba

I appreciate the help from the open source community, and it seems my spoken English is not bad 🙂