Usage limitations of HDFS’s C API

I have to change a program which is written by c language from writing local files to writing on HDFS. After learning the example of C API in libhdfs, I complete the modification of open()/write()/read() to hdfsOpenFile()/hdfsWriteFile()/hdfsReadFile() and so on. But when running the new program, many problems occured. The first is: after fork(), I can’t open files of HDFS anymore. And the problem looks very common in community and haven’t any solution yet.
So I have to try the hdfs-fuse tool. According to the steps of this article, I successfully build and run the hdfs-fuse:

But something weird happened:

After fsync(), the size of file “my.db” is still zero by “ls” command on mountpoint “/data”! It cause the program report error and can’t continue to process.
The reason is fuse-dfs haven’t implement fuse_fsync() interface. After adding the implementation of fuse_fsync() by hdfsHSync(), it works now. But the performance is too bad: about 10~20MB/s in network.
Consequently, I decided to use glusterfs instead of HDFS because it totally don’t need any modification for user program and support erasure-code since version 3.6 (this will dramatically reduce occupation of storage space).

Leave a Reply

Your email address will not be published. Required fields are marked *

*
*

This site uses Akismet to reduce spam. Learn how your comment data is processed.