I just received an email from NVIDIA about RAPIDS. Although cuDF and cuML look fantastic for a data scientist, I am still doubtful about them.

In our daily work, we usually process small DataFrames with pandas, so cuDF would be too expensive since it needs a GPU. And even when we need to join two large DataFrames, we tend to use BigQuery, since it's distributed and relatively cheap. The only proper use case for cuDF, I think, is heavy operations on less than 8 GB of data (roughly what fits in a single GPU's memory). Who needs that many heavy operations on a DataFrame? I don't know.
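To be fair, the pitch is that cuDF is nearly a drop-in replacement for pandas, so those heavy operations would not require a rewrite. Here is a minimal sketch of what the switch looks like (assuming a CUDA-capable GPU with RAPIDS installed; the toy data is my own):

```python
import pandas as pd
import cudf  # requires a CUDA-capable GPU with RAPIDS installed

pdf = pd.DataFrame({"key": [1, 2, 1, 2], "val": [10.0, 20.0, 30.0, 40.0]})
cpu_mean = pdf.groupby("key")["val"].mean()  # runs on the CPU

gdf = cudf.from_pandas(pdf)                  # copy the frame into GPU memory
gpu_mean = gdf.groupby("key")["val"].mean()  # same call, runs on the GPU

print(cpu_mean)
print(gpu_mean.to_pandas())                  # bring the result back to the host
```

On a frame this small the GPU round trip costs more than it saves, which is exactly my point about small-DataFrame workloads.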

As for cuML, it's more like a GPU version of scikit-learn. In practice, for tabular data we use XGBoost/LightGBM, and for unstructured data we use PyTorch/TensorFlow. Who would even use scikit-learn, let alone cuML?
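For completeness, cuML does keep scikit-learn's fit/predict interface, so trying it costs very little code. A minimal sketch (assuming cuML is installed; the toy clustering data is my own example):

```python
import numpy as np
from cuml.cluster import KMeans  # mirrors sklearn.cluster.KMeans

X = np.random.rand(1000, 8).astype(np.float32)  # toy data; cuML copies it to the GPU

model = KMeans(n_clusters=4, random_state=0)
labels = model.fit_predict(X)                   # same estimator API as scikit-learn
print(labels[:10])
```

That familiarity is nice, but it doesn't change the fact that scikit-learn-style models are rarely the bottleneck in our workflow.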