Using XGBoost to predict large sparse data

To predict with XGBoost, I wrote code like this:

But it reported an error:

It looks like csr_matrix in SciPy is not supported by XGBoost. Maybe I need to convert the sparse data to dense:

But it still reported:

The 'test' data is too big so it can't even
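When a sparse test set is too large to densify in one piece, one option is to densify and score it a block of rows at a time. The following is only a minimal sketch of that idea, not the code from the post: `predict_in_chunks`, `toy_predict`, and `chunk_rows` are hypothetical names, and a fixed linear scorer stands in for a trained XGBoost model.

```python
import numpy as np
from scipy.sparse import csr_matrix


def predict_in_chunks(predict_fn, X_sparse, chunk_rows=1000):
    """Densify and score `chunk_rows` rows at a time instead of the whole matrix."""
    outputs = []
    for start in range(0, X_sparse.shape[0], chunk_rows):
        # Only this row slice is converted to a dense array.
        dense_chunk = X_sparse[start:start + chunk_rows].toarray()
        outputs.append(predict_fn(dense_chunk))
    return np.concatenate(outputs)


# Hypothetical stand-in for a trained model: a fixed linear scorer.
rng = np.random.default_rng(0)
weights = rng.normal(size=50)


def toy_predict(dense_rows):
    return dense_rows @ weights


# Build a small, mostly-zero matrix and store it in CSR form.
dense = rng.random((5000, 50))
dense[dense < 0.99] = 0.0
X_test = csr_matrix(dense)

preds = predict_in_chunks(toy_predict, X_test, chunk_rows=512)
```

Peak memory is then bounded by `chunk_rows` times the number of columns rather than by the full matrix, at the cost of one prediction call per block.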

Some summaries for Kaggle’s competition ‘Humpback Whale Identification’

This time, I spent only one month on the competition "Humpback Whale Identification", but I still made a little progress compared with previous competitions. Here are my summaries: 1. Do review the 'kernels' on the competition page; they taught me a lot of information and new techniques. By using Siamese Network rather than

Using ResNeXt in Keras 2.2.4

To use ResNeXt50, I wrote my code following the Keras API documentation:

But it reported errors:

That's weird. The code doesn't work as the documentation says. So I checked the code of Keras 2.2.4 (the version on my computer) and noticed that this version of the code uses 'keras_applications' instead

Some tips about using Keras

1. How to use part of a model

The 'img_embed' model is part of 'branch_model'. We should realise that 'Model()' is a CPU-heavy function, so it needs to be created only once and can then be used many times.

2. How to save a model when using 'multi_gpu_model'

LinearSVC versus SVC in scikit-learn

In the competition ‘Quora Insincere Questions Classification’, I wanted to use simple TF-IDF statistics as a baseline.

The result is not bad:

But after I changed LinearSVC to SVC(kernel='linear'), the program couldn't produce any result even after 12 hours! Am I doing anything wrong? In the page of

Some errors in dataset pipeline of Tensorflow

To extend image datasets by using mixup, I used this snippet to mix two images:

But after generating images with this snippet, the training reported errors:

The size of each image is 512x512x4 = 1048576 bytes. But I can't understand why there is an image that has the size of
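The byte count above assumes one byte per channel value, so the dtype of a mixed image matters. The sketch below is a generic NumPy mixup, not the snippet from the post: `mixup_uint8` and `alpha` are hypothetical names, and the code does not claim to reproduce the cause of the error. It only shows that casting the blend back to uint8 keeps a 512x512x4 image at 1048576 bytes.

```python
import numpy as np


def mixup_uint8(image_a, image_b, alpha=0.2, rng=None):
    """Blend two uint8 images with a Beta(alpha, alpha) weight and return uint8."""
    if rng is None:
        rng = np.random.default_rng()
    lam = float(rng.beta(alpha, alpha))  # mixing weight in [0, 1]
    # Blend in float32; a float32 array of this shape is 4x larger than uint8.
    mixed = lam * image_a.astype(np.float32) + (1.0 - lam) * image_b.astype(np.float32)
    # Round and cast back so the stored size matches the original images.
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
img_a = rng.integers(0, 256, size=(512, 512, 4), dtype=np.uint8)
img_b = rng.integers(0, 256, size=(512, 512, 4), dtype=np.uint8)

mixed = mixup_uint8(img_a, img_b, rng=rng)
print(mixed.dtype, mixed.nbytes)  # uint8 1048576
```

If the final cast is skipped, the same image stored as float32 occupies 4194304 bytes, which any fixed-size decoding step downstream would reject.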

Compare implementation of tf.AdamOptimizer to its paper

When I reviewed the implementation of the Adam optimizer in TensorFlow yesterday, I noticed that its code is different from the formulas that I saw in the Adam paper. TensorFlow's formulas for Adam are: But the algorithm in the paper is: Then I quickly found these words in the document of
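The formulas themselves are missing from this excerpt, so the following is a sketch reconstructed from the Adam paper (Algorithm 1) and the TensorFlow 1.x optimizer documentation, not from the post. The function names are hypothetical. The paper corrects the bias of both moment estimates, while TensorFlow folds the correction into the step size; the two give the same update when TensorFlow's epsilon equals the paper's epsilon times sqrt(1 - beta2^t).

```python
import math


def adam_step_paper(theta, m, v, grad, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step as in Algorithm 1 of the paper (bias-corrected moments)."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


def adam_step_tf(theta, m, v, grad, t, lr=0.001, beta1=0.9, beta2=0.999, eps_hat=1e-8):
    """One Adam step in the TensorFlow 1.x form (correction folded into lr_t)."""
    lr_t = lr * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    theta = theta - lr_t * m / (math.sqrt(v) + eps_hat)
    return theta, m, v


# Minimise f(x) = x^2 with both forms, matching epsilon at every step.
eps, beta2 = 1e-8, 0.999
theta_paper, m_p, v_p = 5.0, 0.0, 0.0
theta_tf, m_t, v_t = 5.0, 0.0, 0.0
for t in range(1, 101):
    theta_paper, m_p, v_p = adam_step_paper(theta_paper, m_p, v_p, 2 * theta_paper, t, eps=eps)
    eps_hat = eps * math.sqrt(1 - beta2 ** t)  # makes the two updates identical
    theta_tf, m_t, v_t = adam_step_tf(theta_tf, m_t, v_t, 2 * theta_tf, t, eps_hat=eps_hat)

max_gap = abs(theta_paper - theta_tf)
```

With a fixed epsilon in both forms the trajectories differ slightly, mostly in the first steps where 1 - beta2^t is small.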

The bug about using hooks and MirroredStrategy in tf.estimator.Estimator

When I was using MirroredStrategy in my tf.estimator.Estimator:

and added hooks for training:

TensorFlow reported errors:

Having found no answers on Google, I had to look into the code of ‘’ in TensorFlow. Fortunately, the code defect is obvious:

class Estimator doesn't have any private argument

Some lessons from Kaggle’s competition

About two months ago, I joined the 'RSNA Pneumonia Detection' competition on Kaggle. It ended yesterday, but I still have many experiences and lessons to reflect on. 1. Augmentation is extremely crucial. After using tf.image.sample_distorted_bounding_box() in my program, the mAP (mean Average Precision) on the evaluation dataset rose to a perfect