Robin on Linux – All about technology

Use repeated dataset correctly with timm’s data loader

For an experiment of metaformer, I was trying to add CIFAR100 dataset into the training script. Since CIFAR100 is too small, I need to let it repeat mulitple times in one epoch. Therefore I add a new type of dataset:

class RepeatDataset(Dataset):
    def __init__(self, dataset, repeats):
        self.dataset = dataset
        self.repeats = repeats
        self.length = len(dataset) * repeats

    def __getitem__(self, idx):
        return self.dataset[idx % len(self.dataset)]

    def __len__(self): 
        return self.length

But the training will report error:

Traceback (most recent call last):                                                                                    
  File "/home/robin/code/metaformer/train.py", line 970, in <module>                                                  
    main()                                                                                                            
  File "/home/robin/code/metaformer/train.py", line 732, in main                                                      
    train_metrics = train_one_epoch(                       
                    ^^^^^^^^^^^^^^^^                                                                                  
  File "/home/robin/code/metaformer/train.py", line 798, in train_one_epoch                                           
    for batch_idx, (input, target) in enumerate(loader):                                                              
                                      ^^^^^^^^^^^^^^^^^                                                               
  File "/home/robin/miniconda3/envs/poolformer/lib/python3.12/site-packages/timm/data/loader.py", line 131, in __iter__                                                                                                                     
    for next_input, next_target in self.loader:                                                                       
                                   ^^^^^^^^^^^                                                                        
  File "/home/robin/miniconda3/envs/poolformer/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 733, in __next__                                                                                                          
    data = self._next_data()                                                                                                                                                                                                                
           ^^^^^^^^^^^^^^^^^                                                                                          
  File "/home/robin/miniconda3/envs/poolformer/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 1515, in _next_data                                                                                                       
    return self._process_data(data, worker_id)                                                                        
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                        
  File "/home/robin/miniconda3/envs/poolformer/lib/python3.12/site-packages/torch/utils/data/dataloader.py", line 1550, in _process_data                                                                                                    
    data.reraise()                                         
  File "/home/robin/miniconda3/envs/poolformer/lib/python3.12/site-packages/torch/_utils.py", line 750, in reraise                                                                                                                          
    raise exception                                        
AttributeError: Caught AttributeError in DataLoader worker process 0.                                                 
Original Traceback (most recent call last):                                                                           
  File "/home/robin/miniconda3/envs/poolformer/lib/python3.12/site-packages/torch/utils/data/_utils/worker.py", line 349, in _worker_loop                                                                                                   
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]                                                   
           ^^^^^^^^^^^^^^^^^^^^                            
  File "/home/robin/miniconda3/envs/poolformer/lib/python3.12/site-packages/torch/utils/data/_utils/fetch.py", line 55, in fetch                                                                                                            
    return self.collate_fn(data)                           
           ^^^^^^^^^^^^^^^^^^^^^                           
  File "/home/robin/miniconda3/envs/poolformer/lib/python3.12/site-packages/timm/data/mixup.py", line 305, in __call__                                                                                                                      
    output = torch.zeros((batch_size, *batch[0][0].shape), dtype=torch.uint8)                                         
                                       ^^^^^^^^^^^^^^^^^                                                              
AttributeError: 'Image' object has no attribute 'shape'. Did you mean: 'save'?

It cost me a quite long time to solve it. The key is in the implementation of “timm.data.create_loader”: https://github.com/huggingface/pytorch-image-models/blob/main/timm/data/loader.py#L291. In it, it will set “dataset.transform” to a new value, and in “timm.data.dataset” https://github.com/huggingface/pytorch-image-models/blob/main/timm/data/dataset.py#L66-L67, it will check and use this new set “transform”:

...
        if self.transform is not None:
            img = self.transform(img)     
...

Since the class RepeatDataset is created by myself and it will not handle the “dataset.transform = create_transform()”, it failed when calling the non-existed “transform()”.

The fix comes from ChatGPT and I think it’s not bad:

class RepeatDataset(Dataset):
    def __init__(self, dataset, repeats):
        self.dataset = dataset
        self.repeats = repeats
        self.length = len(dataset) * repeats

    @property
    def transform(self):
        return self.dataset.transform

    @transform.setter
    def transform(self, value):
        self.dataset.transform = value

    def __getitem__(self, idx):
        return self.dataset[idx % len(self.dataset)]

    def __len__(self):
        return self.length

Different default behaviour of JPEG encoding for dart and python

In dart, create JPEG file is like:

import 'dart:io';
import 'package:image/image.dart';

...

List<int> resizedBytes = encodeJpg(resizedImage);
File(outputPath).writeAsBytesSync(resizedBytes)

In python, we usually use:

import cv2

...

cv2.imwrite("output.jpg", output_image)

But they are different! As the source code show, the “encodeJpg()” use 100% quality and YUV444 chroma as default, but cv2 use 95% quality and YUV420 chroma as default.

If you want to write a JPEG file just as “encodeJpg()” do by using python, the code snippet should be:

import cv2

...

params = [cv2.IMWRITE_JPEG_QUALITY, 100, cv2.IMWRITE_JPEG_SAMPLING_FACTOR, cv2.IMWRITE_JPEG_SAMPLING_FACTOR_444]
success, encoded_image = cv2.imencode(".jpg", output_image, params)
if success:
  with open(output_path, "wb") as fp:
    fp.write(encoded_image)

Experiments about ‘accelerate’ library of HuggingFace

If you want to run your training code with ‘accelerate‘ fp8, you need to install ‘transformer_engine‘ or ‘MS-AMP‘. But these two packages are hard to install beccause they depends on specific CUDA/CUDNN versions. After one afternoon’s efforet, I finally gave up and started to directly using docker image ‘nvcr.io/nvidia/pytorch:24.04-py3’.

docker run \
  --gpus all \
  -it \
  --rm \
  --shm-size="16g" \
  --network host \
  nvcr.io/nvidia/pytorch:24.04-py3

After enter the container by using above command, I still need to install ‘accelerate’ directly by using ‘python3 -m pip install accelerate’. In the ‘accelerate config’, I set to use ‘fp8’ with ‘E4M3’. But the training process reported error about LayerNorm. Then I manually modify the code (may not be correct but it works):

# transformer_engine/pytorch/module/layernorm.py

class _LayerNorm(torch.autograd.Function):
    """functional LayerNorm"""

    @staticmethod
    def forward(
        ctx,
        inp: torch.Tensor,
        ln_weight: torch.Tensor,
        ln_bias: torch.Tensor,
        eps: float,
        fwd_ln_sm_margin: int,
        bwd_ln_sm_margin: int,
        zero_centered_gamma: bool,
        is_grad_enabled: bool,
        activation_dtype: torch.dtype,
    ) -> torch.Tensor:
        # Make sure input dimensions are compatible
        in_features = ln_weight.numel()
        assert inp.is_cuda, "TransformerEngine needs CUDA."
        permute = False
        if inp.shape[-1] != in_features:
            inp = inp.permute(0, 2, 3, 1)
            permute = True
        assert inp.shape[-1] == in_features, "LayerNorm not possible"
        if permute:
            inp = inp.permute(0, 3, 1, 2)
        inputmat = inp.reshape((-1, in_features))

        # Cast for native AMP
        inputmat = cast_if_needed(inputmat, activation_dtype)
        ln_weight = cast_if_needed(ln_weight, activation_dtype)
        ln_bias = cast_if_needed(ln_bias, activation_dtype)

        if is_grad_enabled:
            ln_out, mu, rsigma = tex.layernorm_fwd(inputmat, ln_weight,
                ln_bias, eps, fwd_ln_sm_margin, zero_centered_gamma)
            ctx.save_for_backward(inputmat, ln_weight, mu, rsigma)
            ctx.inp_shape = inp.shape
            ctx.bwd_ln_sm_margin = bwd_ln_sm_margin
            ctx.zero_centered_gamma = zero_centered_gamma
        else:
            ln_out, mu, rsigma = layernorm_fwd_inf(inputmat, ln_weight,
                ln_bias, eps, zero_centered_gamma), None, None
        return ln_out.view_as(inp)

Finally the training could work properly. But the speed is the same with bf16…

Experiments about ‘torchao’

‘torchao‘ is a python library that support PyTorch native quantization and sparsity for training and inference. I just finished some experiments/tests with it for my image-classification project, which use CNN model by PyTorch. Below are some conclusions.

My project already used Automatic Mixed Precision of ‘bfloat16’, but the convert_to_float8_training still easily reduce about 60% of the VRAM (on my RTX 4090 GPU):

from torchao.float8 import convert_to_float8_training

def module_filter_fn(mod: torch.nn.Module, fqn: str) -> bool:
    # Example: Exclude the output layer from float8 conversion
    if fqn == "output":
        return False
    # Example: Exclude linear layers with dimensions not divisible by 16
    if isinstance(mod, torch.nn.Linear):
        if mod.in_features % 16 != 0 or mod.out_features % 16 != 0:
            return False
    return True

convert_to_float8_training(m, module_filter_fn=module_filter_fn)



AdamW8bit could decrease the VRAM from 22.6GB to 22.4GB, not too much.

Didn't see any VRAM difference after using CPUOffloadOptimizer. Since it couldn't work well with learning-rate-scheduler. I tend to give up it.

Try to understand Variational Autoencoders

ELBO as the Loss Function

Note: “p(x|z)” means True Posterior, “q(z|x)” means Approximate Posterior

What if only use first term of the Loss?

What’s the meaning of “GAN tend to lack full support over the data”?

Why using Gaussian Distribution for latent variable?

How to change App name in Google Play Console?

Just one picture:

I don’t know why the editing place of App name is under “Grow users”. But unfortunately, it’s there.

After you change the “App name” and click “Save” (You also need to upload a bunch of images before click it. Damn it)

Resizing a image is not as easy as you think

I found a very interesting picture:

The size of this image is about 8MB although it’s blurring. Then I use below python code to try to resize it using different interpolation strategy:

import cv2

img = cv2.imread("/Users/robin/Downloads/ou1.png")

for inter, name in [(cv2.INTER_AREA, "area"), (cv2.INTER_CUBIC, "cubic"), (cv2.INTER_LINEAR, "linear")]:
    after = cv2.resize(img, (310, 310), inter)
    cv2.imwrite(f"{name}.bmp", after)

Only cv2.INTER_AREA works well.

Then let me try dart language:

import 'dart:io';
import 'package:image/image.dart' as img;

void main() async {
  // Load the image
  final file = File('/Users/robin/Downloads/ou1.png');
  final bytes = await file.readAsBytes();
  final image = img.decodeImage(bytes);

  if (image == null) {
    print("Failed to load image");
    return;
  }

  // Resize and save images with different methods
  for (var method in ['area', 'cubic', 'linear', 'average']) {
    img.Image resized;
    if (method == 'area' || method == 'linear') {
      // Use `copyResize` for these, `area` will behave similarly to default
      resized = img.copyResize(image, width: 310, height: 310);
    } else if (method == 'cubic') {
      // Simulate cubic interpolation (approximation)
      resized = img.copyResize(image, width: 310, height: 310, interpolation: img.Interpolation.cubic);
    } else if (method == 'average') {
      resized = img.copyResize(image, width: 310, height: 310, interpolation: img.Interpolation.average);
    } else {
      print("Unknown interpolation method: $method");
      continue;
    }

    // Save the resized image
    final output = File('$method.bmp');
    await output.writeAsBytes(img.encodePng(resized));
    print("Saved resized image using $method interpolation as $method.bmp");
  }
}

The result looks also terrible except just one strategy “average”:

So, better use “Interpolation.average” in dart language.

Fix the launching problem of Android Studio on Macbook

After I installed two versions of JDK (17 and 21) and uninstalled them, I saw this error when trying to launch my Android Studio. This error is hard to fix. Reinstalling Android Studio won’t fix it. And even after I searched google and asked chatGPT and try all the suggestion they give, the problem continued for two days.

The eventual solution is from a StackOverflow article (sorry I forget the url):

# firstly, manually uninstall Android Studio
# then
rm -rf /Library/Java/JavaVirtualMachines/*
# then
rm ~/Library/Application\ Support/Google/AndroidStudio*
# then
rm  -rf ~/Library/Caches/
# now install the Android Studio and it could be launched correctly

An experiment about my stupid idea

After training both image classification and sound classification deep learning models. I found out that the image training is much slower than the sound training, although the sound dataset is much bigger than image dataset.

The first idea that jumped out of my mind is that the image has 3 channels (RGB) but the sound spectrogram just have 1 channel. Therefore if I compress RGB into 1 channel (such as using gray image), the training speed of image classification will become 3 times faster.

A few days ago, I started to train the image classification with gray image. But the speed of training is almost the same with RGB image. Until then, I realized how stupid I am.

Let’s see below graph cut from residual network paper. Yes, the first layer is a 7×7 64 filters convoluation, and it will map no matter how many channels just to 64 filters. If the image is a gray one, it maps it to 64 filters; if the image is a RGB one, it also maps it to 64 filters. The computing cost will only reduce 3 times for first layer if I reduce the 3-channels to 1-channel. Compare to total computing cost, this change is quite minor.

That’s why there is no body mentioned about this “accelerating” technology before 🙂

How to unfold two Arrays in BigQuery

Imaing we have data like this:

WITH Sequences AS
  (SELECT 1 AS id, [0, 1, 1, 2, 3, 5] AS prod_type, [1.1, 1.2, 2.1, 2.3, 3.3, 3.4] AS prod_price,
   UNION ALL SELECT 2 AS id, [2, 4, 8, 16, 32] AS prod_type, [1.3, 4.2, 2.1, 7.3, 5.3, 9.4] AS prod_price,
   UNION ALL SELECT 3 AS id, [5, 10] AS prod_type, [1.8, 4.9, 2.0, 7.6, 5.1, 8.4] AS prod_price)
select * from sequences

How could I get the total price of each “prod_type” for every “id”?

First we need to unfold the “prod_type” and “prod_price” correspondingly:

WITH Sequences AS
  (SELECT 1 AS id, [0, 1, 1, 2, 3, 5] AS prod_type, [1.1, 1.2, 2.1, 2.3, 3.3, 3.4] AS prod_price,
   UNION ALL SELECT 2 AS id, [2, 4, 8, 16, 32] AS prod_type, [1.3, 4.2, 2.1, 7.3, 5.3, 9.4] AS prod_price,
   UNION ALL SELECT 3 AS id, [5, 10] AS prod_type, [1.8, 4.9, 2.0, 7.6, 5.1, 8.4] AS prod_price)
SELECT id, prod_type, prod_price
from
sequences,
unnest(prod_type) AS prod_type,
unnest(prod_price) AS prod_price;

and then use “group by” to calculate total price:

...
SELECT
  id,
  prod_type,
  SUM(prod_price)
FROM
  sequences,
  UNNEST(prod_type) AS prod_type,
  UNNEST(prod_price) AS prod_price
GROUP BY
  id,
  prod_type

“area”	“cubic”	“linear”	“average”