During training, I need to read array data from a .npy file and use only part of it:
    import numpy as np

    data = np.load("sample1.npy")
    sound1 = data[start1:end1]
    sound2 = data[start2:end2]
The .npy files are large, so it is slow to read a whole file when only a few small parts of it are needed. Is there a simple way to read only those parts?
Yes, it takes a one-line change:
    import numpy as np

    data = np.load("sample1.npy", mmap_mode="r")  # just change this line
    sound1 = data[start1:end1]
    sound2 = data[start2:end2]
With mmap_mode="r", np.load() reads only the file header and memory-maps the rest, so it generates almost no disk I/O for the array data itself. When we actually process a segment (like sound2), the operating system loads only the pages that contain that segment, which greatly reduces the total I/O.
After I changed this line of code, the disk read bandwidth dropped from 300 MB/s to less than 100 MB/s.
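To try the idea without a real dataset, here is a minimal sketch that writes a temporary .npy file and reads two segments from it through a memory map. The helper name read_segments, the array size, and the index ranges are assumptions made up for illustration; they are not part of the original training code. The sketch also copies each segment with np.array(), on the assumption that you want an ordinary in-memory array that no longer depends on the mapped file.

```python
import os
import tempfile

import numpy as np


def read_segments(path, ranges):
    """Return in-memory copies of data[start:end] for each (start, end) pair.

    Hypothetical helper for illustration only.
    """
    # Memory-map the file: the array data is not read at this point.
    data = np.load(path, mmap_mode="r")
    # np.array(...) copies each slice out of the mapping into regular memory,
    # so only the pages backing these slices need to be read from disk.
    return [np.array(data[start:end]) for start, end in ranges]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample1.npy")  # stand-in for the real file
    full = np.arange(1_000_000, dtype=np.float32)
    np.save(path, full)

    # With mmap_mode set, np.load returns a numpy.memmap, not a plain ndarray.
    mapped = np.load(path, mmap_mode="r")
    is_memmap = isinstance(mapped, np.memmap)
    del mapped  # drop the mapping before the temporary directory is removed

    sound1, sound2 = read_segments(path, [(1000, 2000), (500_000, 500_100)])
    matches = (
        np.array_equal(sound1, full[1000:2000])
        and np.array_equal(sound2, full[500_000:500_100])
    )
```

The slices come back identical to what a full load would give. Note that the mapping is opened read-only here, so writing through it is not possible; if the segments need to be modified, working on the copies as above is one option.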