Commit 68998af
fixing NDArray save method to use its own compression parameters
The 'NDArray.save' method does not pass 'cparams' to the parent class 'copy';
consequently, the array is recompressed with the default 'cparams'.
To observe the bug run the script below.
=================================================================
import os
import blosc2
import numpy as np
a = np.arange(10_000_000)
a = a * a
cparams = blosc2.CParams(
    codec=blosc2.Codec.ZSTD,
    clevel=9,
    filters=[blosc2.Filter.BITSHUFFLE],
)
ba = blosc2.asarray(a, cparams=cparams)
print(f"Blosc2 memory: size {ba.cbytes}\t cratio: {ba.cratio}")
outdir = "cache/"
os.makedirs(outdir, exist_ok=True)  # make sure the output directory exists
prefix = "save"
fname = outdir + prefix + ".b2nd"
ba.save(fname, mode="w")
fsize = os.path.getsize(fname)
print(f"Blosc2 array save:\t saved file size = {fsize}\t cratio: {a.nbytes/fsize}")
=================================================================
You should see
~~~~~
Blosc2 memory: size 4370284 cratio: 18.74477722729232
Blosc2 array save: saved file size = 12370369 cratio: 6.467066584675041
~~~~~
ba.cbytes ---> 4370284
ba.cratio ---> 18.3
That is, the in-memory array has 'cbytes=4370284',
whereas the saved file has 12370369 bytes, which gives a 'cratio' of about 6.5.
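The ratios quoted above can be reproduced with plain arithmetic. The sketch below is only a sanity check on the reported numbers: it assumes the NumPy array uses a 64-bit integer dtype (8 bytes per element), which is consistent with the file 'cratio' of about 6.467 printed by the script. The variable names are illustrative.

```python
# Figures taken from the script output before the patch.
raw_nbytes = 10_000_000 * 8   # assumption: 64-bit integers, 8 bytes each
mem_cbytes = 4_370_284        # ba.cbytes (compressed size in memory)
file_size = 12_370_369        # size of the saved .b2nd file

mem_ratio = raw_nbytes / mem_cbytes    # raw bytes / in-memory compressed bytes
file_ratio = raw_nbytes / file_size    # raw bytes / saved file bytes
growth = file_size / mem_cbytes        # how much larger the file is

print(f"in-memory ratio: {mem_ratio:.1f}")
print(f"saved-file ratio: {file_ratio:.3f}")
print(f"file is {growth:.2f}x larger than the in-memory array")
```

Note that 'a.nbytes / ba.cbytes' comes out at about 18.3, slightly below the 'ba.cratio' value the script prints; the sketch does not attempt to explain that difference.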
Also, if the array is loaded back from the file, it is easy to see
that its 'cparams' differ from those of the original array and have been
reset to the defaults.
After the patch, the in-memory 'cbytes' closely matches the saved file size,
and the 'cparams' are preserved:
~~~~~
Blosc2 memory: size 4370284 cratio: 18.74477722729232
Blosc2 array save: saved file size = 4370051 cratio: 18.30642251085856
~~~~~
1 parent eee6997 commit 68998af
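The bug pattern and the fix described in the commit message can be illustrated with a toy model. This is a minimal sketch, not the blosc2 implementation: 'ToyArray', 'save_buggy', 'save_fixed' and 'DEFAULT_LEVEL' are hypothetical names, and 'zlib' from the standard library stands in for the real codec. The only point is that a 'save' built on 'copy' must forward the array's own compression settings, otherwise 'copy' falls back to the defaults.

```python
import zlib


class ToyArray:
    """Hypothetical stand-in for a compressed container (not the blosc2 API)."""

    DEFAULT_LEVEL = 1  # plays the role of the default 'cparams'

    def __init__(self, raw: bytes, level: int = DEFAULT_LEVEL):
        self.level = level
        self.payload = zlib.compress(raw, level)

    def copy(self, **kwargs) -> "ToyArray":
        # Decompress and recompress; use the default level unless one is given.
        raw = zlib.decompress(self.payload)
        return ToyArray(raw, kwargs.get("level", self.DEFAULT_LEVEL))

    def save_buggy(self) -> "ToyArray":
        # Bug pattern: the array's own level is not forwarded to copy().
        return self.copy()

    def save_fixed(self) -> "ToyArray":
        # Fix pattern: forward the array's own settings to copy().
        return self.copy(level=self.level)


arr = ToyArray(b"0123456789abcdef" * 50_000, level=9)
buggy = arr.save_buggy()
fixed = arr.save_fixed()
print(f"original level={arr.level} size={len(arr.payload)}")
print(f"buggy    level={buggy.level} size={len(buggy.payload)}")
print(f"fixed    level={fixed.level} size={len(fixed.payload)}")
```

In this toy, the fixed path reproduces the original payload exactly because the same data is compressed with the same level, while the buggy path silently reverts to the default level.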
1 file changed
Lines changed: 1 addition & 1 deletion
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 4703 | 4703 | |
| 4704 | 4704 | |
| 4705 | 4705 | |
| 4706 | | - |
| | 4706 | + |
| 4707 | 4707 | |
| 4708 | 4708 | |
| 4709 | 4709 | |