Yesterday I donated the arrayterator code to the Numpy project, and today I got commit access to the Scipy svn! I’m rewriting my old pupynere module, which was incorporated some time ago into Scipy as scipy.io.netcdf (“pupynere” is so much more fun to say!).
I already fixed the code that handles record arrays, since the old code would read everything into memory. The problem is that if you have two record variables called “A” and “B”, their values are interleaved on disk, one record after another:
A(0,1) A(0,2) A(0,3) B(0,1) A(1,1) A(1,2) A(1,3) B(1,1)
The trick is how to map data laid out like this on disk to Numpy arrays without reading it into memory, i.e., using mmap. In this example we can do this by creating an array with the compound dtype:
dtype([('A', '>i4', (3,)), ('B', '>i4')])
We can then access the data directly from disk by referencing, e.g., array['A']. Pretty cool, and efficient too!
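Here is a minimal sketch of the idea, not the actual scipy.io.netcdf code. It assumes a scratch file that holds nothing but two records in the layout above; the file name and the values are made up for illustration. A real netCDF file has a header in front of the record data, so you would also need to pass the right offset to np.memmap.

```python
import os
import tempfile
import numpy as np

# Record dtype matching the layout above: three big-endian int32 values
# for A, then one for B, repeated once per record.
record_dtype = np.dtype([('A', '>i4', (3,)), ('B', '>i4')])

# Write two records to a scratch file (hypothetical data, illustration only).
path = os.path.join(tempfile.mkdtemp(), 'records.bin')
raw = np.array([1, 2, 3, 10, 4, 5, 6, 20], dtype='>i4')
raw.tofile(path)

# Map the file instead of reading it; data is only pulled in when touched.
records = np.memmap(path, dtype=record_dtype, mode='r')

a = records['A']  # one row of three values per record
b = records['B']  # one value per record
print(a.shape, a.tolist())
print(b.shape, b.tolist())
```

Both a and b are views on the mapped file, so slicing something like a[1] only touches the bytes of that record.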