This is something that I’ve wanted to program for a long time: a lazy version of Numpy’s concatenate function, which just acts as a proxy for the data.
from numpy import asarray, concatenate
# fix_slice and combine_slices come from Pydap.


class LazyConcatenator(object):
    """
    A lazy concatenator.

    This class is similar to the ``concatenate`` function from Numpy,
    except that the new object is a view of the concatenated arrays,
    and data is only read when this object is sliced.

    """
    def __init__(self, arrays, axis=None):
        if axis is None:
            # Create a new axis
            self.shape = (len(arrays),) + arrays[0].shape
            self._arrays = asarray(arrays, 'O')
        else:
            # Concatenate along a given axis
            shape = list(arrays[0].shape)
            shape[axis] = sum([array.shape[axis] for array in arrays])
            self.shape = tuple(shape)
            # One LazySlice per position along the axis
            self._arrays = asarray(
                [LazySlice((slice(None),)*axis+(i,), arrays[j])
                    for j in range(len(arrays))
                    for i in range(arrays[j].shape[axis])], 'O')

        self._axis = axis

    def __getitem__(self, index):
        # Extract the slice for the object array
        index = fix_slice(index, self.shape)
        if self._axis is None:
            select, index = index[0], index[1:]
            out = asarray(
                [obj[index] for obj in self._arrays[select]])
        else:
            select = index[self._axis]
            out = concatenate(
                [obj[index] for obj in self._arrays[select]],
                axis=self._axis)

        return out
It works by storing the arrays in an object array, which either acts as a new axis or replaces an existing one. Creating a new axis is the easy case: just put the arrays in the object array, and when the object is sliced, use the first part of the slice for the object array and the rest for the stored data.
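To make the new-axis case concrete, here is a minimal sketch using plain NumPy arrays. The `LazyStack` class is a hypothetical, simplified stand-in (it is not the class above): it keeps the arrays in a Python list instead of an object array, and it only understands integers and slices, not `Ellipsis`.

```python
import numpy as np


class LazyStack:
    """Hypothetical, simplified stand-in for the new-axis case."""

    def __init__(self, arrays):
        self._arrays = list(arrays)
        self.shape = (len(self._arrays),) + self._arrays[0].shape

    def __getitem__(self, index):
        if not isinstance(index, tuple):
            index = (index,)
        # The first entry picks stored arrays; the rest slices each of them.
        select, rest = index[0], index[1:]
        if isinstance(select, int):
            return np.asarray(self._arrays[select][rest])
        return np.asarray([a[rest] for a in self._arrays[select]])


a = np.arange(6).reshape(2, 3)
b = a + 10
c = a + 20

lazy = LazyStack([a, b, c])
eager = np.asarray([a, b, c])

# Only b and c are touched; a is never read.
result = lazy[1:, 0, :2]
```

Here `result` should match `eager[1:, 0, :2]`, but the full stacked array is never built on the lazy side.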
Concatenating the data along an existing axis is more complicated. My solution is to define a LazySlice, so that each element in the object array points to a single position along the corresponding axis of one of the stored arrays. The LazySlice is, as the name suggests, a way to slice a variable without actually slicing it: the first slice is only applied once a second slice comes in. Here’s the implementation:
class LazySlice(object):
    """
    A lazy slice.

    This is a lazy slice. The initial slice is applied only when the
    object is sliced again; both slices are then combined and applied
    to the data.

    """
    def __init__(self, slice, obj):
        self.slice = fix_slice(slice, obj.shape)
        self.obj = obj

    def __getitem__(self, index):
        index = fix_slice(index, self.obj.shape)
        return self.obj[combine_slices(index, self.slice)]
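The bookkeeping for the existing-axis case can be illustrated without Pydap. In this rough sketch, `build_pointers` and `lazy_take` are hypothetical helpers, not part of the code above: each position along the concatenation axis becomes an (array number, local index) pair, and only the selected positions are read. The sketch reads each position with a length-one slice so the axis survives for `concatenate`; that detail is an assumption of the sketch, not necessarily what the classes above do. It also assumes a non-empty selection.

```python
import numpy as np


def build_pointers(arrays, axis):
    """Return one (array number, local index) pair per position on `axis`."""
    return [(j, i)
            for j, array in enumerate(arrays)
            for i in range(array.shape[axis])]


def lazy_take(arrays, axis, select):
    """Read only the positions in `select` (a slice) along `axis`."""
    pointers = build_pointers(arrays, axis)[select]
    pieces = []
    for j, i in pointers:
        # A length-one slice keeps the axis, so the pieces can be joined.
        key = (slice(None),) * axis + (slice(i, i + 1),)
        pieces.append(arrays[j][key])
    return np.concatenate(pieces, axis=axis)


a = np.arange(6).reshape(2, 3)
b = np.arange(12).reshape(4, 3) + 100

pointers = build_pointers([a, b], axis=0)
part = lazy_take([a, b], axis=0, select=slice(1, 4))
```

With these inputs, `pointers` has six entries (two for `a`, four for `b`), and `part` should equal rows 1 to 3 of the eager concatenation.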
The functions fix_slice and combine_slices come from Pydap. The first ensures that a multidimensional slice has one entry per dimension, and also fixes negative indexes. The second combines two slices into a single one.
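For readers without Pydap at hand, the following sketch shows the kind of work these two functions do. The helpers `normalize` and `compose` are hypothetical names and are not Pydap's implementation: `normalize` only pads the index and wraps negative integers, and `compose` only handles a pair of one-dimensional slices.

```python
def normalize(index, shape):
    """Pad `index` to one entry per dimension and wrap negative integers."""
    if not isinstance(index, tuple):
        index = (index,)
    index = index + (slice(None),) * (len(shape) - len(index))
    return tuple(i + n if isinstance(i, int) and i < 0 else i
                 for i, n in zip(index, shape))


def compose(first, second, length):
    """Return one slice equivalent to applying `first`, then `second`."""
    # Slicing a range does the arithmetic without touching any data.
    picked = range(length)[first][second]
    if len(picked) == 0:
        return slice(0, 0, 1)
    # A negative stop would be read as "from the end", so use None instead.
    stop = picked.stop if picked.stop >= 0 else None
    return slice(picked.start, stop, picked.step)


data = list(range(10))
full_index = normalize(-1, (4, 5))
combined = compose(slice(2, None, 2), slice(1, None), len(data))
```

The point of combining the slices first is that the data is indexed once: `data[combined]` should give the same elements as `data[2::2][1:]`, without creating the intermediate result.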