A cool thing about data arrays stored in HDF5 via h5py is that you can incrementally add data to them. This is the practical way of processing large data sets: you read a large dataset in piece by piece, process each piece, and append the processed data to an array on disk. An interesting side effect is that there is a small size overhead in the saved file associated with this resizing, compared to the same data saved all at once with no resizing of the HDF5 datasets.

I did the computations using block sizes of [10, 100, 1000, 10000] elements and block counts of [10, 100, 1000, 10000]. The corresponding matrix of overhead (excess bytes needed for the resized version over the directly saved version) looks like this (rows are block sizes, columns are block counts):

overhead -> array([[   9264,    2064,    3792,    9920],
                   [   2064,    3792,    9920,   52544],
                   [   3792,    9920,   52544,  462744],
                   [   9920,   52544,  462744, 4570320]])

As...
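For reference, here is a minimal sketch of the two write patterns being compared: an incrementally resized dataset versus the same data written in one go. The file names (resized.h5, direct.h5), the 64-bit float dtype, and the particular block size/count are my own placeholders, and the exact overhead you measure will depend on your HDF5 library version and chunking defaults.

```python
import os

import h5py
import numpy as np

block_size, n_blocks = 1000, 100
rng = np.random.default_rng(0)
data = rng.standard_normal(block_size * n_blocks)

# Incremental version: start with an empty, resizable dataset and
# grow it by one block at a time.
with h5py.File("resized.h5", "w") as f:
    dset = f.create_dataset("data", shape=(0,), maxshape=(None,),
                            dtype="f8", chunks=True)
    start = 0
    for i in range(n_blocks):
        block = data[i * block_size:(i + 1) * block_size]
        dset.resize((start + block_size,))
        dset[start:start + block_size] = block
        start += block_size

# Direct version: write the whole array in a single call.
with h5py.File("direct.h5", "w") as f:
    f.create_dataset("data", data=data)

overhead = os.path.getsize("resized.h5") - os.path.getsize("direct.h5")
print(f"overhead: {overhead} bytes")
```

A resizable dataset must use HDF5's chunked layout (with its accompanying chunk index), while the direct write defaults to a contiguous layout, which likely accounts for much of the extra space.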