From a nice hint here, and the docs here:
When you use pandas + HDF5 storage it is convenient to generate one table that is the 'selector' table that you use to index which rows you will select. Then you use that to retrieve the bulk data from separate tables which have the same index.
Originally I was appending columns to the main table, but there is no efficient way of doing that when using HDF5 (appending rows is efficient). Now I'm just creating new tables for the data, keeping the original index.
When you use pandas + HDF5 storage it is convenient to generate one table that is the 'selector' table that you use to index which rows you will select. Then you use that to retrieve the bulk data from separate tables which have the same index.
Originally I was appending columns to the main table, but there is no efficient way of doing that when using HDF5 (appending rows is efficient). Now I'm just creating new tables for the data, keeping the original index.
Comments
Post a Comment