UPDATE: Fixed confusion between 'table' and 'store'
UPDATE: Include note about how to set data columns
The basic steps are these:
- Use format='table' in .put or .to_hdf (older pandas versions used table=True) to store the data as a frame_table, which allows on-disk selection and partial retrieval
- Use data_columns=[...] when saving to declare which columns you will use to select data
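- If you do not store the data as a table, any where query fails with an error like
  TypeError: cannot pass a where specification when reading from a non-table this store must be selected in its entirety
- If you query a column that was not declared in data_columns (the index is always queryable), you get an error like
  ValueError: query term is not valid [field->...,op->...,value->...]
  The exact wording of both messages depends on the pandas version.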
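The steps and failure modes above can be reproduced with a small sketch. It assumes a recent pandas with PyTables (the tables package) installed; the column names A and B, the node names "fixed" and "tbl", and the temporary file path are all hypothetical choices for illustration. The exception types are checked rather than the messages, since the wording changes between versions.

```python
import os
import tempfile

import pandas as pd

# Hypothetical example data; columns "A" and "B" are illustrative only.
df = pd.DataFrame({"A": range(10), "B": [x * 0.5 for x in range(10)]})
path = os.path.join(tempfile.mkdtemp(), "example.h5")

# 1. Fixed format (the default for put/to_hdf) cannot be queried on disk.
df.to_hdf(path, key="fixed", format="fixed")
try:
    pd.read_hdf(path, "fixed", where="index > 5")
    fixed_error = None
except TypeError as exc:
    fixed_error = exc

# 2. Table format with "A" declared as a data column: queries on "A" work.
df.to_hdf(path, key="tbl", format="table", data_columns=["A"])
subset = pd.read_hdf(path, "tbl", where="A > 6")

# 3. "B" was stored but not declared as a data column, so it is not queryable.
try:
    pd.read_hdf(path, "tbl", where="B > 1")
    undeclared_error = None
except ValueError as exc:
    undeclared_error = exc

print(type(fixed_error).__name__)       # TypeError
print(subset["A"].tolist())             # [7, 8, 9]
print(type(undeclared_error).__name__)  # ValueError
```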
```python
import pandas as pd

store = pd.HDFStore('filename.h5')
df = pd.DataFrame( ... )  # Construct some dataframe

# Save as a frame_table in filename.h5 and declare some data columns.
# append creates a table automatically
store.append('data1', df, data_columns=[...])

df = pd.DataFrame( ... )  # Construct another dataframe
# put requires an explicit instruction to create a table
# (format='table' replaces the older table=True keyword)
store.put('data2', df, format='table', data_columns=[...])
# This is convenient: the file now holds a second node

store.close()
```
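For a version you can run as-is, here is a sketch with the elided parts filled by made-up data. The DataFrames, their column names, and the temporary path are hypothetical; it assumes a recent pandas with PyTables installed and uses HDFStore as a context manager so the file is closed even if a write fails. get_storer(...).is_table is used only to confirm that both nodes were written as tables.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "filename.h5")

# Hypothetical data; the column names are assumptions for illustration.
df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"]})
df2 = pd.DataFrame({"C": [10.0, 20.0], "D": [5, 6]})

# The context manager closes the file when the block exits.
with pd.HDFStore(path, mode="w") as store:
    # append() always writes in table format.
    store.append("data1", df1, data_columns=["A"])
    # put() defaults to fixed format, so request a table explicitly.
    store.put("data2", df2, format="table", data_columns=["C"])
    keys = sorted(store.keys())
    storer_types = {k: store.get_storer(k).is_table for k in keys}

print(keys)          # ['/data1', '/data2']
print(storer_types)  # {'/data1': True, '/data2': True}
```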
Now you can use the battery of select methods (outlined here) to load just selected parts of the data structures.
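A few of those select methods are sketched below. This assumes a recent pandas with PyTables installed; the data, the node name "data", and the file path are hypothetical. select filters rows on data columns and can also limit the returned columns or the row range, select_column reads one data column as a Series, and select_as_coordinates returns the matching row numbers without loading the rows.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data and file path, chosen only for illustration.
df = pd.DataFrame({
    "A": range(10),
    "B": [2 * a for a in range(10)],
    "C": list("abcdefghij"),
})
path = os.path.join(tempfile.mkdtemp(), "select_demo.h5")

with pd.HDFStore(path, mode="w") as store:
    # Declare A and B as data columns so they can appear in where clauses.
    store.append("data", df, data_columns=["A", "B"])

    # Row filter evaluated against the file; & combines conditions.
    rows = store.select("data", where="(A >= 3) & (B < 12)")
    # Only column A, and only rows 0 and 1 by position.
    head = store.select("data", columns=["A"], start=0, stop=2)
    # A single data column as a Series.
    a_values = store.select_column("data", "A")
    # Row numbers that satisfy a condition, without loading the rows.
    coords = store.select_as_coordinates("data", "A > 5")

print(rows["A"].tolist())  # [3, 4, 5]
print(head.shape)          # (2, 1)
print(coords.tolist())     # [6, 7, 8, 9]
```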