It looks like pandas doesn't translate a pandas timedelta column to the Parquet INTERVAL type:
>>> import pandas as pd
>>> df = pd.DataFrame([{'seconds': 30}])
>>> df.to_parquet('/tmp/test.parquet') # so far so good
>>> df['duration'] = pd.to_timedelta(df.seconds, unit='seconds')
>>> df.to_parquet('/tmp/test.parquet')
Traceback (most recent call last):
/ ... /
pyarrow.lib.ArrowNotImplementedError: Unhandled type for Arrow to Parquet schema conversion: duration[ns]
>>>
Is this simply a missing feature? Am I wrong to expect timedelta to be saved as INTERVAL? Which format would you recommend if my dataframe is quite large (500 MB) but I'll read it back to pandas: .to_pickle()?
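
For reference, one workaround that might sidestep the error is to store each duration as a plain number of seconds before writing and to rebuild the timedelta after reading. This is only a minimal sketch: the helper names `durations_to_seconds` and `seconds_to_durations` are hypothetical (not part of pandas or pyarrow), and it assumes that float-second precision is good enough for the data.

```python
import pandas as pd


def durations_to_seconds(df):
    """Return a copy of df with timedelta columns replaced by float seconds.

    Hypothetical helper: also returns the converted column names so they
    can be restored after the frame is read back.
    """
    out = df.copy()
    cols = list(out.select_dtypes(include=["timedelta64"]).columns)
    for col in cols:
        # total_seconds() yields floats; missing durations (NaT) become NaN.
        out[col] = out[col].dt.total_seconds()
    return out, cols


def seconds_to_durations(df, cols):
    """Inverse of durations_to_seconds for the given column names."""
    out = df.copy()
    for col in cols:
        out[col] = pd.to_timedelta(out[col], unit="s")
    return out


df = pd.DataFrame([{"seconds": 30}])
df["duration"] = pd.to_timedelta(df["seconds"], unit="s")

# 'flat' contains only numeric columns, so it no longer carries duration[ns].
flat, duration_cols = durations_to_seconds(df)
restored = seconds_to_durations(flat, duration_cols)
```

With this approach `flat.to_parquet(...)` would be called in place of `df.to_parquet(...)`, and `seconds_to_durations` would be applied to the result of `pd.read_parquet(...)`. The list of converted columns has to be kept somewhere alongside the file, since the Parquet schema would only record them as floats.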