It looks like pandas doesn't translate a pandas timedelta column to the Parquet INTERVAL type:

>>> import pandas as pd
>>> df = pd.DataFrame([{'seconds': 30}])
>>> df.to_parquet('/tmp/test.parquet') # so far so good
>>> df['duration'] = pd.to_timedelta(df.seconds, unit='seconds')
>>> df.to_parquet('/tmp/test.parquet')
Traceback (most recent call last):
  / ... /
pyarrow.lib.ArrowNotImplementedError: Unhandled type for Arrow to Parquet schema conversion: duration[ns]
>>> 

Is this simply a missing feature? Am I wrong to expect timedelta to be saved as INTERVAL? Which format would you recommend if my dataframe is quite large (500 MB) but I will only read it back into pandas? Would .to_pickle() be a reasonable choice?

Yaniv Aknin

1 Answer

Yes, this is a missing feature. See https://issues.apache.org/jira/browse/ARROW-6780

You could try engine='fastparquet' instead of the default pyarrow engine.
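
If switching engines is not an option, another possible workaround is to store the timedelta column as plain integers and convert it back after reading. The sketch below is only an illustration: the helper names encode_timedeltas and decode_timedeltas are made up for this example, and it assumes the timedelta columns contain no missing values (NaT).

```python
import pandas as pd


def encode_timedeltas(df):
    """Return a copy of df with timedelta columns stored as int64 nanoseconds.

    Hypothetical helper, not part of pandas. Assumes no NaT values.
    Also returns the names of the converted columns so they can be restored.
    """
    out = df.copy()
    td_cols = list(out.select_dtypes(include=["timedelta"]).columns)
    for col in td_cols:
        # Normalise to nanosecond resolution first, then take the raw integers.
        out[col] = out[col].astype("timedelta64[ns]").astype("int64")
    return out, td_cols


def decode_timedeltas(df, td_cols):
    """Inverse of encode_timedeltas: turn int64 nanoseconds back into timedeltas."""
    out = df.copy()
    for col in td_cols:
        out[col] = pd.to_timedelta(out[col], unit="ns")
    return out
```

With these helpers, the failing call from the question could become encoded, cols = encode_timedeltas(df) followed by encoded.to_parquet('/tmp/test.parquet'), and after pd.read_parquet('/tmp/test.parquet') the original dtype can be restored with decode_timedeltas(loaded, cols). The list of converted column names has to be kept somewhere, since the Parquet file itself only sees integers.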

Ray Bell