1

I have time series data with date and temperature records for a city. These are my observations from the time series analysis:

  1. Plotting temperature against date shows seasonality.
  2. The adfuller test indicates that the data is already stationary, so d=0.
  3. The partial autocorrelation and autocorrelation plots of the first seasonal difference suggest p=2 and q=10 respectively.

Code to train the model:

import statsmodels.api as sm

model = sm.tsa.statespace.SARIMAX(df['temperature'], order=(1, 1, 1), seasonal_order=(2, 0, 10, 12))
results = model.fit()

The fit call runs indefinitely and never returns a result. I am running it on a Google Colab CPU.

How can I fix this issue?

Subhawna
  • Unless you have a really good reason for using statsmodels, I suggest switching to FB Prophet; it is an amazing and well-documented library for advanced time series forecasting. https://neuralprophet.com – Multivac Mar 07 '21 at 02:17
  • Statsmodels is very slow; you could try pmdarima (https://pypi.org/project/pmdarima/). Or, as suggested, you could try FB Prophet if you just want a reasonably good fit, although it can also be quite slow compared to other time series methods. – Tylerr Mar 07 '21 at 22:50

1 Answer

-1

Assuming you have multiple cities in the dataframe, you can create some new features in it. For example, I created a few features below to try to match your PACF and ACF graphs.

df['lag_1'] = df.groupby(['city'])['temperature'].transform(lambda x: x.shift(1))

d=1

df['d_1'] = df['temperature'] - df['lag_1']

p = 1:

df['p_1'] = df.groupby(['city'])['d_1'].transform(lambda x: x.shift(1))

q = 1:

df['ma_1'] = df.groupby(['city'])['d_1'].transform(lambda x: x.shift(1).rolling(1).mean())

P=2 (and other terms)

df['lag_t12'] = df.groupby(['city'])['temperature'].transform(lambda x: x.shift(12))

df['lag_t24'] = df.groupby(['city'])['temperature'].transform(lambda x: x.shift(24))

.

.

.

df['lag_t120'] = df.groupby(['city'])['temperature'].transform(lambda x: x.shift(120))

Q=10 :

df['Q_10'] = df[[col for col in df if col.startswith('lag_t')]].mean(axis=1)
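The seasonal lag columns above follow a regular pattern (multiples of 12 up to 120), so they can be generated in a loop. The following is a minimal sketch, not a definitive implementation. It assumes a long-format DataFrame with `city` and `temperature` columns whose rows are already sorted by date within each city; the toy data is made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (made up): 2 cities, 130 readings each, already in time order.
n = 130
df = pd.DataFrame({
    "city": ["A"] * n + ["B"] * n,
    "temperature": np.concatenate([
        np.arange(n, dtype=float),
        np.arange(n, dtype=float) * 2.0,
    ]),
})

# Seasonal lags at 12, 24, ..., 120 steps, computed within each city
# so that values never leak from one city into another.
for lag in range(12, 121, 12):
    df[f"lag_t{lag}"] = df.groupby("city")["temperature"].shift(lag)

# Row-wise mean of the seasonal lag columns; axis=1 averages across columns.
# NaN lags are skipped by default, so early rows average fewer values.
lag_cols = [c for c in df.columns if c.startswith("lag_t")]
df["Q_10"] = df[lag_cols].mean(axis=1)
```

Rows that have no history at a given lag hold NaN in that column; whether to drop or impute them before fitting a regressor is left open here.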

After this, try using LightGBM, XGBoost, or other regression packages to regress against these newly created features, with temperature as your target variable.

Alternatively, you can forgo the ACF/PACF approach altogether and instead create a bunch of commonly used features using:

  • shift
  • rolling mean
  • rolling standard deviations
  • max(), min() within groups

and regress against those, checking which features and regression hyperparameters minimize RMSE/AIC/BIC.
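The feature types listed above could be built with pandas roughly as follows. This is only a sketch under assumptions: the `city` and `temperature` column names, the window length of 3, and the random toy data are all illustrative choices, not part of the original question.

```python
import numpy as np
import pandas as pd

# Toy data (made up): 2 cities, 48 readings each, already in time order.
rng = np.random.default_rng(0)
n = 48
df = pd.DataFrame({
    "city": ["A"] * n + ["B"] * n,
    "temperature": rng.normal(20.0, 5.0, size=2 * n),
})

grouped = df.groupby("city")["temperature"]

# Shift by one step first so each rolling feature uses only past values.
df["lag_1"] = grouped.shift(1)
df["roll_mean_3"] = grouped.transform(lambda s: s.shift(1).rolling(3).mean())
df["roll_std_3"] = grouped.transform(lambda s: s.shift(1).rolling(3).std())

# Per-city extremes, broadcast back to every row of that city.
# These use the whole series, so in a backtest they should be
# recomputed on the training window only to avoid look-ahead.
df["city_max"] = grouped.transform("max")
df["city_min"] = grouped.transform("min")
```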

Since cross-validation works differently for time series, consider using TimeSeriesSplit in scikit-learn. Check out this post in case you want to do grouped time series cross-validation.
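A rough sketch of time-ordered cross-validation with scikit-learn's `TimeSeriesSplit` is shown below. It uses a single synthetic monthly series and two lag features, both made up for illustration, and swaps in scikit-learn's `GradientBoostingRegressor` as a stand-in for LightGBM or XGBoost so that the example needs no extra packages.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit

# Synthetic monthly series (made up): yearly cycle plus noise.
rng = np.random.default_rng(0)
t = np.arange(240)
y = 15 + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, size=t.size)

# Features for predicting y[k]: the value 1 step back and 12 steps back.
X = np.column_stack([y[11:-1], y[:-12]])
target = y[12:]

# Each fold trains on an initial stretch and tests on the block after it,
# so the model is never evaluated on data older than its training set.
tscv = TimeSeriesSplit(n_splits=5)
folds = list(tscv.split(X))

rmses = []
for train_idx, test_idx in folds:
    model = GradientBoostingRegressor(n_estimators=50, random_state=0)
    model.fit(X[train_idx], target[train_idx])
    pred = model.predict(X[test_idx])
    rmses.append(float(np.sqrt(mean_squared_error(target[test_idx], pred))))

print("RMSE per fold:", rmses)
```

With several cities stacked in one dataframe, the rows would need to be ordered by date before splitting for the folds to respect time.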