Assuming you have multiple cities in the dataframe, you can create some new features in it. For example, I created a few features below to try to match your PACF and ACF graphs.
df['lag_1'] = df.groupby(['city'])['temperature'].transform(lambda x: x.shift(1))
d = 1:
df['d_1'] = df['temperature'] - df['lag_1']
p = 1:
df['p_1'] = df.groupby(['city'])['d_1'].transform(lambda x: x.shift(1))
q = 1:
df['ma_1'] = df.groupby(['city'])['d_1'].transform(lambda x: x.shift(1).rolling(1).mean())  # rolling(1) is a no-op, so this equals p_1; widen the window for an actual moving average
P = 2 (and other terms):
df['lag_t12'] = df.groupby(['city'])['temperature'].transform(lambda x: x.shift(12))
df['lag_t24'] = df.groupby(['city'])['temperature'].transform(lambda x: x.shift(24))
.
.
.
df['lag_t120'] = df.groupby(['city'])['temperature'].transform(lambda x: x.shift(120))
Q = 10:
df['Q_10'] = df[[col for col in df if col.startswith('lag_t')]].mean(axis=1)  # row-wise mean over the seasonal lag columns
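To make the steps above concrete, here is a minimal runnable sketch on synthetic monthly data. The `build_features` helper, the `seasonal_mean` column name, the two made-up cities and the generated temperatures are all assumptions for illustration, and the frame is assumed to be sorted by date within each city.

```python
import numpy as np
import pandas as pd

def build_features(df, seasonal_lags=(12, 24)):
    """Hypothetical helper: add per-city lag and difference features.

    Assumes rows are already sorted by time within each city.
    """
    out = df.copy()
    by_city = out.groupby("city")["temperature"]
    out["lag_1"] = by_city.shift(1)                      # previous observation
    out["d_1"] = out["temperature"] - out["lag_1"]       # first difference
    out["p_1"] = out.groupby("city")["d_1"].shift(1)     # lagged difference
    for k in seasonal_lags:
        out[f"lag_t{k}"] = by_city.shift(k)              # seasonal lags
    lag_cols = [c for c in out.columns if c.startswith("lag_t")]
    # Row-wise mean of the seasonal lags; NaNs are skipped by default.
    out["seasonal_mean"] = out[lag_cols].mean(axis=1)
    return out

# Synthetic data: 48 months for two made-up cities.
rng = np.random.default_rng(0)
months = pd.date_range("2015-01-01", periods=48, freq="MS")
frames = []
for city, base in [("A", 10.0), ("B", 20.0)]:
    temp = base + 8 * np.sin(2 * np.pi * np.arange(48) / 12) + rng.normal(0, 1, 48)
    frames.append(pd.DataFrame({"city": city, "date": months, "temperature": temp}))
df = pd.concat(frames, ignore_index=True)

features = build_features(df)
print(features.tail())
```

Because every shift happens inside the groupby, the first rows of each city are NaN rather than borrowing values from the previous city.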
After this, try using LightGBM, XGBoost, or another regression package to regress on these newly created features, with temperature as your target variable.
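As a rough sketch of that regression step, the example below fits a gradient-boosted model on two lag features and scores it on a hold-out year. It uses scikit-learn's `GradientBoostingRegressor` as a stand-in for LightGBM/XGBoost (not the same library, just the same family of model), and the synthetic data, the cutoff date and the choice of feature columns are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Synthetic monthly temperatures for two made-up cities (illustration only).
rng = np.random.default_rng(0)
n = 72
dates = pd.date_range("2015-01-01", periods=n, freq="MS")
frames = []
for city, base in [("A", 10.0), ("B", 20.0)]:
    temp = base + 8 * np.sin(2 * np.pi * np.arange(n) / 12) + rng.normal(0, 1, n)
    frames.append(pd.DataFrame({"city": city, "date": dates, "temperature": temp}))
df = pd.concat(frames, ignore_index=True)

# Per-city lag features, as in the answer.
by_city = df.groupby("city")["temperature"]
df["lag_1"] = by_city.shift(1)
df["lag_t12"] = by_city.shift(12)

feature_cols = ["lag_1", "lag_t12"]
data = df.dropna(subset=feature_cols)  # this estimator is given complete rows only

# Time-based hold-out: train on the past, test on the final year.
cutoff = pd.Timestamp("2020-01-01")
train = data[data["date"] < cutoff]
test = data[data["date"] >= cutoff]

model = GradientBoostingRegressor(random_state=0)
model.fit(train[feature_cols], train["temperature"])
pred = model.predict(test[feature_cols])

rmse = float(np.sqrt(mean_squared_error(test["temperature"], pred)))
print(f"hold-out RMSE: {rmse:.2f}")
```

Swapping in LightGBM or XGBoost should mostly be a matter of replacing the estimator, since both ship scikit-learn-style regressors.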
Alternatively, you can forgo the ACF/PACF approach altogether and instead create a bunch of commonly used features using:
- shift
- rolling mean
- rolling standard deviations
- max(), min() within groups
and regress against those, then check which features minimize RMSE/AIC/BIC as you tune the regression hyperparameters.
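The generic features in that list might look something like the sketch below. The tiny two-city frame, the window size of 3 and the column names are assumptions for illustration. The max/min here are computed over past values only (an expanding window on the shifted series), which is my own variation to avoid leaking future temperatures into the features.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame, sorted by time within each city.
df = pd.DataFrame({
    "city": ["A"] * 5 + ["B"] * 5,
    "temperature": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0],
})

by_city = df.groupby("city")["temperature"]

# shift: previous observation within the same city
df["shift_1"] = by_city.shift(1)

# rolling mean / std over the 3 previous observations (shift first, so the
# current value is never part of its own feature)
df["roll_mean_3"] = by_city.transform(lambda x: x.shift(1).rolling(3).mean())
df["roll_std_3"] = by_city.transform(lambda x: x.shift(1).rolling(3).std())

# max / min within the group, restricted to values seen so far
df["past_max"] = by_city.transform(lambda x: x.shift(1).expanding().max())
df["past_min"] = by_city.transform(lambda x: x.shift(1).expanding().min())

print(df)
```

If a city-wide constant is what you want instead, `by_city.transform("max")` gives the maximum over the whole group, but that uses future rows too.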
Since cross-validation works differently for time series, consider using TimeSeriesSplit in scikit-learn. Check out this post if you want to do grouped time series cross-validation.
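Here is one possible way to use `TimeSeriesSplit` when several cities share the same dates. It splits on the unique dates rather than on raw rows, so each fold trains on earlier dates for all cities and tests on later ones. This is only a sketch of the idea and not necessarily what the linked post does; the frame and the `folds` list are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit

# Made-up frame: 12 monthly rows for each of two cities.
dates = pd.date_range("2020-01-01", periods=12, freq="MS")
df = pd.DataFrame({
    "city": np.repeat(["A", "B"], len(dates)),
    "date": np.tile(dates, 2),
})

# Split over the sorted unique dates, then map each date fold back to rows.
unique_dates = df["date"].drop_duplicates().sort_values().to_numpy()
tscv = TimeSeriesSplit(n_splits=3)

folds = []
for train_pos, test_pos in tscv.split(np.arange(len(unique_dates))):
    train_mask = df["date"].isin(unique_dates[train_pos])
    test_mask = df["date"].isin(unique_dates[test_pos])
    folds.append((df.index[train_mask], df.index[test_mask]))

for train_idx, test_idx in folds:
    print(
        "train up to", df.loc[train_idx, "date"].max().date(),
        "| test from", df.loc[test_idx, "date"].min().date(),
        "to", df.loc[test_idx, "date"].max().date(),
    )
```

Splitting on rows directly would also work for a single city, but with stacked cities the row order no longer matches time order, which is why the dates are split instead.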