Please note that you should normally ask one question at a time, but since your questions are connected to each other, it is fine here.
I've noticed most libraries don't have options to set a validation window larger than the forecast horizon. Is there a reason?
There might be some confusion between the forecast horizon and the validation window: the two are connected, but they are not the same thing.
The validation window is larger than the horizon so that you can check whether the model predicts 3 days ahead reliably, across many starting points. For instance, if you train your model on 600 days with a 3-day horizon, it learns, for each day (also using the N previous ones), to predict the next 3 days. The validation then covers, say, 30 days: the same 3-day-ahead prediction is made from every day in that window, and you check whether those predictions hold up.
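As a rough sketch of that walk-forward idea (the function name, ARIMA order, and window sizes below are placeholder assumptions, not tied to any particular library):

from statsmodels.tsa.arima.model import ARIMA
import numpy as np

def walk_forward_rmse(values, train_size=600, val_size=30, horizon=3):
    # Refit at each daily origin and score its `horizon`-day-ahead forecast.
    squared_errors = []
    for origin in range(train_size, train_size + val_size):
        train = values[:origin]                    # everything up to the origin
        actual = values[origin:origin + horizon]   # the next true values
        fit = ARIMA(train, order=(7, 0, 1)).fit()  # placeholder order
        forecast = fit.forecast(steps=horizon)
        squared_errors.extend((forecast - actual) ** 2)
    return np.sqrt(np.mean(squared_errors))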
365 days is not a multiple of my forecast horizon (3 days). Is it a problem?
Not at all: the more data you have to train on, the better, but you do have to define the forecast horizon.
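A quick illustration (pure arithmetic, no library involved) of why divisibility does not matter with rolling forecasts, since every day can serve as a forecast origin:

n_days, horizon = 365, 3
# every day with at least `horizon` days after it can serve as a forecast origin
usable_origins = n_days - horizon
print(usable_origins)   # 362: divisibility of 365 by the horizon is irrelevant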
Here is a code example to predict 3 days with ARIMA in Python:
from pandas import read_csv
from statsmodels.tsa.arima.model import ARIMA
import numpy

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return numpy.array(diff)

# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

# load dataset (assumes a single column of daily values)
series = read_csv('dataset.csv', header=0)
# seasonal difference
X = series.values.flatten()
days_in_year = 365
differenced = difference(X, days_in_year)
# fit model
model = ARIMA(differenced, order=(7,0,1))
model_fit = model.fit()
# multi-step out-of-sample forecast
forecast = model_fit.forecast(steps=3)
# invert the differenced forecast to something usable
history = [x for x in X]
day = 1
for yhat in forecast:
    inverted = inverse_difference(history, yhat, days_in_year)
    print('Day %d: %f' % (day, inverted))
    history.append(inverted)
    day += 1
See source
The libraries I'm using compute the RMSE for one horizon. If I average the RMSE over all horizons, I don't get the RMSE of the full year, since RMSE is not a linear metric. Should I stop using the RMSE computed by the libraries?
RMSE is a good option, and it can be used together with the Akaike Information Criterion to evaluate model quality (a short sketch of the aggregation point follows the links below).
See also:
How to build ARIMA model in Python
Akaike Information Criterion
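To make the aggregation point concrete, here is a small sketch with made-up residuals: averaging per-horizon RMSEs is not the same as the RMSE computed over all errors pooled together, because the square root is applied at different stages. Pooling the squared errors first is the consistent way to get a single full-period figure.

import numpy as np

# Fake residuals: 30 forecast origins x 3 horizon steps (illustrative only).
errors = np.random.randn(30, 3)

rmse_per_horizon = np.sqrt((errors ** 2).mean(axis=0))  # one RMSE per step ahead
mean_of_rmses = rmse_per_horizon.mean()                 # NOT an overall RMSE
pooled_rmse = np.sqrt((errors ** 2).mean())             # RMSE over all errors at once
print(mean_of_rmses, pooled_rmse)                       # the two generally differ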
When evaluating my models, should I consider every day as a forecast origin, or only forecast once per horizon (i.e. every 3 days here)?
Generally speaking, the more days ahead you try to predict, the worse the results, because the model also learns noise, and that noise is amplified day after day.
That's why it is advisable to start with a 1-day forecast, check whether the results are satisfactory, then increase the forecast horizon by one day and repeat the process until the results are no longer useful.
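As a sketch of that procedure, reusing the hypothetical walk_forward_rmse helper from above (X as loaded in the ARIMA example):

results = {}
for horizon in range(1, 8):                        # 1-day up to 7-day horizons
    rmse = walk_forward_rmse(X, horizon=horizon)
    results[horizon] = rmse
    print('Horizon %d days: RMSE %.3f' % (horizon, rmse))
# Stop increasing the horizon once the RMSE degrades beyond what is acceptable.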
It could also be interesting to compare with other models (Random Forest, LSTM, Prophet, etc.) and try them at different forecast horizons.