Evaluating Time Series Forecasting with a Seasonal ARIMA (SARIMA) Model in Python

Introduction

In this blog post, we walk through time series forecasting with the Seasonal Autoregressive Integrated Moving Average (SARIMA) model. Time series data often exhibit seasonality, that is, patterns that repeat at a fixed interval, and SARIMA is a popular statistical technique for such data. To get you started, we will demonstrate how to build, fit, evaluate, and visualize a SARIMA model using Python.

Import Libraries

First, let's import the necessary libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.statespace.sarimax import SARIMAX
from sklearn.metrics import mean_squared_error

Load Time Series Data

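In this example, we will use the daily minimum temperatures dataset hosted in the jbrownlee/Datasets repository on GitHub. Because the file contains daily readings, we aggregate it to monthly means so that the 12-month seasonal period used below applies. Comparable climate records are also available from the National Oceanic and Atmospheric Administration (NOAA) at https://www.ncdc.noaa.gov/cdo-web/datasets.

url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv"
data = pd.read_csv(url, index_col='Date', parse_dates=True)

# Aggregate daily readings to monthly means (labelled by month start)
data = data.resample('MS').mean()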
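Before modelling, it can help to confirm that the date index is regular, since gaps in a daily file can prevent pandas and statsmodels from inferring a frequency. The following is a minimal sketch on synthetic data (not the real dataset); the series, the dropped date, and the variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a daily temperature series (NOT the real dataset).
rng = np.random.default_rng(0)
days = pd.date_range("2000-01-01", "2002-12-31", freq="D")
day_of_year = days.dayofyear.to_numpy()
values = 15 + 10 * np.sin(2 * np.pi * day_of_year / 365.25) + rng.normal(0, 2, len(days))
daily = pd.Series(values, index=days, name="Temp")

# Simulate a gap by dropping a single day.
daily = daily.drop(pd.Timestamp("2000-12-31"))

# Compare the index against a complete daily calendar to find missing dates.
expected = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
missing = expected.difference(daily.index)
print("Missing dates:", list(missing))

# Aggregate to one value per calendar month; the result has a regular
# month-start frequency even though the daily index had a gap.
monthly = daily.resample("MS").mean()
print(monthly.index.freqstr, len(monthly))
```

Aggregating to monthly means absorbs isolated gaps like this one, but a month with no readings at all would produce a NaN that needs handling before fitting.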

Exploratory Data Analysis (EDA)

Let's perform a quick EDA to understand the dataset's properties.

# Summary statistics
print(data.describe())

# Plot time series
data.plot(figsize=(10, 5))
plt.show()

# Seasonal decomposition (period=12 assumes monthly observations)
result = seasonal_decompose(data, model='additive', period=12)
result.plot()
plt.show()
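The trend and seasonal components seen in a decomposition are what the differencing terms of a SARIMA model are meant to remove. As a rough illustration, the sketch below applies one seasonal difference (lag 12) and one regular difference to a synthetic monthly series; the data and names are assumptions for demonstration only.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: linear trend + yearly cycle + noise (illustrative only).
rng = np.random.default_rng(1)
idx = pd.date_range("2010-01-01", periods=120, freq="MS")
t = np.arange(120)
monthly = pd.Series(
    0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, 120),
    index=idx,
)

# Seasonal difference at lag 12 removes the repeating yearly pattern.
seasonal_diff = monthly.diff(12)

# A further first difference removes what is left of the trend.
stationary = seasonal_diff.diff().dropna()

print("Observations before/after differencing:", len(monthly), len(stationary))
print("Std before/after differencing:", monthly.std(), stationary.std())
```

Differencing costs 13 observations here (12 for the seasonal lag plus 1), which is worth keeping in mind for short series.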

SARIMA Model Training

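We will hold out the last 12 observations as a test set and fit a SARIMA model to the remaining training data. As an example, we use the non-seasonal order (p, d, q) = (1, 1, 1) and the seasonal order (P, D, Q, m) = (0, 1, 0, 12), where m = 12 is the number of observations per seasonal cycle. You can tune these values with a grid search if desired.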
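For readers who prefer to see the orders (1,1,1)(0,1,0,12) written out, the model can be sketched in backshift form. This assumes the standard SARIMA definition with no trend term; B is the backshift operator (B y_t = y_{t-1}), phi_1 and theta_1 are the non-seasonal AR and MA coefficients, and epsilon_t is white noise. These symbols are introduced here for illustration.

```latex
(1 - \phi_1 B)\,(1 - B)\,(1 - B^{12})\, y_t = (1 + \theta_1 B)\,\varepsilon_t
```

The factor (1 - B) is the regular difference (d = 1) and (1 - B^{12}) is the seasonal difference (D = 1, m = 12). Because P = Q = 0, there are no seasonal AR or MA terms.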

# Split data
train, test = data[:-12], data[-12:]

# Fit SARIMA model
model = SARIMAX(train, order=(1, 1, 1), seasonal_order=(0, 1, 0, 12))
model_fit = model.fit()

# Summary of the model
print(model_fit.summary())

Model Evaluation and Forecasting

Evaluate the model by forecasting over the test period and comparing the forecasts with the held-out observations.

# Forecast
forecast = model_fit.forecast(steps=len(test))

# Calculate mean squared error
mse = mean_squared_error(test, forecast)
print("MSE:", mse)
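MSE is expressed in squared units, so it is often easier to interpret errors in the original units and against a simple baseline. The sketch below defines RMSE, MAE, and a seasonal naive forecast (repeat the last observed cycle); the function names and the toy numbers are illustrative assumptions, not part of the original workflow.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    actual, predicted = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


def mae(actual, predicted):
    """Mean absolute error, less sensitive to large individual misses than RMSE."""
    actual, predicted = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


def seasonal_naive(history, horizon, period=12):
    """Forecast by repeating the last full seasonal cycle of `history`."""
    last_cycle = np.asarray(history, dtype=float)[-period:]
    repeats = int(np.ceil(horizon / period))
    return np.tile(last_cycle, repeats)[:horizon]


# Toy example: two years of history, forecast the next five months.
history = np.arange(24)
baseline = seasonal_naive(history, horizon=5, period=12)
print("Seasonal naive forecast:", baseline)
print("RMSE:", rmse([1, 2, 3], [1, 2, 5]))
print("MAE:", mae([1, 2, 3], [1, 2, 5]))
```

If the SARIMA forecast does not beat the seasonal naive baseline on the test set, the extra model complexity is probably not paying off.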

# Plot the forecast
plt.figure(figsize=(10, 5))
plt.plot(train.index, train, label="Train")
plt.plot(test.index, test, label="Test")
plt.plot(test.index, forecast, label="Forecast")
plt.legend()
plt.show()

Conclusion

In this blog post, we introduced time series forecasting using the SARIMA model. To summarize, we performed the following steps:

Imported necessary libraries
Loaded time series data
Conducted exploratory data analysis
Trained and evaluated a SARIMA model
Made forecasts and plotted the results

After following these steps, you should have a basic understanding of how to build and evaluate a SARIMA model in Python for time series forecasting. Natural next steps include grid search for hyperparameter tuning, model selection, and forecasting with confidence intervals.