Evaluating Time Series Forecasting With a Seasonal ARIMA Model in Python

Introduction

In this blog post, we will dive into time series forecasting with the Seasonal Autoregressive Integrated Moving Average (SARIMA) model. Time series data often exhibit seasonality or cyclical patterns, and SARIMA is a popular statistical technique for modeling them. To get you started, we will demonstrate how to build, train, evaluate, and visualize a SARIMA model using Python.

Import Libraries

First, let's import the necessary libraries:

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.tsa.seasonal import seasonal_decompose
    from statsmodels.tsa.statespace.sarimax import SARIMAX
    from sklearn.metrics import mean_squared_error

Load Time Series Data

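In this example, we will use the daily minimum temperatures dataset (Melbourne, Australia, 1981 to 1990), hosted in a public GitHub repository. Because the seasonal period of 12 used in the rest of this post assumes monthly data, we aggregate the daily readings into monthly means right after loading. Similar climate series are available from the National Oceanic and Atmospheric Administration (NOAA) at https://www.ncdc.noaa.gov/cdo-web/datasets.

    url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv"
    daily = pd.read_csv(url, index_col='Date', parse_dates=True)

    # Aggregate the daily readings to monthly means so the seasonal period is 12
    data = daily['Temp'].resample('MS').mean()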

Exploratory Data Analysis (EDA)

Let's perform a quick EDA to understand the dataset's properties.

    # Summary statistics
    print(data.describe())

    # Plot time series
    data.plot(figsize=(10, 5))
    plt.show()

    # Seasonal decomposition (period=12: twelve observations per yearly cycle)
    result = seasonal_decompose(data, model='additive', period=12)
    result.plot()
    plt.show()

SARIMA Model Training

We will hold out the last 12 observations as a test set and fit a SARIMA model to the remaining training data, using the example orders (p, d, q)(P, D, Q, s) = (1, 1, 1)(0, 1, 0, 12). You can tune these orders with a grid search if desired.

    # Split data: the last 12 observations form the test set
    train, test = data[:-12], data[-12:]

    # Fit SARIMA model
    model = SARIMAX(train, order=(1, 1, 1), seasonal_order=(0, 1, 0, 12))
    model_fit = model.fit()

    # Summary of the model
    print(model_fit.summary())

Model Evaluation and Forecasting

We evaluate the model by comparing its forecasts against the held-out test set.

    # Forecast as many steps as there are test observations
    forecast = model_fit.forecast(steps=len(test))

    # Calculate mean squared error
    mse = mean_squared_error(test, forecast)
    print("MSE:", mse)

    # Plot the forecast
    plt.figure(figsize=(10, 5))
    plt.plot(train.index, train, label="Train")
    plt.plot(test.index, test, label="Test")
    plt.plot(test.index, forecast, label="Forecast")
    plt.legend()
    plt.show()
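MSE is expressed in squared units of the series, so it can be hard to read on its own. A small sketch of two companion metrics in the original units, RMSE and MAE, is shown below; the helper `forecast_errors` and the toy numbers are hypothetical and only meant to illustrate the arithmetic.

```python
import numpy as np


def forecast_errors(actual, predicted):
    """Return MSE, RMSE, and MAE for two equally long sequences."""
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    errors = actual - predicted
    mse = np.mean(errors ** 2)
    return {"mse": mse, "rmse": np.sqrt(mse), "mae": np.mean(np.abs(errors))}


# Toy values (assumption): errors are -1, 0, and 2.
metrics = forecast_errors([10.0, 12.0, 14.0], [11.0, 12.0, 12.0])
print(metrics)
```

With the variables from this post, the call would be `forecast_errors(test, forecast)`.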

Conclusion

In this blog post, we introduced time series forecasting using the SARIMA model. To summarize, we performed the following steps:

  1. Imported necessary libraries
  2. Loaded time series data
  3. Conducted exploratory data analysis
  4. Trained and evaluated a SARIMA model
  5. Made forecasts and plotted the results

After following these steps, you should now have a basic understanding of how to create and evaluate a SARIMA model in Python for time series forecasting. The skills you've learned here provide a foundation for further study and improvement on this topic, such as grid search for hyperparameter tuning, model selection, and forecasting with confidence intervals.