In this blog post, we walk through time series forecasting with the Seasonal Autoregressive Integrated Moving Average (SARIMA) model. Time series data often exhibit seasonality, meaning patterns that repeat at a fixed interval, and SARIMA is a popular statistical technique for such data. To get you started, we will demonstrate how to build, train, evaluate, and visualize a SARIMA model using Python.
First, let's import the necessary libraries:
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    from statsmodels.tsa.seasonal import seasonal_decompose
    from statsmodels.tsa.statespace.sarimax import SARIMAX
    from sklearn.metrics import mean_squared_error
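In this example, we will use a temperature dataset. The file loaded below, daily-min-temperatures.csv, holds daily minimum temperatures in a column named Temp rather than monthly means, so we aggregate it to monthly averages to match the seasonal period of 12 used later. Monthly mean surface air temperature records are also available from the National Oceanic and Atmospheric Administration (NOAA) at https://www.ncdc.noaa.gov/cdo-web/datasets.

    url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-min-temperatures.csv"
    data = pd.read_csv(url, index_col='Date', parse_dates=True)

    # Aggregate the daily readings to monthly means (month-start frequency)
    # so that a seasonal period of 12 corresponds to one year
    data = data['Temp'].resample('MS').mean()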
Let's perform a quick EDA to understand the dataset's properties.
    # Summary statistics
    print(data.describe())

    # Plot time series
    data.plot(figsize=(10, 5))
    plt.show()

    # Seasonal decomposition
    # period=12 is passed explicitly: one seasonal cycle per year of monthly data
    result = seasonal_decompose(data, model='additive', period=12)
    result.plot()
    plt.show()
We will hold out the last 12 observations as a test set and fit a SARIMA model to the rest, using the example orders (1,1,1)(0,1,0,12): a non-seasonal order (p,d,q) = (1,1,1) and a seasonal order (P,D,Q,s) = (0,1,0,12). You can tune these orders with a grid search if desired.
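For readers who want to see what these orders mean, the example model can be written with the backshift operator $B$, where $B y_t = y_{t-1}$. This is the standard textbook form, assuming the sign convention statsmodels uses; the symbols $\phi_1$, $\theta_1$, and $\varepsilon_t$ are notation introduced here for illustration:

$$
(1 - \phi_1 B)\,(1 - B)\,(1 - B^{12})\, y_t = (1 + \theta_1 B)\, \varepsilon_t
$$

Here $(1 - B)$ is the first difference ($d = 1$), $(1 - B^{12})$ is the seasonal difference at lag 12 ($D = 1$), $\phi_1$ and $\theta_1$ are the single autoregressive and moving average coefficients ($p = q = 1$), and there are no seasonal AR or MA terms because $P = Q = 0$.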
    # Split data: hold out the last 12 observations for testing
    train, test = data[:-12], data[-12:]

    # Fit SARIMA model
    model = SARIMAX(train, order=(1, 1, 1), seasonal_order=(0, 1, 0, 12))
    model_fit = model.fit()

    # Summary of the model
    print(model_fit.summary())
Evaluate the model by comparing the forecasts against the test dataset.
    # Forecast as many steps as there are test observations
    forecast = model_fit.forecast(steps=len(test))

    # Calculate mean squared error
    mse = mean_squared_error(test, forecast)
    print("MSE:", mse)

    # Plot the forecast
    plt.figure(figsize=(10, 5))
    plt.plot(train.index, train, label="Train")
    plt.plot(test.index, test, label="Test")
    plt.plot(test.index, forecast, label="Forecast")
    plt.legend()
    plt.show()
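In this blog post, we introduced time series forecasting using the SARIMA model. To summarize, we performed the following steps:
1. Imported the required libraries.
2. Loaded the temperature dataset.
3. Explored the data with summary statistics, a time series plot, and a seasonal decomposition.
4. Split the data into training and test sets and fit a SARIMA model.
5. Evaluated the forecasts against the test set and plotted the results.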
After following these steps, you should have a basic understanding of how to create and evaluate a SARIMA model in Python for time series forecasting. From here, natural next steps include grid search for hyperparameter tuning, model selection, and forecasting with confidence intervals.