Exploring Time Series Analysis with ARIMA in Python

In this post, I explore a crucial yet often overlooked topic in data science: time series analysis using the ARIMA (AutoRegressive Integrated Moving Average) model. Time series analysis has its roots in statistics and is widely used in finance, economics, the social sciences, and many other fields.

ARIMA is a forecasting model used to describe a time series and predict its future values.

A Brief Introduction to Time Series

A time series is a sequence of data points recorded at constant time intervals, so each observation is tied to a point in time. In addition to an increasing or decreasing trend, most time series show some form of seasonality, that is, a pattern that repeats over a fixed period.
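
To make the idea of trend and seasonality concrete, here is a minimal sketch that builds a synthetic monthly series from a linear trend plus a 12-month cycle. Everything in it is an illustrative assumption: the helper `month_start`, the starting level of 100, the slope of 2 per month, and the seasonal amplitude of 10 do not come from any real dataset.

```python
import math
from datetime import date


def month_start(start_year, offset):
    """Return the first day of the month `offset` months after January of start_year."""
    years, months = divmod(offset, 12)
    return date(start_year + years, months + 1, 1)


n_months = 36

# Constant time interval: one observation per month
dates = [month_start(2020, i) for i in range(n_months)]

# Trend: the level rises by 2 every month (assumed values)
trend = [100 + 2 * i for i in range(n_months)]

# Seasonality: a cycle that repeats every 12 months (assumed amplitude)
seasonal = [10 * math.sin(2 * math.pi * i / 12) for i in range(n_months)]

# The observed series is the sum of both components
series = [t + s for t, s in zip(trend, seasonal)]

for d, value in list(zip(dates, series))[:3]:
    print(d.isoformat(), round(value, 1))
```

Comparing the same month across years (for example `series[0]` and `series[12]`) isolates the trend, because the seasonal component is the same at both points.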

How the ARIMA Model Works

ARIMA, short for AutoRegressive Integrated Moving Average, combines three parts: AR (autoregression), I (integrated), and MA (moving average). A noteworthy point about ARIMA is that the AR and MA parts assume a stationary time series; the I part exists to difference a non-stationary series until it becomes stationary.

  • The AR part models the relationship between the current observation and a number of previous (lagged) observations.
  • The I part applies differencing to make the time series stationary.
  • The MA part models the dependency between an observation and the residual errors from a moving average model applied to lagged observations.
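
Of the three parts, the I part is the easiest to see in code. The sketch below uses a hypothetical helper, `difference`, to show how differencing removes a trend: one pass flattens a linear trend, and two passes flatten a quadratic one. The number of passes corresponds to the d in the usual ARIMA(p, d, q) notation. This is only an illustration of the idea, not how any particular library implements it.

```python
def difference(series, d=1):
    """Apply first-order differencing (y[t] - y[t-1]) d times."""
    result = list(series)
    for _ in range(d):
        result = [curr - prev for prev, curr in zip(result, result[1:])]
    return result


linear = [5 + 2 * t for t in range(6)]   # 5, 7, 9, 11, 13, 15
quadratic = [t * t for t in range(6)]    # 0, 1, 4, 9, 16, 25

print(difference(linear, d=1))     # [2, 2, 2, 2, 2]  -> trend removed
print(difference(quadratic, d=1))  # [1, 3, 5, 7, 9]  -> still trending
print(difference(quadratic, d=2))  # [2, 2, 2, 2]     -> trend removed
```

Each pass shortens the series by one observation, which is why d is kept as small as possible in practice.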

Step by Step in Python

For a practical illustration, let's implement time series forecasting using ARIMA in Python.

  1. First, install the necessary libraries.
    pip install pandas numpy matplotlib pmdarima scikit-learn
  2. Import the libraries.
    import math
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from pmdarima import auto_arima
    from sklearn.metrics import mean_absolute_error, mean_squared_error
  3. Import the dataset.
    # The column name 'passengers' is assumed; adjust it if your copy of the file differs
    df = pd.read_csv('AirPassengers.csv')
    df.head()
  4. Visualize the time series to see how it looks.
    plt.figure(figsize=(10, 6))
    plt.plot(df.index, df['passengers'])
    plt.xlabel('Month', fontsize=12)
    plt.ylabel('Passengers', fontsize=12)
    plt.title('Passengers Over Time', fontsize=15)
    plt.show()
  5. Split the data and fit the ARIMA model.
    # Hold out the last 24 months for validation (an assumed split size; adjust as needed)
    train = df.iloc[:-24]
    valid = df.iloc[-24:]
    # auto_arima searches over candidate orders and returns an already fitted model
    model = auto_arima(train['passengers'], trace=True, error_action='ignore', suppress_warnings=True)
  6. Predict the future and calculate errors.
    forecast = model.predict(n_periods=len(valid))
    # Align the predictions with the validation index
    forecast = pd.DataFrame({'Prediction': np.asarray(forecast)}, index=valid.index)
    # Calculate MSE, MAE, and RMSE
    mse = mean_squared_error(valid['passengers'], forecast['Prediction'])
    print("MSE: ", mse)
    mae = mean_absolute_error(valid['passengers'], forecast['Prediction'])
    print("MAE: ", mae)
    rmse = math.sqrt(mse)
    print("RMSE: ", rmse)
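
To see what the three error metrics in the last step actually measure, it can help to compute them by hand on a tiny example. The sketch below uses made-up values for `actual` and `predicted` (they are not AirPassengers results) and only the standard library.

```python
import math

actual = [100, 110, 120]     # made-up observed values
predicted = [90, 115, 120]   # made-up forecasts

# Forecast errors: 10, -5, 0
errors = [a - p for a, p in zip(actual, predicted)]

# MSE: mean of squared errors, penalizes large misses more heavily
mse = sum(e ** 2 for e in errors) / len(errors)

# MAE: mean of absolute errors, in the same units as the data
mae = sum(abs(e) for e in errors) / len(errors)

# RMSE: square root of MSE, which brings it back to the data's units
rmse = math.sqrt(mse)

print("MSE: ", round(mse, 2))    # 41.67
print("MAE: ", mae)              # 5.0
print("RMSE: ", round(rmse, 2))  # 6.45
```

RMSE is larger than MAE here because squaring gives the single error of 10 more weight than the error of 5.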

While the focus has been on ARIMA, a variety of other models may be more suitable depending on the data and problem. However, the concepts and techniques explored are a necessary component of any data scientist's toolkit, especially in the field of time series analysis.

Happy Learning!