In this post I explore an important yet often overlooked topic in data science: time series analysis with the ARIMA (AutoRegressive Integrated Moving Average) model. Time series analysis has its roots in statistics and is widely used in finance, economics, the social sciences, and many other fields.
ARIMA, a forecasting model, is typically used to understand and predict future points in the time series.
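A time series is a sequence of data points recorded at regular time intervals, so the order of the observations matters. Besides an increasing or decreasing trend, many time series also show some form of seasonality: a pattern that repeats over a fixed period, such as a yearly cycle in monthly data.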
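To make trend and seasonality concrete, here is a minimal sketch that uses only the Python standard library. The series is synthetic: the slope, intercept, amplitude, and 12-step period are made-up values for illustration, and `make_series` and `moving_average` are hypothetical helpers, not library functions. Averaging over one full seasonal period cancels the seasonal swing and leaves the trend.

```python
import math

PERIOD = 12  # assumed seasonal period: 12 observations per cycle (e.g. monthly data)

def make_series(n, slope=2.0, intercept=100.0, amplitude=10.0):
    """Synthetic series: linear trend plus a sine-shaped seasonal pattern."""
    return [
        intercept + slope * t + amplitude * math.sin(2 * math.pi * t / PERIOD)
        for t in range(n)
    ]

def moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

series = make_series(48)
trend_estimate = moving_average(series, PERIOD)

# The raw series moves unevenly from step to step because of the seasonal term.
print([round(v, 1) for v in series[:6]])

# The smoothed series changes by roughly the same amount at every step,
# because a window of one full period averages the seasonal term away.
steps = [b - a for a, b in zip(trend_estimate, trend_estimate[1:])]
print(round(min(steps), 6), round(max(steps), 6))
```

Real data is noisier than this, so the smoothed line will only approximate the trend, but the idea is the same.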
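ARIMA, short for AutoRegressive Integrated Moving Average, combines three parts: AR (autoregression), which models the current value from the series' own past values; I (integrated), which differences the series to remove trend; and MA (moving average), which models the current value from past forecast errors. The three parts are usually written as the order (p, d, q). A noteworthy point is that the AR and MA parts assume a stationary series, one whose mean and variance do not change over time; the differencing in the I part is what brings a trending series closer to stationarity.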
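The "I" in ARIMA refers to differencing: replacing each value with its change from the previous value. The sketch below shows the idea on made-up numbers, using a hypothetical `difference` helper written for this post rather than any library call.

```python
def difference(values, lag=1):
    """Return values[t] - values[t - lag] for every t where both exist."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]

# Synthetic series with a linear trend: its mean keeps growing,
# so it is not stationary.
trending = [5 + 2 * t for t in range(10)]

# One round of differencing (d = 1) removes the linear trend.
once = difference(trending)
print(once)

# A quadratic trend needs two rounds (d = 2) before the result is constant.
quadratic = [t * t for t in range(10)]
twice = difference(difference(quadratic))
print(twice)
```

In practice the number of differencing rounds, d, is usually small, and tools such as `auto_arima` choose it for you.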
For a practical illustration, let's implement time series forecasting with ARIMA in Python.
pip install pandas matplotlib pmdarima scikit-learn
import math

import matplotlib.pyplot as plt
import pandas as pd
from pmdarima import auto_arima
from sklearn.metrics import mean_absolute_error, mean_squared_error
# Load the dataset; the rest of the code assumes a column named 'passengers'
df = pd.read_csv('AirPassengers.csv')
df.head()
plt.figure(figsize=(10, 6))
plt.plot(df.index, df['passengers'])
plt.xlabel('Month', fontsize=12)
plt.ylabel('Passengers', fontsize=12)
plt.title('Passengers Over Time', fontsize=15)
plt.show()
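# Hold out the last 12 rows for validation (the split size is an illustrative choice)
train = df.iloc[:-12]
valid = df.iloc[-12:]

# auto_arima searches for a suitable order and returns a model that is already fitted,
# so no separate model.fit() call is needed
model = auto_arima(train['passengers'], trace=True, error_action='ignore', suppress_warnings=True)

# Forecast one value per validation row
forecast = model.predict(n_periods=len(valid))
forecast = pd.DataFrame(forecast, index=valid.index, columns=['Prediction'])

# Calculate MSE, MAE, and RMSE on the validation set
mse = mean_squared_error(valid['passengers'], forecast['Prediction'])
print("MSE: ", mse)
mae = mean_absolute_error(valid['passengers'], forecast['Prediction'])
print("MAE: ", mae)
rmse = math.sqrt(mse)
print("RMSE: ", rmse)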
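To see what the three error metrics actually compute, here is a small standard-library sketch. The helper names (`manual_mse`, `manual_mae`, `manual_rmse`) and the numbers are made up for illustration; in real code the scikit-learn functions are the better choice.

```python
import math

def manual_mse(actual, predicted):
    """Mean squared error: average of the squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def manual_mae(actual, predicted):
    """Mean absolute error: average of the absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def manual_rmse(actual, predicted):
    """Root mean squared error: square root of the MSE, in the data's own units."""
    return math.sqrt(manual_mse(actual, predicted))

# Made-up actual and predicted values, not results from the model above
actual = [100, 110, 120, 130]
predicted = [102, 108, 123, 126]

print("MSE: ", manual_mse(actual, predicted))
print("MAE: ", manual_mae(actual, predicted))
print("RMSE: ", manual_rmse(actual, predicted))
```

Because MSE squares each error, a few large misses raise it (and RMSE) much more than they raise MAE, which is why it can help to report both.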
While the focus here has been on ARIMA, other models may be more suitable depending on your data and problem. Even so, the concepts and techniques covered here belong in any data scientist's toolkit for time series analysis.
Happy Learning!