In this blog post, we will explore the use of the Autoregressive Integrated Moving Average (ARIMA) model for analyzing stock prices. ARIMA is a popular time series forecasting model and is widely applied to financial data.
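As its name suggests, ARIMA combines three components:
- Autoregression (AR): the current value is modeled as a linear combination of the series' own past values.
- Integration (I): the series is differenced to remove trends and make it stationary.
- Moving average (MA): the current value also depends on past forecast errors.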
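An ARIMA model is specified by three parameters (p, d, q), where:
- p is the number of autoregressive terms (lagged values of the series),
- d is the number of times the series is differenced,
- q is the number of moving average terms (lagged forecast errors).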
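For reference, an ARIMA(p, d, q) model is commonly written using the lag operator $L$ (defined by $L y_t = y_{t-1}$), AR coefficients $\phi_i$, MA coefficients $\theta_j$, a constant $c$, and white-noise errors $\varepsilon_t$. This is the standard textbook form. Libraries differ in sign conventions and in whether $c$ is included, so treat it as a guide rather than as the exact parameterization of any particular package:

$$\Bigl(1 - \sum_{i=1}^{p} \phi_i L^i\Bigr)(1 - L)^d\, y_t = c + \Bigl(1 + \sum_{j=1}^{q} \theta_j L^j\Bigr)\varepsilon_t$$

For an ARIMA(1,1,1) model, writing $\Delta y_t = y_t - y_{t-1}$ for the first difference, this reduces to

$$\Delta y_t = c + \phi_1\, \Delta y_{t-1} + \varepsilon_t + \theta_1\, \varepsilon_{t-1}$$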
Let's start by importing the necessary libraries.
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error
```
We will use the historical stock price data of Apple Inc. (AAPL) from Yahoo Finance for this analysis. You can download the dataset from here.
Load the data and visualize it.
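```python
# Load the daily price history, using the Date column as a DatetimeIndex
data = pd.read_csv('AAPL.csv', index_col='Date', parse_dates=True)

# Plot the closing price over time
plt.figure(figsize=(12, 6))
plt.plot(data['Close'])
plt.xlabel('Date')
plt.ylabel('Stock Price')
plt.title('Apple Inc. Stock Prices')
plt.show()
```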
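The autoregressive and moving average parts of ARIMA assume a stationary series, meaning one whose mean and variance do not change over time. Stock prices usually trend, so we difference the data. First-order differencing replaces each closing price with its change from the previous trading day. ARIMA's d parameter applies this differencing internally when the model is fit. We compute the differenced series explicitly here so that we can inspect it.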
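```python
# First-order differencing: each value becomes Close[t] - Close[t-1]; the leading NaN is dropped
data_diff = data['Close'].diff(1).dropna()
```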
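Now, let's choose the autoregressive order (p) and the moving average order (q) using the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots of the differenced series. As a rule of thumb, a PACF that cuts off after lag p suggests p autoregressive terms, and an ACF that cuts off after lag q suggests q moving average terms.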
```python
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# ACF (top) and PACF (bottom) of the differenced series for the first 20 lags
fig, (ax_acf, ax_pacf) = plt.subplots(2, 1)
plot_acf(data_diff, ax=ax_acf, lags=20)
plot_pacf(data_diff, ax=ax_pacf, lags=20)
plt.show()
```
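From the plots, ARIMA(1,1,1) looks like a reasonable starting point. We set d = 1 because we applied one round of differencing, and we set p = 1 and q = 1 based on the lags at which the PACF and ACF, respectively, cut off.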
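Next, split the data into training and test sets. Because the model's d = 1 term performs the differencing itself, we split the original closing prices rather than data_diff. Fitting ARIMA(1,1,1) to data_diff would difference the series twice. The split is chronological, with no shuffling: the first 80% of observations are used for training and the most recent 20% for testing.

```python
# Use the original closing prices: ARIMA's d=1 performs the differencing itself
close = data['Close']

# Chronological split (no shuffling): first 80% for training, last 20% for testing
train_size = int(len(close) * 0.8)
train_data, test_data = close[:train_size], close[train_size:]
```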
Now, let's fit the ARIMA model on the training set and evaluate its performance on the test set.
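```python
model = ARIMA(train_data, order=(1, 1, 1))
model_fit = model.fit()

# Forecast every test period from the end of the training data
# (a multi-step, out-of-sample forecast; no test observations are used)
predictions = model_fit.forecast(steps=len(test_data))

# Calculate the Mean Squared Error (MSE) and its square root (RMSE, in price units)
mse = mean_squared_error(test_data, predictions)
print('Mean Squared Error (MSE):', mse)
print('Root Mean Squared Error (RMSE):', np.sqrt(mse))
```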
In this blog post, we used an ARIMA model to analyze Apple Inc.'s stock prices. We differenced the series to make it stationary, used the ACF and PACF plots to choose the order (p, d, q), and measured forecast error on a held-out test period. ARIMA can serve as a useful, interpretable baseline for analyzing and forecasting financial time series.