Analyzing Stock Prices with the ARIMA Model in Python

Introduction

In this blog post, we explore the Autoregressive Integrated Moving Average (ARIMA) model for analyzing stock prices. ARIMA is widely used for time series forecasting, including on financial data.

ARIMA Model

ARIMA stands for Autoregressive Integrated Moving Average. It combines three components:

Autoregressive (AR): A linear regression that predicts the current value of the series from one or more of its own previous values.
Integrated (I): Differencing the series, that is, replacing each value with its change from the previous value, to make it stationary. A stationary series has a constant mean and variance over time, and the covariance between two points depends only on the lag between them.
Moving Average (MA): A model that predicts the current value of the series from past forecast errors.
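
To make the three components concrete, here is a minimal sketch on synthetic data. The coefficients phi and theta, the seed, and the series length are illustrative assumptions, not values from this post. It builds a stationary series with one AR term and one MA term, integrates it once, and checks that first-order differencing recovers the stationary series.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n = 300
phi, theta = 0.6, 0.3           # illustrative AR and MA coefficients (assumed values)

eps = rng.normal(size=n)        # white-noise errors
x = np.zeros(n)                 # stationary series with one AR and one MA term
for t in range(1, n):
    # AR part uses the previous value; MA part uses the previous error
    x[t] = phi * x[t - 1] + eps[t] + theta * eps[t - 1]

# "Integrated" part: a running sum of the stationary series is non-stationary
y = np.cumsum(x)

# First-order differencing undoes the running sum (the first point is lost)
recovered = np.diff(y)
print(np.allclose(recovered, x[1:]))
```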

The ARIMA model is represented by the parameters (p,d,q), where:

p: The number of autoregressive (AR) terms
d: The order of differencing (I)
q: The number of moving average (MA) terms
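
For reference, the general ARIMA(p,d,q) model can be written with the lag operator L, where L y_t = y_{t-1}. The symbols below (phi_i for AR coefficients, theta_j for MA coefficients, c for a constant, epsilon_t for the error at time t) are notation introduced here for illustration, not taken from this post.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i L^i\right)(1 - L)^d \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j L^j\right)\varepsilon_t
```

For ARIMA(1,1,1), writing \Delta y_t = y_t - y_{t-1} for the differenced series, this reduces to:

```latex
\Delta y_t = c + \phi_1 \, \Delta y_{t-1} + \varepsilon_t + \theta_1 \, \varepsilon_{t-1}
```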

Setup and Load the Data

Let's start by importing the necessary libraries.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error

We will use the historical stock price data of Apple Inc. (AAPL) from Yahoo Finance for this analysis. The code below expects the downloaded data to be saved as AAPL.csv in the working directory, with a Date column and a Close column.

Load the data and visualize it.

data = pd.read_csv('AAPL.csv', index_col='Date', parse_dates=True)
plt.figure(figsize=(12, 6))
plt.plot(data['Close'])
plt.xlabel('Date')
plt.ylabel('Stock Price')
plt.title('Apple Inc. Stock Prices')
plt.show()
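
Before modeling, it can help to confirm that the rows are in date order and that the closing prices have no gaps. The sketch below is one way to do that; it uses a tiny made-up CSV in place of AAPL.csv, so the column layout and values are assumptions for illustration only.

```python
import io
import pandas as pd

# Tiny made-up CSV (values are not real prices); the second row is out of order
# and the last row has a missing Close value
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2020-01-03,10.0,10.5,9.8,10.2,10.2,1000
2020-01-02,9.9,10.1,9.7,10.0,10.0,1200
2020-01-06,10.2,10.6,10.1,,10.4,900
"""

sample = pd.read_csv(io.StringIO(csv_text), index_col='Date', parse_dates=True)

sample = sample.sort_index()                    # make sure rows run oldest to newest
n_missing = int(sample['Close'].isna().sum())   # count missing closing prices
clean_close = sample['Close'].dropna()          # drop them before modeling

print(sample.index.is_monotonic_increasing, n_missing, len(clean_close))
```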

Data Preprocessing and Model Selection

To use the ARIMA model, we first need to make the time series stationary. We will do this by differencing the data. Let's use first-order differencing.

data_diff = data['Close'].diff(1).dropna()

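Now, let's choose the number of autoregressive terms (p) and the number of moving average terms (q) using the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots of the differenced series. As a rule of thumb, the lag after which the PACF cuts off suggests p, and the lag after which the ACF cuts off suggests q.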

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

plt.figure()
plt.subplot(211)
plot_acf(data_diff, ax=plt.gca(), lags=20)
plt.subplot(212)
plot_pacf(data_diff, ax=plt.gca(), lags=20)
plt.show()

From the plots, it appears that an ARIMA(1,1,1) model would be appropriate for our data.

Train and Test the Model

Split the dataset into training and testing sets.

train_size = int(len(data_diff) * 0.8)
train_data, test_data = data_diff[:train_size], data_diff[train_size:]
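
Note that the split is chronological rather than random: the model must only see data from before the period it is asked to forecast. The sketch below illustrates this on a small made-up daily series (the dates and values are assumptions), using .iloc to make the positional slicing explicit.

```python
import numpy as np
import pandas as pd

# Synthetic daily series standing in for data_diff (values are made up)
idx = pd.date_range('2020-01-01', periods=10, freq='D')
series = pd.Series(np.arange(10, dtype=float), index=idx)

train_size = int(len(series) * 0.8)
train, test = series.iloc[:train_size], series.iloc[train_size:]

# Every training date precedes every test date, so no future data leaks in
print(len(train), len(test), train.index.max() < test.index.min())
```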

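Now, let's fit the model on the training set and evaluate its performance on the test set. Because train_data has already been differenced once, we set d = 0 here: an ARMA(1,1) fit on the differenced series corresponds to ARIMA(1,1,1) on the original prices (with a constant drift term). Passing order=(1, 1, 1) to already-differenced data would difference it a second time.

# Fit on the already-differenced training series; d=0 avoids differencing twice
model = ARIMA(train_data, order=(1, 0, 1))
model_fit = model.fit()

# Forecast one value for each observation in the test set
predictions = model_fit.forecast(steps=len(test_data))

# Calculate the Mean Squared Error (MSE) on the differenced scale (daily price changes)
mse = mean_squared_error(test_data, predictions)
print('Mean Squared Error (MSE):', mse)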
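
An MSE value is hard to judge on its own, so it is worth comparing it against a simple baseline. For a differenced price series, a natural baseline is to always predict "no change". The sketch below shows the comparison with made-up numbers; the arrays and the helper mse are hypothetical and only illustrate the calculation.

```python
import numpy as np

def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((actual - predicted) ** 2))

# Made-up price changes and model predictions, purely for illustration
actual_changes = np.array([0.5, -0.2, 0.1, -0.4])
model_preds = np.array([0.1, 0.0, 0.05, -0.1])

# Naive baseline: always predict "no change" for a differenced series
naive_preds = np.zeros_like(actual_changes)

model_mse = mse(actual_changes, model_preds)
naive_mse = mse(actual_changes, naive_preds)
rmse = np.sqrt(model_mse)   # RMSE is in the same units as the price changes

print(model_mse, naive_mse, rmse)
```

If the model's MSE is not clearly below the naive baseline on real data, the fitted model is adding little beyond "tomorrow's change is zero".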

Conclusion

In this blog post, we used the ARIMA model to analyze the stock prices of Apple Inc. We made the time series stationary by differencing, chose p and q from the ACF and PACF plots, and evaluated the model's forecasts on a held-out test set using mean squared error. ARIMA can be a useful tool for analyzing and forecasting financial time series.