Wednesday, August 7, 2024

Residual Plot in ARIMA Model

A residual plot shows the residuals of a model: the differences between the actual values of a time series and the values the model predicts for them. For an ARIMA model, it is a quick way to assess the fit and to spot structure the model has not accounted for.

Key Characteristics of a Good Residual Plot:

Randomness: The residuals should appear as random noise without any discernible patterns.   

Mean of zero: The residuals should have a mean close to zero, indicating that the model is unbiased.   

Constant variance: The spread of residuals should be consistent over time (homoscedasticity).

Normality: The residuals should be approximately normally distributed. This matters mainly for the prediction intervals, which assume normal errors.
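The four properties above can also be checked numerically before looking at any plot. The following is a minimal sketch that uses only the Python standard library. The function name residual_summary and the first-half versus second-half variance comparison are illustrative choices, not part of statsmodels, and the simulated input only stands in for model_fit.resid.

```python
import random
import statistics


def residual_summary(residuals):
    """Rough numeric indicators for the four residual properties.

    Expects at least four non-constant values (assumption of this sketch).
    """
    values = list(residuals)
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    centered = [v - mean for v in values]

    # Randomness: lag-1 sample autocorrelation (near 0 for white noise)
    denom = sum(c * c for c in centered)
    lag1 = sum(centered[i] * centered[i - 1] for i in range(1, n)) / denom

    # Constant variance: second-half variance over first-half variance (near 1)
    half = n // 2
    var_ratio = statistics.pvariance(values[half:]) / statistics.pvariance(values[:half])

    # Normality: skewness and excess kurtosis (both near 0 for a normal sample)
    skewness = sum(c ** 3 for c in centered) / (n * std ** 3)
    excess_kurtosis = sum(c ** 4 for c in centered) / (n * std ** 4) - 3

    return {
        "mean": mean,
        "lag1_autocorrelation": lag1,
        "variance_ratio": var_ratio,
        "skewness": skewness,
        "excess_kurtosis": excess_kurtosis,
    }


# Illustrative input: simulated white noise standing in for model_fit.resid
random.seed(0)
simulated = [random.gauss(0, 1) for _ in range(500)]
for name, value in residual_summary(simulated).items():
    print(f"{name}: {value:.3f}")
```

What counts as "close enough" to 0 or 1 depends on the sample size, so these numbers are best read alongside the plots rather than instead of them.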

How to Create a Residual Plot:

Python

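import matplotlib.pyplot as plt

# Assumes a fitted ARIMA results object called 'model_fit' that was fit on
# pandas data, so that model_fit.resid is a pandas Series
residuals = model_fit.resid

# Plot the residuals over time, with a reference line at zero
residuals.plot(kind='line')
plt.axhline(0, color='gray', linestyle='--')
plt.title('Residual Plot')
plt.xlabel('Time')
plt.ylabel('Residual')
plt.show()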



Interpreting the Residual Plot:

Patterns: If the residuals exhibit patterns (e.g., trends, seasonality, or autocorrelation), the model has not captured all the structure in the data, and the order (p, d, q) may need adjusting.

Outliers: Large outliers in the residuals might suggest influential data points or model misspecification.

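Heteroscedasticity: If the variance of the residuals changes over time, the constant-variance assumption is violated and the forecast intervals may be unreliable. A variance-stabilizing transformation of the data, such as a logarithm, sometimes helps.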
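Two of the checks above, autocorrelation and outliers, can be sketched without any plotting. For white noise, a sample autocorrelation outside roughly ±1.96/√n is conventionally treated as significant at the 5% level. The function names and the 3-standard-deviation outlier cutoff below are assumptions chosen for illustration, and the trending series is made-up input rather than real residuals.

```python
import math
import statistics


def sample_acf(residuals, max_lag):
    """Sample autocorrelation of the residuals for lags 1..max_lag."""
    values = list(residuals)
    n = len(values)
    mean = statistics.fmean(values)
    centered = [v - mean for v in values]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def significant_lags(residuals, max_lag=10):
    """Lags whose autocorrelation lies outside the approximate 95% band."""
    values = list(residuals)
    bound = 1.96 / math.sqrt(len(values))
    acf = sample_acf(values, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > bound]


def outlier_positions(residuals, z_cutoff=3.0):
    """Positions of residuals more than z_cutoff standard deviations from the mean."""
    values = list(residuals)
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [i for i, v in enumerate(values) if abs(v - mean) > z_cutoff * std]


# Illustrative input: a steadily trending series, i.e. residuals with a clear pattern
trending = [0.1 * t for t in range(50)]
print("Significant lags:", significant_lags(trending, max_lag=5))
print("Outlier positions:", outlier_positions(trending))
```

An empty list from significant_lags is what a well-specified model would tend to produce, although with many lags an occasional value just outside the band is expected by chance.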

Additional Diagnostic Plots:

ACF and PACF plots of residuals: To check for autocorrelation in the residuals. Spikes outside the confidence band suggest structure the model has missed.

Histogram of residuals: To assess the normality assumption.

QQ plot: To visually compare the distribution of residuals to a normal distribution.

By analyzing the residual plot and other diagnostic plots, you can evaluate the adequacy of your ARIMA model and make necessary adjustments.

A complete code sample is shown below:


import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.arima.model import ARIMA

# Sample data with high autocorrelation (replace with your data)
# data = {'Timestamp(UTC)': ['12/15/23 19:20', '12/15/23 19:30','12/15/23 19:40','12/15/23 19:50','12/15/23 20:00','12/15/23 20:10','12/15/23 20:20','12/15/23 20:30','12/15/23 20:40'],
# 'Traffic_Measure': [10, 15, 10, 5, 50, 20, 56, 89, 23]}

# data = {'Timestamp(UTC)': ['12/15/23 19:20', '12/15/23 19:30','12/15/23 19:40','12/15/23 19:50','12/15/23 20:00','12/15/23 20:10','12/15/23 20:20','12/15/23 20:30','12/15/23 20:40'],
# 'Traffic_Measure': [10, 10.5, 10.5, 10.9, 10.1, 10.1, 10.5, 10.2, 10.1]}

data = {'Timestamp(UTC)': ['12/15/23 19:20', '12/15/23 19:30','12/15/23 19:40','12/15/23 19:50','12/15/23 20:00','12/15/23 20:10','12/15/23 20:20','12/15/23 20:30','12/15/23 20:40'],
'Traffic_Measure': [10, 15, 20, 25, 30, 35, 40, 45, 50]}

df = pd.DataFrame(data)
# Parse the timestamps and use them as the index
df.index = pd.to_datetime(df.pop('Timestamp(UTC)'), format='%m/%d/%y %H:%M')
# Declare the 10-minute spacing so the forecasts get a proper date index
df = df.asfreq('10min')

# Plot autocorrelation
# plot_acf(df['Traffic_Measure'])

# Fit an ARIMA model (adjust p, d, q based on ACF and PACF)
# Note: nine observations are far too few for an AR(5) fit; use a longer
# series or a smaller order in practice
model = ARIMA(df['Traffic_Measure'], order=(5, 1, 0))  # Example order
model_fit = model.fit()

# Make predictions
predictions = model_fit.forecast(steps=12)  # Predict 12 future values
print("Predictions:\n", predictions)

df_preds = pd.DataFrame({'Traffic_Measure': predictions.values}, index=predictions.index)

# Plot actual values and forecasts on the same axes
ax = df['Traffic_Measure'].plot(label='Actual', color='red')
df_preds['Traffic_Measure'].plot(ax=ax, label='Predictions', color='blue')
ax.legend()

print(df.head())
print(df_preds.head())

# Density plot and summary statistics of the residuals
residuals = pd.DataFrame(model_fit.resid)
residuals.plot(kind='kde')
print(residuals.describe())

plt.show()





