**SHAP** (SHapley Additive exPlanations) is a unified framework for interpreting model predictions based on cooperative game theory. For linear regression, it provides a mathematically elegant way to explain predictions.
---
## **How SHAP Works for Linear Regression**
### **Basic Concept:**
SHAP values distribute the "credit" for a prediction among the input features fairly, based on their marginal contributions.
### **For Linear Models:**
In linear regression: \( y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n \)
The **SHAP value** for feature \( i \) is:
\[
\phi_i = \beta_i (x_i - \mathbb{E}[x_i])
\]
Where:
- \( \beta_i \) = regression coefficient for feature \( i \)
- \( x_i \) = feature value for this specific observation
- \( \mathbb{E}[x_i] \) = expected (average) value of feature \( i \) in the dataset
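As a minimal sketch of how this formula applies in practice (assuming a fitted scikit-learn `LinearRegression` and synthetic example data, not anything from a specific dataset), the SHAP value for each feature can be computed directly from the coefficients and the feature means:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical example data: 100 observations, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 4.0

model = LinearRegression().fit(X, y)

# SHAP value for feature i of one observation: phi_i = beta_i * (x_i - mean(x_i))
x = X[0]
phi = model.coef_ * (x - X.mean(axis=0))
print(phi)
```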
---
## **Key Properties**
### **1. Additivity**
\[
\sum_{i=1}^n \phi_i = \hat{y} - \mathbb{E}[\hat{y}]
\]
The sum of all SHAP values equals the difference between the prediction and the average prediction.
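For a linear model, this follows directly from the formula above, since the expected prediction is the model evaluated at the feature means:

\[
\sum_{i=1}^n \phi_i = \sum_{i=1}^n \beta_i (x_i - \mathbb{E}[x_i])
= \left(\beta_0 + \sum_{i=1}^n \beta_i x_i\right) - \left(\beta_0 + \sum_{i=1}^n \beta_i \mathbb{E}[x_i]\right)
= \hat{y} - \mathbb{E}[\hat{y}]
\]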
### **2. Efficiency**
The entire difference between the prediction and the baseline is distributed among the features - nothing is left unexplained.
### **3. Symmetry & Fairness**
Features with identical effects get identical SHAP values.
---
## **Example**
Suppose we have a linear model:
\[
\text{Price} = 10 + 5 \times \text{Size} + 3 \times \text{Bedrooms}
\]
Dataset averages: Size = 2, Bedrooms = 3, so the average predicted price is \( 10 + 5\times2 + 3\times3 = 29 \)
For a house with:
- Size = 4, Bedrooms = 2
- Predicted Price = \( 10 + 5\times4 + 3\times2 = 36 \)
**SHAP values:**
- ϕ_Size = \( 5 \times (4 - 2) = 10 \)
- ϕ_Bedrooms = \( 3 \times (2 - 3) = -3 \)
- Baseline = 29 (the average prediction, \( \mathbb{E}[\hat{y}] \))
**Verification:** \( 29 + 10 - 3 = 36 \), which matches the predicted price exactly.
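The same check can be written as a few lines of code (using only the toy coefficients and averages from this example):

```python
coef = {"Size": 5.0, "Bedrooms": 3.0}
intercept = 10.0
means = {"Size": 2.0, "Bedrooms": 3.0}
house = {"Size": 4.0, "Bedrooms": 2.0}

baseline = intercept + sum(coef[f] * means[f] for f in coef)    # 29
prediction = intercept + sum(coef[f] * house[f] for f in coef)  # 36
phi = {f: coef[f] * (house[f] - means[f]) for f in coef}        # {'Size': 10.0, 'Bedrooms': -3.0}

assert baseline + sum(phi.values()) == prediction               # 29 + 10 - 3 = 36
```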
---
## **Benefits for Linear Regression**
### **1. Unified Feature Importance**
- Shows how much each feature contributed to a specific prediction
- Unlike coefficients, SHAP values are prediction-specific
### **2. Directional Impact**
- Positive SHAP value → Feature increased the prediction
- Negative SHAP value → Feature decreased the prediction
### **3. Visualization**
- **SHAP summary plots**: Show feature importance across all predictions
- **Force plots**: Explain individual predictions
- **Dependence plots**: Show feature effects
---
## **Comparison with Traditional Interpretation**
| **Traditional** | **SHAP Approach** |
|-----------------|-------------------|
| Coefficient βᵢ | SHAP value ϕᵢ |
| Global effect | Local + Global effects |
| "One size fits all" | Prediction-specific explanations |
| Hard to compare scales | Comparable across features |
---
## **Practical Usage**
```python
import shap
import numpy as np
from sklearn.linear_model import LinearRegression

# Example data (replace with your own feature matrix X and target y)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([5.0, 3.0, -2.0]) + 10

# Fit linear model
model = LinearRegression().fit(X, y)

# Calculate SHAP values, using X as the background (baseline) distribution
explainer = shap.Explainer(model, X)
shap_values = explainer(X)

# Visualize
shap.summary_plot(shap_values, X)      # global feature importance across all predictions
shap.plots.waterfall(shap_values[0])   # explain the first prediction
```
---
## **Why Use SHAP for Linear Regression?**
Even though linear models are inherently interpretable, SHAP provides:
- **Consistent methodology** across different model types
- **Better visualization** tools
- **Local explanations** for individual predictions
- **Feature importance** that accounts for data distribution
SHAP makes the already interpretable linear models even more transparent and user-friendly for explaining predictions.