Sunday, November 2, 2025

True or false: in linear regression, statistical hypothesis tests on coefficients can be used to decide whether to keep or drop variables.

**True**

---

## **Explanation**

In linear regression, we often use **hypothesis tests on coefficients** to decide whether to keep or drop variables.

### **Typical Procedure:**

1. **Set up hypotheses** for each predictor \( X_j \):

   - \( H_0: \beta_j = 0 \) (variable has no effect)

   - \( H_1: \beta_j \neq 0 \) (variable has a significant effect)


2. **Compute t-statistic**:

   \[
   t = \frac{\hat{\beta}_j}{\text{SE}(\hat{\beta}_j)}
   \]

   where \( \text{SE}(\hat{\beta}_j) \) is the standard error of the coefficient. Under \( H_0 \), this statistic follows a t-distribution with \( n - p - 1 \) degrees of freedom (for \( n \) observations and \( p \) predictors).


3. **Compare to critical value** or use **p-value** (a code sketch follows this list):

   - If p-value < significance level (e.g., 0.05), reject \( H_0 \) → **keep** the variable

   - If p-value ≥ significance level, fail to reject \( H_0 \) → consider **dropping** the variable

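As a concrete illustration, here is a minimal sketch of steps 1–3 using Python's `statsmodels`. The data is simulated and all variable names are assumptions for the example:

```python
import numpy as np
import statsmodels.api as sm

# Simulated data (hypothetical): y depends on x1 but not on x2
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.5 + 0.8 * x1 + rng.normal(size=n)

# Design matrix with an intercept column
X = sm.add_constant(np.column_stack([x1, x2]))

# Fit OLS; .tvalues are beta_hat / SE(beta_hat), .pvalues are two-sided
model = sm.OLS(y, X).fit()

alpha = 0.05
for name, t, p in zip(["Intercept", "x1", "x2"], model.tvalues, model.pvalues):
    decision = "keep" if p < alpha else "consider dropping"
    print(f"{name}: t = {t:.2f}, p = {p:.3f} -> {decision}")
```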

---


### **Example:**

Consider the following regression output:

```
            Coefficient   Std Error   t-stat   p-value
Intercept   2.5           0.3         8.33     <0.001
X1          0.8           0.4         2.00     0.046
X2          0.1           0.5         0.20     0.842
```

- **X1** (p = 0.046): Significant at α=0.05 → **keep**

- **X2** (p = 0.842): Not significant → consider **dropping**
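
To see where those p-values come from, here is a small sketch recomputing them from the t-statistics with `scipy`. The residual degrees of freedom is an assumption (the table above doesn't report the sample size); with a large enough value it reproduces the table's p-values to three decimals:

```python
from scipy import stats

# Hypothetical residual degrees of freedom; not reported in the output above
df_resid = 500

for name, t_stat in [("X1", 2.00), ("X2", 0.20)]:
    # Two-sided p-value: P(|T| > |t|) under H0
    p = 2 * stats.t.sf(abs(t_stat), df_resid)
    print(f"{name}: t = {t_stat:.2f}, p = {p:.3f}")
```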


---


### **Note:**

While this is common practice, variable selection shouldn't rely **only** on p-values: domain knowledge, model purpose, and multicollinearity should also be considered (a multicollinearity check is sketched below). But the statement itself is **true**: hypothesis testing on coefficients is indeed used for deciding whether to keep or drop variables.
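
For instance, one common multicollinearity check is the variance inflation factor (VIF). A minimal sketch with `statsmodels`, reusing the hypothetical simulated setup from the earlier example:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Hypothetical predictors (same simulated setup as the earlier sketch)
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
X = sm.add_constant(np.column_stack([x1, x2]))

# VIF per column; values well above ~5-10 flag multicollinearity
for i, name in enumerate(["const", "x1", "x2"]):
    print(f"{name}: VIF = {variance_inflation_factor(X, i):.2f}")
```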
