A researcher runs a Multiple Linear Regression in SPSS to predict first-year university GPA from (1) Hours Studied per week, (2) Lecture Attendance (%), and (3) SAT score (N = 150). They request Casewise Diagnostics and influence statistics.
Selected SPSS output:
Casewise Diagnostics (Dependent Variable: GPA)
- Case 47: Std. Residual = 3.25
- Case 112: Std. Residual = 1.10
Residuals Statistics
- Max Cook's Distance = 1.35
- Max Centered Leverage Value = 0.29
(Assume k = 3 predictors and that the analyst used SPSS's default display cutoffs.)
Which interpretation and next step are MOST appropriate regarding assumptions and influence?
The regression assumption of normality is violated because Case 47 has a standardized residual > 3, so the correct fix is to delete Case 47 and rerun the model.
Cook’s Distance indicates heteroscedasticity; therefore, the best next step is to log-transform GPA and rerun the regression.
Case 112 is likely an influential case (high leverage and influence) despite its small residual; the best next step is to check for data errors, run a sensitivity analysis (compare results with and without that case), and report the findings.
The key issue is multicollinearity because leverage is high; the best fix is to mean-center all predictors to reduce leverage and make the assumptions hold.
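
The reported diagnostics can be screened against common rule-of-thumb cutoffs. The following is a minimal sketch, not SPSS output: the cutoffs used here (|standardized residual| > 3, Cook's distance > 1, centered leverage more than three times its average of k/n) are widely taught conventions that vary by textbook, and the variable names are hypothetical. Note that the Residuals Statistics table reports only the maxima, so the sketch does not say which case produced them.

```python
# Values copied from the SPSS output in the question.
n, k = 150, 3
std_residuals = {47: 3.25, 112: 1.10}
max_cooks_d = 1.35
max_centered_leverage = 0.29

# Assumed rule-of-thumb cutoffs (conventions differ between textbooks).
RESIDUAL_CUTOFF = 3.0      # |standardized residual| above 3 flags an outlier
COOKS_D_CUTOFF = 1.0       # Cook's distance above 1 flags an influential case

# SPSS centered leverage is h - 1/n, so its average is k/n rather than (k + 1)/n.
mean_centered_leverage = k / n
leverage_cutoff = 3 * mean_centered_leverage

outlier_cases = [case for case, z in std_residuals.items() if abs(z) > RESIDUAL_CUTOFF]
cooks_flag = max_cooks_d > COOKS_D_CUTOFF
leverage_flag = max_centered_leverage > leverage_cutoff

print(f"Outlier cases by residual: {outlier_cases}")
print(f"Max Cook's D {max_cooks_d} exceeds {COOKS_D_CUTOFF}: {cooks_flag}")
print(f"Max centered leverage {max_centered_leverage} exceeds {leverage_cutoff:.2f}: {leverage_flag}")
```

Under these assumed cutoffs, a large residual alone marks a case as an outlier, while the Cook's distance and leverage values indicate that at least one case is strongly influencing the fitted coefficients. Neither statistic tests normality, heteroscedasticity, or multicollinearity.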