TedAngradi
Fluorite | Level 6

I am using AIC to select the best model in PROC REG, but when I rerun that model in PROC REG, the parameter estimates do not match the estimates output for the AIC-selected model.

Here is the code:
/* Fit all subsets; OUTEST= receives one row per model with its estimates, SSE and AIC */
proc reg data=watfront.clean_covar_trans outest=est_TransADI_national;
  model TransADI_national =
        transpopdensity transhouseincome transartificialshore_p transBF_count
        transFlood_P transImperv_P transImperv_PC transParkdist
        transPM10removed_P transRipariantree_P transTreecover_P
        transTreeview_P transwalkability transOSMhybrid_D transOSMstream_D
        / selection=adjrsq sse aic;
run; quit;

/* Sort ascending by AIC and keep only the first row (the minimum-AIC model) */
proc sort data=est_TransADI_national; by _aic_; run;
data est_TransADI_national;
  set est_TransADI_national;
  if _N_ = 1;
run;

/* Export the selected model's row to Excel */
proc export data=est_TransADI_national
  outfile='L:\lab\GIS\epa\R3\SHC 9.3.1\SAS\local backup\reg_est\est_TransADI_national.xlsx'
  dbms=xlsx replace;
run;

 

proc reg data=watfront.clean_covar_trans ;
model TransADI_national = transhouseincome transimperv_p transimperv_pc
transpm10removed_p transtreecover_p transtreeview_p transosmhybrid_d
/ stb;run;

Here is the output (the OUTEST= row) for the best model from AIC:

_MODEL_              MODEL1
_TYPE_               PARMS
_DEPVAR_             TransADI_national
_RMSE_               0.145552081
Intercept            5.424183829
transhouseincome     -0.98637025
transimperv_p        -0.172107811
transimperv_pc       0.165121251
transpm10removed_p   -7.681898647
transtreecover_p     0.627428135
transtreeview_p      -0.443097223
transosmhybrid_d     -0.04531394
TransADI_national    -1
_IN_                 7
_P_                  8
_EDF_                146
_SSE_                3.093069598
_RSQ_                0.805281478
_AIC_                -585.7994455

 

Here is the output for the model selected using the AIC criterion:

 

 

[Screenshot: TedAngradi_0-1631541208086.png]

Note there are missing values for some predictors.

 

The parameter estimates and other statistics do not match and I don't know why. Thanks!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
TedAngradi
Fluorite | Level 6
Thank you. Yes, there is apparently a difference in how the full-fit PROC REG and the best-fit model from the AIC-based selection deal with missing predictor values, resulting in different error DF and different parameter estimates for the same model (because the dataset is different). I think I will use AIC to select a model and then rerun the best-fit model to get the standardized coefficients.


6 REPLIES 6
PaigeMiller
Diamond | Level 26

You are using two different models but you expect the same regression coefficients?

 

For example, the first model statement has variable transBF_count, but the second model statement does not have this variable. I would not expect the coefficients to match unless the models are identical.

 

Perhaps this is indeed caused by missing data.

 

 

--
Paige Miller
data_null__
Jade | Level 19

Missing Values: PROC REG constructs only one crossproducts matrix for the variables in all regressions. If any variable needed for any regression is missing, the observation is excluded from all estimates. If you include variables with missing values in the VAR statement, the corresponding observations are excluded from all analyses, even if you never include the variables in a model. PROC REG assumes that you might want to include these variables after the first RUN statement and deletes observations with missing values.
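Following the documentation passage above, one possible way to make the refit use the same rows as the selection run is to name the unselected candidate predictors in a VAR statement. This is only a sketch: it assumes all listed variables are numeric and that no WEIGHT or FREQ statement is involved, and it reuses the data set and variable names from the original post.

```sas
/* Sketch: refit the AIC-selected model on the rows the selection run used. */
proc reg data=watfront.clean_covar_trans;
  /* The 8 candidates that were NOT selected. Naming them in VAR adds them to
     the crossproducts matrix, so a row with a missing value in any of them
     is excluded, as it was during model selection. */
  var transpopdensity transartificialshore_p transBF_count transFlood_P
      transParkdist transRipariantree_P transwalkability transOSMstream_D;
  /* The 7 predictors in the minimum-AIC model, with standardized estimates */
  model TransADI_national = transhouseincome transimperv_p transimperv_pc
        transpm10removed_p transtreecover_p transtreeview_p transosmhybrid_d
        / stb;
run;
quit;
```

If this behaves as the documentation describes, the refit should report the same error DF as the `_EDF_` value in the OUTEST= row, and the unstandardized estimates should match.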

TedAngradi
Fluorite | Level 6
Thanks! The reason I am running the second model is to get the standardized coefficients for the so-called best model from AIC. I seem to lose DF with the AIC-selected model, which is probably why the parameters are different (?). Can you think of any remedy I might try?
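The lost DF can be checked from the OUTEST= row posted in the question. As a rough check, assuming PROC REG's usual definitions (error DF $= n - p$, and $\mathrm{AIC} = n \ln(\mathrm{SSE}/n) + 2p$, where $p$ counts the intercept), the number of observations $n$ used by the selection run can be recovered and verified:

```latex
\begin{align*}
n &= \texttt{\_EDF\_} + \texttt{\_P\_} = 146 + 8 = 154,\\
\mathrm{AIC} &= n \ln\!\left(\frac{\mathrm{SSE}}{n}\right) + 2p
  = 154 \ln\!\left(\frac{3.093069598}{154}\right) + 2(8) \approx -585.80,\\
\mathrm{RMSE} &= \sqrt{\frac{\mathrm{SSE}}{n - p}}
  = \sqrt{\frac{3.093069598}{146}} \approx 0.1456.
\end{align*}
```

Both values agree with the posted `_AIC_` and `_RMSE_`, which suggests the selection run used 154 observations. A refit that reports more than 154 observations used was fit to a different set of rows.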
sbxkoenk
SAS Super FREQ

Hello,

 

>> to get the standardized coefficients 

Can't you use the STB option on the model statement (after the forward slash / )?

STB

produces standardized regression coefficients.
A standardized regression coefficient is computed by dividing a parameter estimate by the ratio of the sample standard deviation of the dependent variable to the sample standard deviation of the regressor.
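Written out, the definition above amounts to the following, where $b_j$ is the ordinary estimate for regressor $x_j$, and $s_y$ and $s_{x_j}$ are the sample standard deviations of the dependent variable and the regressor (the symbol names are chosen here for illustration):

```latex
\[
b_j^{\mathrm{std}} \;=\; \frac{b_j}{\,s_y / s_{x_j}\,} \;=\; b_j \,\frac{s_{x_j}}{s_y}
\]
```

Because $s_y$ and $s_{x_j}$ are computed from the observations actually used, the standardized estimates also change when the set of usable rows changes.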

 

Koen

Reeza
Super User
Look at the first table in your output from each PROC - specifically compare the Number of Observations Used. If they are not the same, you should not expect the same parameter estimates.
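A quick way to see in advance whether the two runs will use the same observations is to count complete cases under each variable list. The DATA step below is a minimal sketch, assuming all variables are numeric; `n_refit` and `n_select` are names made up for this illustration.

```sas
/* Sketch: count rows with no missing values under each variable list. */
data _null_;
  set watfront.clean_covar_trans end=last;
  /* Rows usable by the refit: response plus the 7 selected predictors */
  n_refit + (nmiss(TransADI_national, transhouseincome, transimperv_p,
                   transimperv_pc, transpm10removed_p, transtreecover_p,
                   transtreeview_p, transosmhybrid_d) = 0);
  /* Rows usable by the selection run: response plus all 15 candidates */
  n_select + (nmiss(TransADI_national, transpopdensity, transhouseincome,
                    transartificialshore_p, transBF_count, transFlood_P,
                    transImperv_P, transImperv_PC, transParkdist,
                    transPM10removed_P, transRipariantree_P, transTreecover_P,
                    transTreeview_P, transwalkability, transOSMhybrid_D,
                    transOSMstream_D) = 0);
  /* Write both counts to the log after the last row */
  if last then put n_refit= n_select=;
run;
```

If the two counts in the log differ, the refit and the selection run are working from different rows, and matching parameter estimates should not be expected.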
