I am using AIC to select the best model in Proc Reg, but when I run the best model in Proc Reg the parameter estimates do not match estimates from the output for the AIC selected model.
Here is the code:
/* Step 1: fit all subsets; OUTEST= saves each candidate model's estimates and AIC */
proc reg data=watfront.clean_covar_trans outest=est_TransADI_national;
  model TransADI_national =
        transpopdensity transhouseincome transartificialshore_p transBF_count
        transFlood_P transImperv_P transImperv_PC transParkdist transPM10removed_P
        transRipariantree_P transTreecover_P transTreeview_P transwalkability
        transOSMhybrid_D transOSMstream_D
        / selection=adjrsq sse aic;
run; quit;
/* Step 2: keep only the model with the smallest AIC */
proc sort data=est_TransADI_national; by _aic_; run;
data est_TransADI_national;
  set est_TransADI_national;
  if _N_ le 1 then output;
run;
/* Step 3: export the selected model's estimates */
proc export data=est_TransADI_national
  outfile='L:\lab\GIS\epa\R3\SHC 9.3.1\SAS\local backup\reg_est\est_TransADI_national.xlsx'
  dbms=xlsx replace;
run;
/* Step 4: refit the selected model on its own to get standardized estimates */
proc reg data=watfront.clean_covar_trans;
  model TransADI_national = transhouseincome transimperv_p transimperv_pc
        transpm10removed_p transtreecover_p transtreeview_p transosmhybrid_d
        / stb;
run;
Here is the output for the best model from AIC:
_MODEL_ | MODEL1 |
_TYPE_ | PARMS |
_DEPVAR_ | TransADI_national |
_RMSE_ | 0.145552081 |
Intercept | 5.424183829 |
transhouseincome | -0.98637025 |
transimperv_p | -0.172107811 |
transimperv_pc | 0.165121251 |
transpm10removed_p | -7.681898647 |
transtreecover_p | 0.627428135 |
transtreeview_p | -0.443097223 |
transosmhybrid_d | -0.04531394 |
TransADI_national | -1 |
_IN_ | 7 |
_P_ | 8 |
_EDF_ | 146 |
_SSE_ | 3.093069598 |
_RSQ_ | 0.805281478 |
_AIC_ | -585.7994455 |
Here is the output for the model selected using the AIC criterion:
Note there are missing values for some predictors.
The parameter estimates and other statistics do not match and I don't know why. Thanks!
You are using two different models but you expect the same regression coefficients?
For example, the first model statement has variable transBF_count, but the second model statement does not have this variable. I would not expect the coefficients to match unless the models are identical.
Perhaps this is indeed caused by missing data.
From the "Missing Values" section of the PROC REG documentation:
PROC REG constructs only one crossproducts matrix for the variables in all regressions. If any variable needed for any regression is missing, the observation is excluded from all estimates. If you include variables with missing values in the VAR statement, the corresponding observations are excluded from all analyses, even if you never include the variables in a model. PROC REG assumes that you might want to include these variables after the first RUN statement and deletes observations with missing values.
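If that is the cause, the selection run used only observations that are complete on all 15 candidate regressors, while the second PROC REG also uses observations that are missing only on the dropped regressors. A minimal sketch of one way to check this, assuming all the variables are numeric (complete_cases is a made-up dataset name, not something from the original post):

```sas
/* Keep only observations with no missing value in the response or in any
   of the 15 candidate regressors, i.e. the sample the selection run would
   have used. COMPLETE_CASES is a hypothetical dataset name. */
data complete_cases;
  set watfront.clean_covar_trans;
  if nmiss(of TransADI_national
              transpopdensity transhouseincome transartificialshore_p
              transBF_count transFlood_P transImperv_P transImperv_PC
              transParkdist transPM10removed_P transRipariantree_P
              transTreecover_P transTreeview_P transwalkability
              transOSMhybrid_D transOSMstream_D) = 0;
run;

/* Refit the AIC-selected 7-variable model on that same sample */
proc reg data=complete_cases;
  model TransADI_national = transhouseincome transimperv_p transimperv_pc
        transpm10removed_p transtreecover_p transtreeview_p transosmhybrid_d
        / stb;
run; quit;
```

If the two samples agree, the "Number of Observations Used" in this run should equal _EDF_ + _P_ from the OUTEST= row above (146 + 8 = 154), and the unstandardized estimates should match that row.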
Hello,
>> to get the standardized coefficients
Can't you use the STB option on the model statement (after the forward slash / )?
STB produces standardized regression coefficients.
A standardized regression coefficient is computed by dividing a parameter estimate by the ratio of the sample standard deviation of the dependent variable to the sample standard deviation of the regressor.
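Written as a formula, that sentence reads as follows. The symbols are my own notation, not from the documentation: b_j is the ordinary parameter estimate for regressor x_j, s_y is the sample standard deviation of the dependent variable, and s_{x_j} is the sample standard deviation of the regressor.

```latex
b_j^{\mathrm{std}} \;=\; \frac{b_j}{\,s_y / s_{x_j}\,} \;=\; b_j \,\frac{s_{x_j}}{s_y}
```

Since b_j enters this formula directly, the standardized values will also change whenever the set of observations used in the fit changes.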
Koen