Hello,
I have tried to estimate the best model using the following commands.
I then ranked the models by minimum AIC. The result is shown below.
Based on my understanding, the numbers I highlighted in yellow are the estimated parameters of the model with minimum AIC, where the independent variables shown as a dot "." were removed.
I need to test the significance of each coefficient using the following commands.
The coefficient estimates from this command (shown below) are different from those highlighted in yellow. Should the numbers highlighted in green be the same as the numbers highlighted in yellow?
Thank You
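[Editor's note: the original commands and highlighted output were attached as screenshots. A minimal sketch of the workflow being described follows; the dataset and variable names (work.mydata, y, x1-x5) and the selected subset are placeholders, not taken from the post.]

```sas
/* Step 1: all-subsets selection, ranked so the best AIC appears first.
   Names are placeholders for the poster's actual data. */
proc reg data=work.mydata outest=est;
   model y = x1 x2 x3 x4 x5 / selection=adjrsq aic;
run;
quit;

/* Step 2: refit only the retained variables to get t-tests for each
   coefficient. x1 x3 x5 is a hypothetical selected subset. */
proc reg data=work.mydata;
   model y = x1 x3 x5;
run;
quit;
```

If the refit estimates do not match the selection output, a common cause is that the two fits used different sets of observations, for example when a dropped variable has missing values.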
Are there missing values in any of these variables? That may influence how the estimates are calculated under METHOD=ADJRSQ.
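A quick way to check is to count missing values per variable; if a variable that was dropped from the model has missing values, the subset fit may use more observations than the full-model fit. A sketch (dataset and variable names are placeholders):

```sas
/* NMISS = number of missing values, N = number of usable values.
   Any difference in N across variables changes which rows each fit uses. */
proc means data=work.mydata nmiss n;
   var y x1 x2 x3 x4 x5;
run;
```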
Hi all,
I am using the following syntax to generate a series of linear regression models using several independent variables. The syntax creates models using all potential combinations of the independent variables. Some of the statistics (like RMSE, SSE, AIC, RSq, AdjRsq) are captured in the OUTEST data set. However, the model label seems to be overwritten: every row shows as MODEL1. Additionally, VIF is shown only for the model that has the best AdjRsq.
I want to be able to generate models using all possible combinations of the independent variables, and also run other diagnostic tests (like heteroscedasticity, normality of residuals etc.) on all the generated models and not on just the best model that the code picks up according to the selection. Any help is thoroughly appreciated. Here is my syntax:
Proc Reg Data = Temp_Master OUTEST = temp_model RSQUARE;
   Model Dep1 = IV1 IV2 IV3 / Selection = AdjRsq RSQUARE AIC SSE VIF;
   OUTPUT OUT = model_temp r = Residuals_model_temp;
RUN;
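[Editor's note: on the model-number question, the _MODEL_ column in the OUTEST data set takes its value from the MODEL statement's label (defaulting to MODEL1 for the first statement), so all subset rows produced by one SELECTION= statement share the same label. They are distinguished instead by columns such as _IN_ and the coefficient values. A sketch with explicit labels, using the poster's names:]

```sas
/* Each labeled MODEL statement writes its label into _MODEL_ in OUTEST. */
proc reg data=Temp_Master outest=temp_model rsquare;
   AllSubsets: model Dep1 = IV1 IV2 IV3 / selection=adjrsq aic sse;
run;
quit;
```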
The syntax I am using for one of the diagnostic tests mentioned above (normality of residuals) is shown below:
Proc Univariate Data = model_temp;
   var residuals_model_temp;
run;
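[Editor's note: to run diagnostics on every candidate model rather than only the best one, one approach is to enumerate the subsets explicitly so each model gets its own fit, residuals, and tests. A sketch; the macro name and subset list are illustrative, and the SPEC option requests White's specification test as one example of a heteroscedasticity check:]

```sas
/* Fit one candidate model, save its residuals, and run diagnostics on it. */
%macro fit_one(id, vars);
   proc reg data=Temp_Master outest=est_&id;
      model Dep1 = &vars / vif spec;      /* VIF per model; SPEC = White test */
      output out=resid_&id r=resid;
   run;
   quit;

   /* Normality of this model's residuals */
   proc univariate data=resid_&id normal;
      var resid;
   run;
%mend;

/* Enumerate the subsets of interest (all combinations of three IVs shown) */
%fit_one(1, IV1);
%fit_one(2, IV2);
%fit_one(3, IV3);
%fit_one(4, IV1 IV2);
%fit_one(5, IV1 IV3);
%fit_one(6, IV2 IV3);
%fit_one(7, IV1 IV2 IV3);
```

Because each subset is fit by its own PROC REG step, every model gets a full set of statistics and residual diagnostics, avoiding the limitation that SELECTION=ADJRSQ reports some output only for the best model.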
Before you go down this road, consider whether you really want to find a "best" model. There are tons of posts in this forum that point out the problems with any of the selection methods and criteria. Since model building is an art, you should consider the physical/biological/psychological processes involved and use that prior knowledge to form hypotheses about the relationship you are testing. Should you merely want the best predictor, use all the variables and their interactions and do something like a classification and regression tree approach. Pure regression "ought" to have something to do with an assumed causality, otherwise you may as well include number of sunspots observed as a predictor.
SteveDenham