Hello,
I have tried to estimate the best model using the following commands.
I then ranked the models by minimum AIC. The result is shown below.
Based on my understanding, the numbers I highlighted in yellow are the estimated parameters of the model with minimum AIC, where the independent variables shown as a dot "." were removed.
I need to test the significance of each coefficient using the following commands.
The coefficient estimates from this command (shown below) are different from those highlighted in yellow. Should the numbers highlighted in green be the same as the numbers highlighted in yellow?
Thank You
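[Editor's note: the original commands and highlighted output were attached as screenshots. A minimal sketch of the workflow being described follows; the dataset and variable names (work.mydata, y, x1-x5) and the selected subset are placeholders, not taken from the post.]

```sas
/* Step 1: all-subsets selection, ranked so the best AIC appears first.
   Names are placeholders for the poster's actual data. */
proc reg data=work.mydata outest=est;
   model y = x1 x2 x3 x4 x5 / selection=adjrsq aic;
run;
quit;

/* Step 2: refit only the retained variables to get t-tests for each
   coefficient. x1 x3 x5 is a hypothetical selected subset. */
proc reg data=work.mydata;
   model y = x1 x3 x5;
run;
quit;
```

If the refit estimates do not match the selection output, a common cause is that the two fits used different sets of observations, for example when a dropped variable has missing values.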
Are there missing values in any of these variables? That may influence how the estimates are calculated under METHOD=ADJRSQ.
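A quick way to check is to count missing values per variable; if a variable that was dropped from the model has missing values, the subset fit may use more observations than the full-model fit. A sketch (dataset and variable names are placeholders):

```sas
/* NMISS = number of missing values, N = number of usable values.
   Any difference in N across variables changes which rows each fit uses. */
proc means data=work.mydata nmiss n;
   var y x1 x2 x3 x4 x5;
run;
```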
Hi all,
I am using the following syntax to generate a series of linear regression models using several independent variables. The syntax creates models using all potential combinations of the independent variables. Some of the statistics (like RMSE, SSE, AIC, RSq, AdjRsq) are captured in the OUTEST data set. However, the model label seems to be overwritten: every row shows as MODEL1. Additionally, VIF is shown only for the model that has the best AdjRsq.
I want to be able to generate models using all possible combinations of the independent variables, and also run other diagnostic tests (like heteroscedasticity, normality of residuals etc.) on all the generated models and not on just the best model that the code picks up according to the selection. Any help is thoroughly appreciated. Here is my syntax:
Proc Reg Data = Temp_Master OUTEST = temp_model RSQUARE;
   Model Dep1 = IV1 IV2 IV3 / Selection = AdjRsq RSQUARE AIC SSE VIF;
   OUTPUT OUT = model_temp r = Residuals_model_temp;
RUN;
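[Editor's note: on the model-number question, the _MODEL_ column in the OUTEST data set takes its value from the MODEL statement's label (defaulting to MODEL1 for the first statement), so all subset rows produced by one SELECTION= statement share the same label. They are distinguished instead by columns such as _IN_ and the coefficient values. A sketch with explicit labels, using the poster's names:]

```sas
/* Each labeled MODEL statement writes its label into _MODEL_ in OUTEST. */
proc reg data=Temp_Master outest=temp_model rsquare;
   AllSubsets: model Dep1 = IV1 IV2 IV3 / selection=adjrsq aic sse;
run;
quit;
```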
The syntax I am using for one of the diagnostic tests mentioned above (normality of residuals) is shown below:
Proc Univariate Data = model_temp;
   var residuals_model_temp;
run;
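[Editor's note: to run diagnostics on every candidate model rather than only the best one, one approach is to enumerate the subsets explicitly so each model gets its own fit, residuals, and tests. A sketch; the macro name and subset list are illustrative, and the SPEC option requests White's specification test as one example of a heteroscedasticity check:]

```sas
/* Fit one candidate model, save its residuals, and run diagnostics on it. */
%macro fit_one(id, vars);
   proc reg data=Temp_Master outest=est_&id;
      model Dep1 = &vars / vif spec;      /* VIF per model; SPEC = White test */
      output out=resid_&id r=resid;
   run;
   quit;

   /* Normality of this model's residuals */
   proc univariate data=resid_&id normal;
      var resid;
   run;
%mend;

/* Enumerate the subsets of interest (all combinations of three IVs shown) */
%fit_one(1, IV1);
%fit_one(2, IV2);
%fit_one(3, IV3);
%fit_one(4, IV1 IV2);
%fit_one(5, IV1 IV3);
%fit_one(6, IV2 IV3);
%fit_one(7, IV1 IV2 IV3);
```

Because each subset is fit by its own PROC REG step, every model gets a full set of statistics and residual diagnostics, avoiding the limitation that SELECTION=ADJRSQ reports some output only for the best model.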
Before you go down this road, consider whether you really want to find a "best" model. There are tons of posts in this forum that point out the problems with any of the selection methods and criteria. Since model building is an art, you should consider the physical/biological/psychological processes involved and use that prior knowledge to form hypotheses about the relationship you are testing. Should you merely want the best predictor, use all the variables and their interactions and do something like a classification and regression tree approach. Pure regression "ought" to have something to do with an assumed causality, otherwise you may as well include number of sunspots observed as a predictor.
SteveDenham