Hi all,
I am running a stepwise logistic regression to select predictive independent variables for a response variable. At the moment I am running the code below on a dataset, which in turn produces a model.
proc logistic data=inputdata outest=outputdata covout; /* Parameter estimates and their covariances for the final selected model */
model default_12mnd (event='1')=&faktorlist.
/ selection=stepwise
slentry=0.3 /* A significance level of 0.3 is required to allow a variable into the model */
slstay=0.35 /* A significance level of 0.35 is required for a variable to stay in the model */
details
lackfit; /* A Hosmer and Lemeshow goodness-of-fit test for the final selected model */
output out=pred p=phat lower=lcl upper=ucl /* The output contains the cumulative predicted probabilities and the corresponding confidence limits, and the individual and cross validated predicted probabilities for each observation */
predprob=(individual crossvalidate);
run;
My question is: assuming the variables can be grouped, is there a way to select/force, say, a maximum of 2 variables from grouping 1, a maximum of 3 variables from grouping 2, etc.? Ultimately I would like to create a more 'balanced' model, as at the moment most variables that end up in the model tend to be from one particular grouping. I understand that this will result in a less accurate model, but I would like it to be more practical.
Thank you in advance
A logistic model is a GLM, and PROC LOGISTIC is not able to do what you intend.
Why not fit a separate logistic model for each group and compare these models?
There is no option in the stepwise methods of PROC LOGISTIC to do this type of grouping. It would have to be done manually somehow, by you adding/removing certain variables from the model, and running the model again.
Thanks for the responses.
Perhaps stepwise selection isn't the correct approach for me to use? Suggestions welcome.
Let's say you have a bunch of variables with names beginning with A, a bunch beginning with B, and so on. I assume that what you want is to do selection within the A set, separately within the B set, and so on. If that is correct, it cannot be done in one shot, but you could do it with a separate PROC LOGISTIC step for each set. In the first step, for the A set, you would list all of your variables in the MODEL statement so that all of the A variables come last. You would specify the INCLUDE= option, giving the number of variables preceding the set of A variables, to force all of those preceding variables to stay in the model. Then use the SELECTION= option (and whatever other options you want, such as STOP=) to do the selection only within the A set. Once you have selected the variables to keep from the A set, you can run the next PROC LOGISTIC step with the selected A variables first, followed by all of the remaining variables except the B set, which again is put at the very end. Update the INCLUDE= value to cover all variables except the B set and use the same selection options to select within the B set. Continue in the same way for all sets. Note that there is no guarantee that the final set of selected variables will be the same if you do this in the order A, B, C, ... as in another order like C, A, B, ..., but model selection methods are themselves heuristic and so are not guaranteed to find the optimal model. So, in that sense, this doesn't make things any worse.
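A rough sketch of the first two steps described above, in case it helps. The variable names (a1-a4, b1-b5, c1-c3) and the step 1 result (a1 and a3 kept) are made up for illustration; only inputdata and default_12mnd come from the original post. I have used SELECTION=FORWARD here because, as far as I know, STOP= has no effect with SELECTION=STEPWISE, and I am assuming that the STOP= count includes the effects forced in by INCLUDE=. Please check both points against the documentation for your SAS release.

```sas
/* Hypothetical groups: A = a1-a4, B = b1-b5, C = c1-c3. */

/* Step 1: select within group A only.
   The 8 B and C variables are listed first and forced in with INCLUDE=8;
   the A variables come last and are the only selection candidates. */
proc logistic data=inputdata;
  model default_12mnd (event='1') = b1 b2 b3 b4 b5 c1 c2 c3
                                    a1 a2 a3 a4
        / selection=forward
          include=8     /* first 8 effects stay in every model */
          stop=10       /* assumption: 8 forced + at most 2 from group A */
          slentry=0.3;
run;

/* Step 2: select within group B only.
   Suppose step 1 kept a1 and a3 (illustrative, not a real result).
   Those 2 plus the 3 C variables are forced in; the B variables come last. */
proc logistic data=inputdata;
  model default_12mnd (event='1') = a1 a3 c1 c2 c3
                                    b1 b2 b3 b4 b5
        / selection=forward
          include=5     /* 2 selected A variables + 3 C variables */
          stop=8        /* assumption: 5 forced + at most 3 from group B */
          slentry=0.3;
run;
```

The same pattern would then repeat for group C, with the selected A and B variables forced in and the C variables listed last.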