06-10-2013 03:58 PM
I am in the process of selecting a subset of variables for the logistic regression model I am building. I have 14 variables: 4 of them are binary categorical variables and one is a 3-level categorical variable. I have used stepwise selection earlier for choosing models.
I want to try the "score" method and fit each combination of variables it returns, then score the test data in order to calculate the out-of-sample F-score (OOS F-score) for each variable combination. I want to see the lift I get in the OOS F-score from adding more variables.
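To make concrete what I mean by the lift in OOS F-score, here is a minimal Python sketch (not SAS). The labels and predictions are made-up numbers for illustration only, and `f_score`, `pred_small`, and `pred_large` are hypothetical names; I am assuming the event is coded as 1 and that predictions have already been cut at some threshold.

```python
def f_score(y_true, y_pred, beta=1.0):
    """F-beta score for binary labels coded 0/1 (1 = event)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Made-up test-set labels and predictions from two candidate subsets.
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
pred_small = [1, 0, 0, 1, 0, 1, 0, 0]  # e.g. a model with X1, X2
pred_large = [1, 0, 1, 1, 0, 1, 0, 0]  # e.g. a model with X1, X2, X4

f_small = f_score(y_test, pred_small)
f_large = f_score(y_test, pred_large)
lift = f_large - f_small  # gain in OOS F-score from the extra variable
print(f_small, f_large, lift)
```

The idea would be to repeat this for every subset the "score" method lists and compare the F-scores across model sizes.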
Unfortunately, I found out that the "score" method doesn't handle class variables. Can I create indicator variables for each level of the categorical variable and run the "score" method?
I have trouble understanding the output of this. Do we consider the categorical variable to be part of the subset if one of its levels is chosen?
For example, X1, X2, X3, and X4 are the variables being considered. X4 is a categorical variable with 3 levels, so I make 3 indicator variables (X4_I1, X4_I2, and X4_I3).
If the variables included in a model by the "score" method are X1, X2, and X4_I1, do we consider X4 to be a significant variable because one of its levels is chosen?
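Here is a small Python sketch of the bookkeeping I have in mind, in case it makes the question clearer. The helper names (`make_indicators`, `parent_variables`) and the level values are hypothetical. Counting X4 as "in the subset" whenever any one of its indicators is selected is only one possible convention, and it is exactly the point I am unsure about.

```python
def make_indicators(values, name):
    """Build one 0/1 indicator column per level of a categorical variable."""
    levels = sorted(set(values))
    return {
        f"{name}_I{i}": [1 if v == level else 0 for v in values]
        for i, level in enumerate(levels, start=1)
    }

def parent_variables(selected, indicator_parent):
    """Map selected column names back to the original variables.

    Assumed convention: a categorical variable counts as selected
    if at least one of its indicator columns was selected.
    """
    parents = []
    for col in selected:
        parent = indicator_parent.get(col, col)  # non-indicators map to themselves
        if parent not in parents:
            parents.append(parent)
    return parents

# Made-up values for the 3-level variable X4.
x4 = ["a", "b", "c", "a", "b"]
indicators = make_indicators(x4, "X4")
indicator_parent = {col: "X4" for col in indicators}

selected = ["X1", "X2", "X4_I1"]  # the subset from my example
parents = parent_variables(selected, indicator_parent)
print(parents)
```

Under this convention the subset X1, X2, X4_I1 would be read as the original variables X1, X2, and X4.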
Lastly, since I have 14 variables, shouldn't the number of variable combinations be 14C1 + 14C2 + ... + 14C13 + 14C14? I only get 309 models after running the "score" method. Is there a reason why only a subset of all possible combinations is shown in the output?
I haven't specified the BEST, START, or STOP options.
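For reference, this is the count I was expecting, worked out in a short Python snippet. It assumes 14 candidate variables and counts every non-empty subset.

```python
from math import comb

n = 14  # number of candidate variables

# Number of possible models of each size k = 1..14.
by_size = {k: comb(n, k) for k in range(1, n + 1)}

# Total number of non-empty subsets; this equals 2**n - 1.
total = sum(by_size.values())
print(total)  # 16383
```

So the 309 models in the output are far fewer than the 16383 possible combinations.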
Thank you in advance for your help.