Hello,

I am working on variable selection using a purposeful modeling strategy (rather than stepwise) and could use some guidance on which procedure would best fit my dataset and produce accurate estimates. In addition, my model includes an effect modifier, and I am adding covariates to the model one at a time to see whether each new covariate improves the model fit (using the AIC or the -2 log likelihood value).

Here is a little information about my data:

Outcome (y): count data (takes only non-negative integer values)
Main exposure: binary
Effect modifier: categorical
8 potential covariates (all categorical)

1) Is PROC HPGENSELECT the appropriate procedure if I only have around 350 observations? If not, should I use PROC GENMOD for variable selection instead?

2) If I can use PROC HPGENSELECT, do I need to specify dist=poisson and link=identity to produce more accurate estimates? I ran two different models to see how the AIC would change, and the values were drastically different once I specified that the distribution is Poisson.

Model 1: AIC = 2778.45

    proc hpgenselect data=work.example;
      class X_variable Effect_modifier;
      model Y_variable = X_variable Effect_modifier X_Variable*Effect_modifier / cl;
    run;

Model 2: AIC = 4650.43

    proc hpgenselect data=work.example;
      class X_variable Effect_modifier;
      model Y_variable = X_variable Effect_modifier X_Variable*Effect_modifier / cl dist=poisson link=identity;
    run;

I would like to note that the outcome is not normally distributed (it is skewed right), but homoscedasticity and linearity are not violated.

3) Lastly, I originally specified X_variable with a reference option (ref = XXX), but the estimates did not seem correct. Would it be more appropriate to leave the class parameterization at its default, GLM?

Thank you