Hi SAS experts,
I'm trying to do a number of things. I've searched the web and haven't been able to find a solution.
I have 16 variables, 2 of which are dummy variables: one has 4 levels and the other has 7. I have turned them into categorical variables by putting letters in front of the values (otherwise they are treated as continuous numeric variables), i.e. "trt" has (t0, t1, t2, t3) and "dummy" has (e1, e2, e3, e4, e5, e6, e7). Maybe there's a way to create categorical variables without doing this, but this seemed to work all right.
I would like to do stepwise model selection (including the two dummy variables). I would also like to run a multiple linear regression on the selected variables, which should (if the dummy variables are selected) include parameter estimates for each of the 4 or 7 dummy levels. The code I use is below. One issue is that PROC GLM doesn't seem to do model selection when there is a CLASS dummy variable. It also doesn't produce parameter estimates for each of the 4 levels for "trt" dummy. Please advise.
proc glm data=adapt;
class=trt;
model sex age educ ave_ppl date_live ave_ha productive_land ave_sub wealth ejido_org group_partx market_distance dummy trt pes_partcipx info_loc ave_info ave_know exp_disaster cc_percep climate_change health groups/solution;
Run;
As far as I know, GLM doesn't do any model selection at all. You'd need something like PROC REG with one of the stepwise options (but really, that's a poor solution). If you have multicollinearity, you might want to use PROC PLS, which will do a better job in its presence.
It also doesn't produce parameter estimates for each of the 4 levels for "trt" dummy.
Meaning what? It produces parameter estimates for three of the levels and the fourth level is forced to zero? This is the way SAS does things: it has chosen this particular parameterization of the model, and while there are other parameterizations you can force, they all result in the exact same model. See: http://documentation.sas.com/?cdcId=statcdc&cdcVersion=14.2&docsetId=statug&docsetTarget=statug_glm_...
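For reference, here is a minimal sketch of a PROC GLM call that should show this parameterization. It assumes a response variable named y, which is a hypothetical name because the posted code does not name one, and it uses only a few of the predictors. Note that the CLASS statement takes a space-separated list of variables with no equals sign.

```sas
/* Sketch only: "y" is a hypothetical response variable,        */
/* since the posted MODEL statement does not name one.          */
proc glm data=adapt;
   class trt dummy;                              /* no "=" after CLASS */
   model y = sex age educ trt dummy / solution;  /* response = effects */
run;
quit;
```

With the SOLUTION option, the parameter estimates table should list every level of trt and dummy, with the last level of each set to zero as the reference. CLASS also accepts numeric variables, so the letter prefixes should not be needed to make a variable categorical.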
I've already run PROC CORR to check for multicollinearity. All of the variables are correlated at <0.3.
PROC REG doesn't allow me to include dummy variables since it doesn't allow a CLASS statement. I also can't just include it in the model statement without making it back into a numerical variable (and hence continuous... which is not what I want). Suggestions? (Also, for some reason PROC GLM isn't allowing the CLASS statement.)
As for the question of parameter estimates, it doesn't produce 3 levels either. It just produces the parameter estimates for the other variables (e.g. sex, age, etc).
@xshinbrot0 wrote:
PROC REG doesn't allow me to include dummy variables since it doesn't allow a CLASS statement.
You can create your own dummy coded variables if necessary.
@xshinbrot0 wrote:
Also, for some reason PROC GLM isn't allowing the CLASS statement.
As for the question of parameter estimates, it doesn't produce 3 levels either. It just produces the parameter estimates for the other variables (e.g. sex, age, etc).
You're likely doing something wrong then. You need to post the code and log for us to be able to answer anything else related to this.
@xshinbrot0 wrote:
PROC REG doesn't allow me to include dummy variables since it doesn't allow a CLASS statement. I also can't just include it in the model statement without making it back into a numerical variable (and hence continuous... which is not what I want).
Of course you can include binary dummy variables in PROC REG. You have to create the dummy variables yourself. It really doesn't matter that the 0/1 dummy variables are treated as numeric; it fits the same model as if they were CLASS variables.
As for the question of parameter estimates, it doesn't produce 3 levels either. It just produces the parameter estimates for the other variables (e.g. sex, age, etc).
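A rough sketch of what creating the dummies yourself could look like for "trt". The data set name adapt_dummies, the indicator names trt_t1 to trt_t3, and the response y are all hypothetical (the original post does not name a response), and only a few predictors are shown.

```sas
/* Sketch only: adapt_dummies, trt_t1-trt_t3, and y are hypothetical names. */
data adapt_dummies;
   set adapt;
   /* A comparison evaluates to 1 when true and 0 when false.   */
   /* t0 is the reference level, so it gets no indicator.       */
   trt_t1 = (trt = 't1');
   trt_t2 = (trt = 't2');
   trt_t3 = (trt = 't3');
run;

proc reg data=adapt_dummies;
   model y = sex age educ trt_t1 trt_t2 trt_t3 / selection=stepwise;
run;
quit;
```

One caveat with this shortcut: an observation with a missing trt value gets 0 for every indicator, so it would be counted with the reference level unless missing values are handled first.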
I don't understand this at all. Show us what you mean.
PROC GLMSELECT?
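A minimal sketch of what that might look like, assuming a response variable named y (a hypothetical name, since the original post does not give one) and showing only a few of the predictors:

```sas
/* Sketch only: "y" is a hypothetical response variable. */
proc glmselect data=adapt;
   class trt dummy;                                        /* categorical effects */
   model y = sex age educ trt dummy / selection=stepwise;  /* stepwise selection  */
run;
```

Unlike PROC REG, PROC GLMSELECT has a CLASS statement, so no hand-made dummy variables are needed, and it reports parameter estimates for the selected model.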
If you make dummy variables before submitting to PROC REG, as per @PaigeMiller's suggestion, take note of variable groups in the MODEL statement (variables enclosed in braces, which can be labeled with the GROUPNAMES= option). It will allow you to require stepwise selection to treat all the dummies for a given factor as a group - assuming that's what you want.
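A sketch of the grouping idea, under the same assumptions as before: y is a hypothetical response, and the indicator names trt_t1 to trt_t3 and the data set name adapt_dummies are made up for illustration. Variables enclosed in braces in the MODEL statement should enter or leave the model together.

```sas
/* Sketch only: y, adapt_dummies, and trt_t1-trt_t3 are hypothetical names. */
data adapt_dummies;
   set adapt;
   trt_t1 = (trt = 't1');   /* t0 is the reference level */
   trt_t2 = (trt = 't2');
   trt_t3 = (trt = 't3');
run;

proc reg data=adapt_dummies;
   /* The braces make the three trt indicators a single group. */
   model y = sex age educ {trt_t1 trt_t2 trt_t3} / selection=stepwise;
run;
quit;
```

The GROUPNAMES= option is left out here; it only supplies labels for the groups in the output.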
So, reading the above, there are many ways to include dummy variables in a stepwise model selection procedure. I still argue against Stepwise as being a very poor solution to any real-world problem. I gave a link above to some of the criticisms of Stepwise.