xshinbrot0
Calcite | Level 5

Hi SAS experts,

 

I'm trying to do a number of things. I've perused the web and haven't been able to find a solution.

 

I have 16 variables, 2 of which are dummy variables. One of them has 4 levels, the other has 7 levels. I have turned them into categorical variables by putting letters in front of the values (otherwise they are treated as continuous numeric variables), i.e. "trt" has (t0, t1, t2, t3) and "dummy" has (e1, e2, e3, e4, e5, e6, e7). Maybe there's a way to create categorical variables without doing this, but this seemed to work alright.

 

I would like to do a stepwise model selection (including the two dummy variables). I would also like to run a multiple linear regression after the variables have been selected, which should (if the dummy variables are included) include parameter estimates for each of the 4 or 7 dummy levels. The code I use is below. One of the issues is that PROC GLM doesn't seem to do model selection when there is a CLASS dummy variable. It also doesn't produce parameter estimates for each of the 4 levels for "trt" dummy. Please advise.

proc glm data=adapt;
class=trt;
model sex age educ ave_ppl date_live ave_ha productive_land ave_sub wealth ejido_org group_partx market_distance dummy trt pes_partcipx info_loc ave_info ave_know exp_disaster cc_percep climate_change health groups/solution;
Run;

 

PaigeMiller
Diamond | Level 26

As far as I know, GLM doesn't do any model selection at all. You'd need something like PROC REG with one of the stepwise options (but really, that's a poor solution). If you have multicollinearity, you might want to use PROC PLS instead; it does a better job in that situation.

 

It also doesn't produce parameter estimates for each of the 4 levels for "trt" dummy.

Meaning what? It produces parameter estimates for three of the levels and the fourth level is forced to zero? This is the way SAS does things. It has chosen this particular parameterization of the model, and while there are other parameterizations you can force, they all result in the exact same model. See: http://documentation.sas.com/?cdcId=statcdc&cdcVersion=14.2&docsetId=statug&docsetTarget=statug_glm_...
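
For illustration, here is a minimal sketch of a PROC GLM call with both factors in a CLASS statement and the SOLUTION option. The response name y is a placeholder (the original post doesn't show one), and the regressor list is shortened. Note that CLASS takes a plain list of variables with no equals sign, and MODEL has the form response = effects.

```sas
/* Sketch only: y is a hypothetical response variable,
   and only a few of the regressors are listed. */
proc glm data=adapt;
  class trt dummy;                               /* categorical effects  */
  model y = sex age educ trt dummy / solution;   /* print the estimates  */
run;
quit;
```

With this parameterization, the estimate for the last level of each CLASS variable in sort order (t3 and e7 here) is set to zero, and the other levels are estimated as differences from it. CLASS variables can also be numeric, so the letter prefixes aren't required.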

--
Paige Miller
xshinbrot0
Calcite | Level 5

I've already run PROC CORR to make sure that there isn't multicollinearity. All of the pairwise correlations are below 0.3.

 

PROC REG doesn't allow me to include dummy variables since it doesn't allow a CLASS statement. I also can't just include it in the model statement without making it back into a numerical variable (and hence continuous... which is not what I want). Suggestions? (Also, for some reason PROC GLM isn't allowing the CLASS statement.)

 

As for the parameter estimates, it doesn't produce 3 levels either. It only produces parameter estimates for the other variables (e.g. sex, age, etc.).

 

 

Reeza
Super User

@xshinbrot0 wrote:

 

PROC REG doesn't allow me to include dummy variables since it doesn't allow a CLASS statement.

 


You can create your own dummy coded variables if necessary.
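
A rough sketch of what that could look like in a DATA step, assuming trt and dummy are character variables holding the values listed in the original post. The output data set name adapt_dummies and the indicator names (trt_t1, dummy_e2, and so on) are made up for this example, and the choice of t0 and e1 as reference levels is arbitrary.

```sas
/* Sketch only: data set and indicator names are hypothetical. */
data adapt_dummies;
  set adapt;
  array e_ind{6} dummy_e2-dummy_e7;    /* 6 indicators for the 7 levels */

  /* 3 indicators for the 4 levels of trt; t0 is the reference level.
     Rows with a missing trt keep missing indicators instead of
     silently falling into the reference level. */
  if not missing(trt) then do;
    trt_t1 = (trt = 't1');
    trt_t2 = (trt = 't2');
    trt_t3 = (trt = 't3');
  end;

  /* Same idea for dummy, with e1 as the reference level */
  if not missing(dummy) then do i = 1 to 6;
    e_ind{i} = (dummy = cats('e', i + 1));
  end;

  drop i;
run;
```

Each comparison evaluates to 1 or 0, so a factor with k levels ends up with k-1 numeric 0/1 columns.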

 


@xshinbrot0 wrote:

Also, for some reason PROC GLM isn't allowing the CLASS statement.

 

As for the parameter estimates, it doesn't produce 3 levels either. It only produces parameter estimates for the other variables (e.g. sex, age, etc.).


You're likely doing something wrong then. You need to post the code and log for us to be able to answer anything else related to this.

PaigeMiller
Diamond | Level 26

@xshinbrot0 wrote:

PROC REG doesn't allow me to include dummy variables since it doesn't allow a CLASS statement. I also can't just include it in the model statement without making it back into a numerical variable (and hence continuous... which is not what I want).

 

 


Of course you can include binary dummy variables in PROC REG. You have to create the dummy variables yourself. It really doesn't matter that PROC REG treats the 0/1 dummy variables as numeric; it fits the same model as if the original variable were in a CLASS statement.
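
As a hedged sketch, a PROC REG fit with hand-made dummies might look like the following. It assumes 0/1 indicator columns trt_t1, trt_t2 and trt_t3 already exist in a data set called adapt_dummies (both names are hypothetical), that y stands in for the response, which the original post doesn't show, and that the other regressors are numeric.

```sas
/* Sketch only: adapt_dummies, y and the trt_t* names are hypothetical. */
proc reg data=adapt_dummies;
  model y = sex age educ trt_t1 trt_t2 trt_t3;
  /* Joint test that all three trt coefficients are zero,
     i.e. a test of the trt factor as a whole */
  trt_effect: test trt_t1 = 0, trt_t2 = 0, trt_t3 = 0;
run;
quit;
```

Each dummy coefficient is the difference between that level and the omitted reference level, and the TEST statement plays the role of the single F test you would get for a CLASS effect.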

 

As for the parameter estimates, it doesn't produce 3 levels either. It only produces parameter estimates for the other variables (e.g. sex, age, etc.).

 

I don't understand this at all. Show us what you mean.

--
Paige Miller
mkeintz
PROC Star

If you make dummy variables before submitting to PROC REG, as per @PaigeMiller's suggestion, take note of the GROUPNAMES= option in the MODEL statement. Together with braces around the variables in the MODEL statement, it will allow you to require stepwise selection to treat all the dummies for a given factor as a group, assuming that's what you want.
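
A minimal sketch of that grouping, assuming the indicator columns were created beforehand in a data set called adapt_dummies. The data set name, the indicator names, the response y and the 0.15 thresholds are all placeholders for illustration, and the regressor list is shortened.

```sas
/* Sketch only: names and thresholds below are hypothetical. */
proc reg data=adapt_dummies;
  model y = sex age educ
            {trt_t1 trt_t2 trt_t3}
            {dummy_e2 dummy_e3 dummy_e4 dummy_e5 dummy_e6 dummy_e7}
        / selection=stepwise slentry=0.15 slstay=0.15
          /* one name per group, in MODEL order; a variable outside
             braces counts as a group of one */
          groupnames='sex' 'age' 'educ' 'trt' 'dummy';
run;
quit;
```

Variables inside the same pair of braces enter or leave the model together, so the selection never keeps only some of a factor's levels.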

PaigeMiller
Diamond | Level 26

So, reading the above, there are many ways to include dummy variables in a stepwise model selection procedure. I still argue that stepwise selection is a very poor solution to any real-world problem. I gave a link above to some of the criticisms of stepwise.

--
Paige Miller
