ilikesas
Barite | Level 11

Hi,

 

When regressing data that contains dummy variables, we omit one of the dummies, and the coefficient that SAS outputs for each remaining dummy is the difference between the effect of that dummy on the dependent variable and the effect of the omitted dummy. This is straightforward.

 

But what should be done when there are 2 types of dummies? Suppose that there are dummies A1-A4 and B1-B4. The A and B categories are independent of each other, so I want to omit A4 in order to study the effect of A1-A3 compared to A4, and to omit B4 in order to study the effect of B1-B3 compared to B4. But when I omit A4 and B4, how does SAS know (or how is it possible to make it know) that A4 is related only to A1-A3 and B4 only to B1-B3?

 

Just a little more illustration: suppose I have dummies New York, Chicago, Los Angeles and dummies Summer, Winter, Fall, Spring. If I omit Spring and Los Angeles, it is nonsensical to compare, say, Chicago with Spring and Fall with Los Angeles.

 

Thanks!

1 ACCEPTED SOLUTION

Rick_SAS
SAS Super FREQ

You should review how dummy variables represent levels of a categorical variable. Dummy variables merely indicate which of k categories each observation belongs to. Omitting the k_th dummy variable avoids creating linearly dependent variables, because if an observation is not in one of the first (k-1) levels, it must belong to the k_th.
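To make the linear dependence concrete, here is a sketch of the two-factor case in equation form. The symbols \beta_0, \alpha_i, and \gamma_j are notation introduced here for illustration only; A_1-A_4 and B_1-B_4 are the dummies from the question, and the model assumes an intercept and no interaction terms.

```latex
\begin{align*}
A_1 + A_2 + A_3 + A_4 &= 1, \qquad B_1 + B_2 + B_3 + B_4 = 1 \\
y &= \beta_0 + \sum_{i=1}^{3} \alpha_i A_i + \sum_{j=1}^{3} \gamma_j B_j + \varepsilon \\
E[y \mid A_i = 1,\ B \text{ fixed}] - E[y \mid A_4 = 1,\ B \text{ fixed}] &= \alpha_i, \qquad i = 1, 2, 3
\end{align*}
```

Each set of dummies sums to 1, which is the intercept column, so one dummy per set has to go. With A_4 and B_4 omitted, every \alpha_i is a difference from A_4 at a fixed level of B, and every \gamma_j is a difference from B_4 at a fixed level of A. No coefficient compares a level of A with a level of B.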

 

I recommend that you use the ideas in the link above to let SAS generate the dummy variables for you. Or better yet, avoid  dummy variables and use the CLASS statement, which is easier to interpret.
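As a rough sketch of the CLASS approach applied to the city and season example from the question: the data set name Sales and the variables y, City, and Season are hypothetical, and the REF= option assumes a SAS/STAT release in which the PROC GLM CLASS statement supports it. Without REF=, PROC GLM uses the last level in sorted order as the reference.

```sas
/* Hypothetical data set Sales with response y and
   character variables City and Season (names assumed) */
proc glm data=Sales;
   /* REF= names the level that the other levels of the
      same variable are compared against */
   class City(ref='Los Angeles') Season(ref='Spring');
   model y = City Season / solution;
   ods select ParameterEstimates;
quit;
```

In the ParameterEstimates table, each City estimate is a difference from Los Angeles and each Season estimate is a difference from Spring, so levels are only ever compared within their own variable.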


6 REPLIES 6
user24feb
Barite | Level 11

What you should do depends on what you would like to do. You only have to make sure that you leave out either the intercept or one dummy variable. One approach would be to keep the intercept and define A1=1 as the base scenario. Then you would have: Intercept=1; leave out A1=spring(?); A2=1 if summer, 0 otherwise; A3=1 if autumn, 0 otherwise; A4=1 if winter, 0 otherwise; B1=1 if NY, 0 otherwise; B2=1 if Chicago, 0 otherwise; ..

Your model for proc reg (y=dep. var) would be: y = A2--B4 .. ;
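A minimal sketch of this hand-coded approach follows. The data set Sales and the variables y, season, and city are hypothetical names, and the level spellings are assumed. Note that this sketch also omits one city dummy (B1, New York): with the intercept kept, a complete set of city dummies would sum to the intercept column, so one dummy is dropped per categorical variable.

```sas
/* Hypothetical input: data set Sales with y, season, and city */
data SalesDummies;
   set Sales;
   /* season dummies; Spring (A1) is the omitted base level */
   A2 = (season = 'Summer');
   A3 = (season = 'Autumn');
   A4 = (season = 'Winter');
   /* city dummies; New York (B1) is the omitted base level */
   B2 = (city = 'Chicago');
   B3 = (city = 'Los Angeles');
run;

proc reg data=SalesDummies;
   /* each A coefficient is relative to Spring,
      each B coefficient is relative to New York */
   model y = A2 A3 A4 B2 B3;
run;
quit;
```

A comparison in a SAS DATA step evaluates to 1 when true and 0 when false, which is what makes the one-line dummy definitions work.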

ilikesas
Barite | Level 11

Hi Rick,

 

Glad to know that you have another blog! (I subscribed to it as well).

 

Just a small question,

 

In the example that you use, suppose that I also want to include the continuous variables height and weight (in addition to the 2 categorical dummy types). In such a case, would I just have to add these variables to the model in the following way?

 

/* same analysis by using the CLASS statement */
proc glm data=Patients;
   class sex BP_Status;              /* generates dummy variables internally */
   model Cholesterol = Sex BP_Status HEIGHT WEIGHT / solution;
   ods select ParameterEstimates;
quit;

Thanks!

Rick_SAS
SAS Super FREQ

That is correct. Just add the continuous variables to the MODEL statement.

 

I am confused by your statement about "another blog." I only write one blog, and it is located at http://blogs.sas.com/content/iml

 

ilikesas
Barite | Level 11

Silly me, I didn't realize it was the old DO LOOP blog but with a new appearance!

 

And thanks for the answer!

ilikesas
Barite | Level 11

Hi Rick,

 

I posted a new question https://communities.sas.com/t5/SAS-Statistical-Procedures/Tip-Fixed-vs-Random-Effects-in-Panel-Data/...

 

It looks very similar to the question that you answered here, but could you please take a look at it, since I am not sure?

Thanks!
