regression with 2 dummy types

Posted 03-19-2017 11:59 PM
(2409 views)

Hi,

When regressing data that contains dummy variables, we omit one of the dummies, and the coefficient that SAS outputs for each remaining dummy is the difference between the effect of that dummy on the dependent variable and the effect of the omitted dummy. This is straightforward.

But what should be done when there are 2 types of dummies? Suppose that there are dummies A1-A4 and B1-B4. The A and B categories are independent of each other, so I want to omit A4 in order to study the effect of A1-A3 compared to A4, and to omit B4 in order to study the effect of B1-B3 compared to B4. But when I omit A4 and B4, how does SAS know (or how is it possible to make it know) that A4 is related only to A1-A3 and B4 only to B1-B3?

Just a little more illustration: suppose I have dummies New York, Chicago, Los Angeles and dummies Summer, Winter, Fall, Spring. If I omit Spring and Los Angeles, it is nonsensical to compare, say, Chicago with Spring and Fall with Los Angeles.

Thanks!

1 ACCEPTED SOLUTION

You should review how dummy variables represent levels of a categorical variable. Dummy variables merely indicate which of k categories each observation belongs to. Omitting the k_th dummy variable avoids creating linearly dependent variables, because if an observation is not in one of the first (k-1) levels, it must belong to the k_th.
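To make this concrete for two sets of dummies, here is a worked sketch in illustrative notation (the symbols $\beta_0$, $\alpha_i$, and $\gamma_j$ are assumptions for this example and do not come from the original question). It assumes an additive model with an intercept, no interaction terms, and A4 and B4 omitted:

$$
y = \beta_0 + \alpha_1 A_1 + \alpha_2 A_2 + \alpha_3 A_3 + \gamma_1 B_1 + \gamma_2 B_2 + \gamma_3 B_3 + \varepsilon
$$

With the convention $\alpha_4 = \gamma_4 = 0$ for the omitted levels, an observation in level $i$ of A and level $j$ of B has expected value $\beta_0 + \alpha_i + \gamma_j$, so

$$
E[y \mid A_i, B_j] - E[y \mid A_4, B_j] = \alpha_i \quad \text{for every } j .
$$

Under these assumptions each A coefficient is a comparison with A4 only, at any fixed level of B, and likewise each B coefficient is a comparison with B4 only. The intercept $\beta_0$ is the expected response for the (A4, B4) combination. The software does not need to be told which omitted dummy belongs to which set.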

I recommend that you use the ideas in the link above to let SAS generate the dummy variables for you. Or better yet, avoid dummy variables and use the CLASS statement, which is easier to interpret.
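As a minimal sketch of the CLASS approach for the city and season example from the question: the data set `Sales` and the variables `City`, `Season`, and `Y` are hypothetical names chosen for illustration. The `REF=` option on the CLASS statement is assumed to be available in your SAS/STAT release; without it, PROC GLM uses the last level in sorted order as the reference.

```sas
/* Hypothetical data set Sales with character variables City and Season
   and a numeric response Y; these names are illustrative only. */
proc glm data=Sales;
   /* Each CLASS variable gets its own reference level */
   class City(ref='Los Angeles') Season(ref='Spring');
   model Y = City Season / solution;
   ods select ParameterEstimates;
quit;
```

In the ParameterEstimates table, the estimate for each city should then be read as a difference from Los Angeles, and the estimate for each season as a difference from Spring.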

6 REPLIES

What you should do depends on what you would like to do. You only have to make sure that you leave out either the intercept or one dummy variable. One approach would be to keep the intercept and treat A1=1 as the base scenario. Then you would have: Intercept=1; leave out A1 (spring?); A2=1 if summer, 0 otherwise; A3=1 if autumn, 0 otherwise; A4=1 if winter, 0 otherwise; B1=1 if NY, 0 otherwise; B2=1 if Chicago, 0 otherwise; ..

Your model for PROC REG (y = dependent variable) would be: y = A2--B4 .. ;
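A minimal sketch of this hand-coded approach, under some assumptions: the data set `Sales` and the character variables `Season` and `City` are hypothetical names, and the example assumes three cities with Los Angeles as the omitted city so that each set of dummies has its own reference level.

```sas
/* Assumes a hypothetical data set Sales with character variables
   Season and City and a numeric response y. */
data SalesDummies;
   set Sales;
   /* Seasons: Spring is the omitted (reference) level */
   A2 = (Season = 'Summer');   /* a comparison evaluates to 1 or 0 */
   A3 = (Season = 'Autumn');
   A4 = (Season = 'Winter');
   /* Cities: Los Angeles is the omitted (reference) level */
   B1 = (City = 'New York');
   B2 = (City = 'Chicago');
run;

proc reg data=SalesDummies;
   model y = A2 A3 A4 B1 B2;
run;
quit;
```

With this coding the intercept corresponds to Spring in Los Angeles, the A coefficients are differences from Spring, and the B coefficients are differences from Los Angeles.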

Hi Rick,

Glad to know that you have another blog! (I subscribed to it as well).

Just a small question,

In the example that you use, suppose that I also want to include the continuous variables height and weight (in addition to the other 2 categorical dummy types). In such a case, would I just have to add these variables to the model in the following way?

```
/* same analysis by using the CLASS statement */
proc glm data=Patients;
class sex BP_Status; /* generates dummy variables internally */
model Cholesterol = Sex BP_Status HEIGHT WEIGHT / solution;
ods select ParameterEstimates;
quit;
```

Thanks!

That is correct. Just add the continuous variables to the MODEL statement.

I am confused by your statement about "another blog." I only write one blog, and it is located at http://blogs.sas.com/content/iml

Silly me, I didn't realize it was the old DO LOOP blog but with a new appearance!

And thanks for the answer!

Hi Rick,

I posted a new question https://communities.sas.com/t5/SAS-Statistical-Procedures/Tip-Fixed-vs-Random-Effects-in-Panel-Data/...

It looks very similar to the question that you answered here, but could you please take a look at it, since I am not sure?

Thanks!
