regression with 2 dummy types

03-19-2017 11:59 PM - edited 03-20-2017 12:08 AM

Hi,

When regressing on data that contains dummy variables, we omit one of the dummies, and the coefficient that SAS outputs for each remaining dummy is the difference between the effect of that dummy on the dependent variable and the effect of the omitted dummy. This is straightforward.

But what should be done when there are 2 types of dummies? Suppose that there are dummies A1-A4 and B1-B4. The A and B categories are independent of each other, so I want to omit A4 in order to study the effect of A1-A3 compared to A4, and to omit B4 in order to study the effect of B1-B3 compared to B4. But when I omit A4 and B4, how does SAS know (or how is it possible to make it know) that A4 is related only to A1-A3 and B4 only to B1-B3?

Just a little more illustration: suppose I have dummies New York, Chicago, Los Angeles and dummies Summer, Winter, Fall, Spring. If I omit Spring and Los Angeles, it is nonsensical to compare, say, Chicago with Spring and Fall with Los Angeles.

Thanks!

Accepted Solutions

Solution

03-26-2017 08:41 AM

03-20-2017 05:48 AM

You should review how dummy variables represent levels of a categorical variable. Dummy variables merely indicate which of k categories each observation belongs to. Omitting the k-th dummy variable avoids creating linearly dependent variables, because if an observation is not in one of the first (k-1) levels, it must belong to the k-th.
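As a sketch of why the two sets of dummies do not get mixed up, consider the additive model for the A1-A4 and B1-B4 example from the question, with A4 and B4 omitted. The symbols $\beta_0$, $\alpha_i$, and $\gamma_j$ are notation introduced here for illustration, and the model is assumed to have an intercept and no interaction terms.

```latex
% Within each set the dummies sum to the intercept column,
% which is why one dummy per set is dropped:
A_1 + A_2 + A_3 + A_4 = 1, \qquad B_1 + B_2 + B_3 + B_4 = 1

% Model with A_4 and B_4 omitted:
y = \beta_0 + \alpha_1 A_1 + \alpha_2 A_2 + \alpha_3 A_3
            + \gamma_1 B_1 + \gamma_2 B_2 + \gamma_3 B_3 + \varepsilon

% Expected response in cell (A_i, B_j), with \alpha_4 = \gamma_4 = 0:
E[y \mid A_i, B_j] = \beta_0 + \alpha_i + \gamma_j

% Each coefficient is a contrast within its own set only:
E[y \mid A_i, B_j] - E[y \mid A_4, B_j] = \alpha_i \quad \text{for every } j
E[y \mid A_i, B_j] - E[y \mid A_i, B_4] = \gamma_j \quad \text{for every } i
```

Under these assumptions the intercept is the expected response for the combined reference cell (A4, B4), each $\alpha_i$ compares A_i with A4 at a fixed level of B, and each $\gamma_j$ compares B_j with B4 at a fixed level of A. No coefficient compares a level of A with a level of B.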

I recommend that you use the ideas in the link above to let SAS generate the dummy variables for you. Or better yet, avoid dummy variables and use the CLASS statement, which is easier to interpret.
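A minimal sketch of the CLASS approach for the city and season example from the question might look like the following. The data set name Sales, the variable names City, Season, and y, and all of the numbers are made up for illustration. The REF= option is used on the assumption that your SAS/STAT release supports it in PROC GLM; without it, PROC GLM uses the last level in sorted order as the reference.

```sas
/* Toy data, invented for illustration only */
data Sales;
   input City $ Season $ y;
   datalines;
NY Spring 12
NY Summer 15
NY Fall 11
NY Winter 9
Chicago Spring 10
Chicago Summer 14
Chicago Fall 9
Chicago Winter 6
LA Spring 13
LA Summer 16
LA Fall 13
LA Winter 11
;

proc glm data=Sales;
   /* each CLASS variable gets its own set of dummies and its own reference level */
   class City(ref='LA') Season(ref='Spring');
   model y = City Season / solution;   /* SOLUTION prints the parameter estimates */
   ods select ParameterEstimates;
quit;
```

In the resulting table, the City estimates are differences from LA and the Season estimates are differences from Spring; the reference levels appear with an estimate of zero.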

All Replies

03-20-2017 04:40 AM

What you should do depends on what you would like to do. You only have to make sure that you leave out either the intercept or one dummy variable. One approach would be to keep the intercept and define A1=1 as the base scenario. Then you would have: Intercept=1; leave out A1=spring(?); A2=1 if summer, 0 otherwise; A3=1 if autumn, 0 otherwise; A4=1 if winter, 0 otherwise; B1=1 if NY, 0 otherwise; B2=1 if Chicago, 0 otherwise; ..

Your model for PROC REG (y = dependent variable) would be: y = A2--B4 .. ;
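A hedged sketch of the hand-coded approach with PROC REG is shown below. The data set and variable names and all of the numbers are invented for illustration. Note that this sketch omits one dummy from each set (Spring and LA): with an intercept in the model, the dummies of each set sum to the intercept column, so keeping all levels of either set would make the design matrix linearly dependent.

```sas
/* Toy data, invented for illustration only */
data Sales;
   input City $ Season $ y;
   datalines;
NY Spring 12
NY Summer 15
NY Fall 11
NY Winter 9
Chicago Spring 10
Chicago Summer 14
Chicago Fall 9
Chicago Winter 6
LA Spring 13
LA Summer 16
LA Fall 13
LA Winter 11
;

data SalesDummies;
   set Sales;
   /* Season dummies: Spring is the omitted reference level */
   Summer = (Season = "Summer");   /* a comparison evaluates to 1 or 0 */
   Fall   = (Season = "Fall");
   Winter = (Season = "Winter");
   /* City dummies: LA is the omitted reference level */
   NewYork = (City = "NY");
   Chicago = (City = "Chicago");
run;

proc reg data=SalesDummies;
   model y = Summer Fall Winter NewYork Chicago;
run;
quit;
```

Here the intercept estimates the response for LA in Spring, the season coefficients are differences from Spring, and the city coefficients are differences from LA.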

03-25-2017 12:20 PM - edited 03-25-2017 12:21 PM

Hi Rick,

Glad to know that you have another blog! (I subscribed to it as well).

Just a small question,

In the example that you use, suppose that I also want to include the continuous variables height and weight (in addition to the other 2 categorical dummy types). In such a case, would I just have to add these variables to the model in the following way?

```
/* same analysis by using the CLASS statement */
proc glm data=Patients;
   class Sex BP_Status;   /* generates dummy variables internally */
   /* Height and Weight are not in the CLASS statement, so they enter as continuous covariates */
   model Cholesterol = Sex BP_Status Height Weight / solution;
   ods select ParameterEstimates;
quit;
```

Thanks!

03-26-2017 06:20 AM

That is correct. Just add the continuous variables to the MODEL statement.

I am confused by your statement about "another blog." I only write one blog, and it is located at http://blogs.sas.com/content/iml

03-26-2017 08:41 AM

Silly me, I didn't realize it was the old DO LOOP blog but with a new appearance!

And thanks for the answer!