turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- Analytics
- /
- Forecasting
- /
- How SAS calculates regression with dummy variables...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

06-16-2017 12:04 PM - edited 06-16-2017 01:17 PM

Hello, everybody.

I want to regress dummy variables, which are time-based, on volume and use PROC GENMOD and PROC GLM statements to create dummies automatically.

In addition, I use DATA statement to create dummies manually. I have seven dummies which are classified as below:

Dummy_1: 9:00 << Time < 9:30;

Dummy_2: 9:30 << Time < 10:00;

Dummy_3: 10:00 << Time < 10:30;

Dummy_4: 10:30 << Time < 11;

Dummy_5: 11:00 << Time < 11:30;

Dummy_6: 11:30 << Time < 12;

Dummy_7: 12 << Time < 12:30;

Here are some examples of my codes:

```
* Regressing dummy variables on normalized volume variable using calculated volume;
proc genmod data=Sampledata_adjvol;
class TRD_EVENT_ROUFOR / param=effect;
model adjusted_volume = TRD_EVENT_ROUFOR / noscale;
ods select ParameterEstimates;
run;
* Same analysis by using the CLASS statement;
proc glm data=Sampledata_adjvol;
class TRD_EVENT_ROUFOR; /* Generates dummy variables internally */
model adjusted_volume = TRD_EVENT_ROUFOR / solution;
ods select ParameterEstimates;
quit;
```

```
* Creating dummy variables manually;
data Sampledata_adjvol_DumVar;
set Sampledata_adjvol ;
if TRD_EVENT_ROUNDED = 34200 then TRD_EVENT_ROUNDED_1 = 1;
else TRD_EVENT_ROUNDED_1 = 0;
if TRD_EVENT_ROUNDED = 36000 then TRD_EVENT_ROUNDED_2 = 1;
else TRD_EVENT_ROUNDED_2 = 0;
if TRD_EVENT_ROUNDED = 37800 then TRD_EVENT_ROUNDED_3 = 1;
else TRD_EVENT_ROUNDED_3 = 0;
if TRD_EVENT_ROUNDED = 39600 then TRD_EVENT_ROUNDED_4 = 1;
else TRD_EVENT_ROUNDED_4 = 0;
if TRD_EVENT_ROUNDED = 41400 then TRD_EVENT_ROUNDED_5 = 1;
else TRD_EVENT_ROUNDED_5 = 0;
if TRD_EVENT_ROUNDED = 43200 then TRD_EVENT_ROUNDED_6 = 1;
else TRD_EVENT_ROUNDED_6 = 0;
if TRD_EVENT_ROUNDED = 45000 then TRD_EVENT_ROUNDED_7 = 1;
else TRD_EVENT_ROUNDED_7 = 0;
run;
proc freq data=Sampledata_adjvol_DumVar;
tables TRD_EVENT_ROUNDED*TRD_EVENT_ROUNDED_1*TRD_EVENT_ROUNDED_2*TRD_EVENT_ROUNDED_3*TRD_EVENT_ROUNDED_4*TRD_EVENT_ROUNDED_5*TRD_EVENT_ROUNDED_6*TRD_EVENT_ROUNDED_7 / list ;
run;
* Regressing dummy variables on normalized volume variable using calculated volume;
ods graphics on;
proc reg data = Sampledata_adjvol_DumVar plots(maxpoints = none);
model adjusted_volume = TRD_EVENT_ROUNDED_1 TRD_EVENT_ROUNDED_2 TRD_EVENT_ROUNDED_3 TRD_EVENT_ROUNDED_4 TRD_EVENT_ROUNDED_5 TRD_EVENT_ROUNDED_6 TRD_EVENT_ROUNDED_7;
run;
ods graphics off;
```

The results are attached to this post.

Why the final dummy is not estimated?

What is the problem?

How can I fix that?

Thanks in advance.

Accepted Solutions

Solution

06-16-2017
03:21 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to aminkarimid

06-16-2017 12:12 PM

All Replies

Solution

06-16-2017
03:21 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to aminkarimid

06-16-2017 12:12 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to aminkarimid

06-16-2017 01:10 PM - edited 06-16-2017 01:21 PM

As explained above, if you have N levels, you can only estimate n-1 coefficients plus the intercept. If you leave the intercept out of the model, then you can estimate all N levels. This is basic math.

Also, you keep writing something like this, in this and other threads

First half an hour: 9:00 << Dummy_1 < 9:30;

which makes absolutely no sense at all, dummy_1 is either 0 or 1 (otherwise it's not a dummy variable), and a variable that has values of 0 or 1 cannot be between 9:00 and 9:30. You most likely mean

dummy1 = 9:00 <= time_1 < 9:30;

(which might not be correct syntax, but you get the idea)

so I would hope that you will write more meaningful and understandable math and SAS code in the future.