Forecasting using SAS Forecast Server, SAS/ETS, and more

How does SAS calculate regression with dummy variables?


Hello, everybody.
I want to regress volume on time-based dummy variables, and I use PROC GENMOD and PROC GLM to create the dummies automatically.
In addition, I use a DATA step to create the dummies manually. I have seven dummies, defined as follows:

 

Dummy_1: 9:00 <= Time < 9:30;

Dummy_2: 9:30 <= Time < 10:00;

Dummy_3: 10:00 <= Time < 10:30;

Dummy_4: 10:30 <= Time < 11:00;

Dummy_5: 11:00 <= Time < 11:30;

Dummy_6: 11:30 <= Time < 12:00;

Dummy_7: 12:00 <= Time < 12:30;

 

Here is my code:

* Regressing normalized volume on dummy variables generated by the CLASS statement;
proc genmod data=Sampledata_adjvol;
   class TRD_EVENT_ROUFOR / param=effect;
   model adjusted_volume = TRD_EVENT_ROUFOR / noscale;
   ods select ParameterEstimates;
run;

* Same analysis using PROC GLM and its CLASS statement;
proc glm data=Sampledata_adjvol;
   class TRD_EVENT_ROUFOR;              /* Generates dummy variables internally */
   model adjusted_volume = TRD_EVENT_ROUFOR / solution;
   ods select ParameterEstimates;
quit;

* Creating dummy variables manually;
data Sampledata_adjvol_DumVar;
   set Sampledata_adjvol;
   if TRD_EVENT_ROUNDED = 34200 then TRD_EVENT_ROUNDED_1 = 1;
   else TRD_EVENT_ROUNDED_1 = 0;
   if TRD_EVENT_ROUNDED = 36000 then TRD_EVENT_ROUNDED_2 = 1;
   else TRD_EVENT_ROUNDED_2 = 0;
   if TRD_EVENT_ROUNDED = 37800 then TRD_EVENT_ROUNDED_3 = 1;
   else TRD_EVENT_ROUNDED_3 = 0;
   if TRD_EVENT_ROUNDED = 39600 then TRD_EVENT_ROUNDED_4 = 1;
   else TRD_EVENT_ROUNDED_4 = 0;
   if TRD_EVENT_ROUNDED = 41400 then TRD_EVENT_ROUNDED_5 = 1;
   else TRD_EVENT_ROUNDED_5 = 0;
   if TRD_EVENT_ROUNDED = 43200 then TRD_EVENT_ROUNDED_6 = 1;
   else TRD_EVENT_ROUNDED_6 = 0;
   if TRD_EVENT_ROUNDED = 45000 then TRD_EVENT_ROUNDED_7 = 1;
   else TRD_EVENT_ROUNDED_7 = 0;
run;

proc freq data=Sampledata_adjvol_DumVar;
  tables TRD_EVENT_ROUNDED*TRD_EVENT_ROUNDED_1*TRD_EVENT_ROUNDED_2*TRD_EVENT_ROUNDED_3*TRD_EVENT_ROUNDED_4*TRD_EVENT_ROUNDED_5*TRD_EVENT_ROUNDED_6*TRD_EVENT_ROUNDED_7 / list ;
run;

* Regressing normalized volume on the manually created dummy variables;
ods graphics on;
proc reg data=Sampledata_adjvol_DumVar plots(maxpoints=none);
   model adjusted_volume = TRD_EVENT_ROUNDED_1 TRD_EVENT_ROUNDED_2 TRD_EVENT_ROUNDED_3 TRD_EVENT_ROUNDED_4 TRD_EVENT_ROUNDED_5 TRD_EVENT_ROUNDED_6 TRD_EVENT_ROUNDED_7;
run;
quit;
ods graphics off;

 

The results are attached to this post.

 

Why is the final dummy not estimated?

What is the problem?

How can I fix that?

 

Thanks in advance.


Accepted Solutions
Solution
‎06-16-2017 03:21 PM
PROC Star
Posts: 307

Re: How does SAS calculate regression with dummy variables?

Posted in reply to aminkarimid

Look up the "dummy variable trap." With dummy variables, you need the number of levels minus one in the model. The one left out is the baseline against which the other parameters are measured. At the most basic level, with two levels you need one dummy, and its coefficient is the effect of being one compared to whatever is zero.
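
To see why one dummy has to go, here is a short sketch of the algebra. It assumes every observation falls into exactly one of the seven time intervals; the symbols $D_{ki}$ (dummy $k$ for observation $i$), $\beta_k$, $\mu_k$ (mean volume in interval $k$), and $c$ are notation introduced for this illustration only.

```latex
\begin{align*}
y_i &= \beta_0 + \sum_{k=1}^{7} \beta_k D_{ki} + \varepsilon_i,
  \qquad \sum_{k=1}^{7} D_{ki} = 1 \ \text{for every } i, \\
\beta_0 + \sum_{k=1}^{7} \beta_k D_{ki}
  &= (\beta_0 + c) + \sum_{k=1}^{7} (\beta_k - c)\, D_{ki}
  \qquad \text{for any constant } c .
\end{align*}
% Every choice of c gives identical fitted values, so the eight
% parameters cannot be pinned down by the data. Fixing beta_7 = 0 gives
%   beta_0 = mu_7   and   beta_k = mu_k - mu_7   for k = 1, ..., 6.
```

Under that assumption, fixing the last coefficient at zero removes the ambiguity, which is consistent with the final dummy showing up as not estimated in the question.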



All Replies

Trusted Advisor
Posts: 1,913

Re: How does SAS calculate regression with dummy variables?

Posted in reply to aminkarimid

As explained above, if you have N levels, you can only estimate N-1 coefficients plus the intercept. If you leave the intercept out of the model, then you can estimate all N levels. This is basic math.
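
The two options could look roughly like this in PROC REG, reusing the data set and variable names from the question. This is only a sketch: it assumes every observation belongs to exactly one of the seven levels, and the choice of the seventh dummy as the omitted baseline is arbitrary.

```sas
/* Option 1: keep the intercept and leave out one dummy (here the 7th).   */
/* The intercept is then the mean of the omitted level, and each          */
/* coefficient is the difference from that level.                         */
proc reg data=Sampledata_adjvol_DumVar;
   model adjusted_volume = TRD_EVENT_ROUNDED_1-TRD_EVENT_ROUNDED_6;
run;
quit;

/* Option 2: drop the intercept with NOINT and keep all seven dummies.    */
/* Each coefficient is then the mean of adjusted_volume in its own level. */
proc reg data=Sampledata_adjvol_DumVar;
   model adjusted_volume = TRD_EVENT_ROUNDED_1-TRD_EVENT_ROUNDED_7 / noint;
run;
quit;
```

Both fits give the same predicted values; only the meaning of the coefficients differs. Note that PROC REG redefines R-square when NOINT is used, so that statistic is not comparable between the two runs.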

 

 

Also, you keep writing something like this, in this and other threads:

 

First half an hour: 9:00 << Dummy_1 < 9:30;

 

 

which makes absolutely no sense at all: dummy_1 is either 0 or 1 (otherwise it's not a dummy variable), and a variable that has values of 0 or 1 cannot be between 9:00 and 9:30. You most likely mean

 

dummy1 = 9:00 <= time_1 < 9:30;

 (which might not be correct syntax, but you get the idea)
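
For what it's worth, a DATA step version of that idea might look like the sketch below. It assumes a numeric variable holding a SAS time value (seconds since midnight); the names time, Sampledata_dummies, dummy_1-dummy_7, k, and lower are hypothetical and not taken from the original data set.

```sas
/* Sketch: build seven 0/1 dummies from a SAS time value.                */
/* TIME is an assumed variable name (seconds since midnight).            */
data Sampledata_dummies;
   set Sampledata_adjvol;
   array dummy{7} dummy_1-dummy_7;
   do k = 1 to 7;
      /* interval k starts at 9:00 plus (k-1) half hours of 1800 seconds */
      lower = '09:00:00't + (k - 1) * 1800;
      /* a comparison evaluates to 1 when true and 0 when false          */
      dummy{k} = (lower <= time < lower + 1800);
   end;
   drop k lower;
run;
```

Written out for the first interval, the assignment inside the loop is equivalent to dummy_1 = ('09:00:00't <= time < '09:30:00't);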

 

 

so I would hope that you will write more meaningful and understandable math and SAS code in the future.
