aminkarimid
Lapis Lazuli | Level 10

Hello, everybody.
I want to regress volume on time-based dummy variables, using PROC GENMOD and PROC GLM to create the dummies automatically.
In addition, I use a DATA step to create the dummies manually. I have seven dummies, defined as follows:

 

Dummy_1: 9:00 <= Time < 9:30;

Dummy_2: 9:30 <= Time < 10:00;

Dummy_3: 10:00 <= Time < 10:30;

Dummy_4: 10:30 <= Time < 11:00;

Dummy_5: 11:00 <= Time < 11:30;

Dummy_6: 11:30 <= Time < 12:00;

Dummy_7: 12:00 <= Time < 12:30;

 

Here is some of my code:

* Regressing dummy variables on normalized volume variable using calculated volume;
proc genmod data=Sampledata_adjvol;
   class TRD_EVENT_ROUFOR / param=effect;
   model adjusted_volume = TRD_EVENT_ROUFOR / noscale;
   ods select ParameterEstimates;
run;

* Same analysis by using the CLASS statement;
proc glm data=Sampledata_adjvol;
   class TRD_EVENT_ROUFOR;              /* Generates dummy variables internally */
   model adjusted_volume = TRD_EVENT_ROUFOR / solution;
   ods select ParameterEstimates;
quit;

* Creating dummy variables manually;
data Sampledata_adjvol_DumVar;
   set Sampledata_adjvol;
   /* A comparison evaluates to 1 (true) or 0 (false); values are seconds since midnight */
   TRD_EVENT_ROUNDED_1 = (TRD_EVENT_ROUNDED = 34200);   /*  9:30 */
   TRD_EVENT_ROUNDED_2 = (TRD_EVENT_ROUNDED = 36000);   /* 10:00 */
   TRD_EVENT_ROUNDED_3 = (TRD_EVENT_ROUNDED = 37800);   /* 10:30 */
   TRD_EVENT_ROUNDED_4 = (TRD_EVENT_ROUNDED = 39600);   /* 11:00 */
   TRD_EVENT_ROUNDED_5 = (TRD_EVENT_ROUNDED = 41400);   /* 11:30 */
   TRD_EVENT_ROUNDED_6 = (TRD_EVENT_ROUNDED = 43200);   /* 12:00 */
   TRD_EVENT_ROUNDED_7 = (TRD_EVENT_ROUNDED = 45000);   /* 12:30 */
run;

proc freq data=Sampledata_adjvol_DumVar;
  tables TRD_EVENT_ROUNDED*TRD_EVENT_ROUNDED_1*TRD_EVENT_ROUNDED_2*TRD_EVENT_ROUNDED_3*TRD_EVENT_ROUNDED_4*TRD_EVENT_ROUNDED_5*TRD_EVENT_ROUNDED_6*TRD_EVENT_ROUNDED_7 / list ;
run;

* Regressing normalized volume on the manually created dummy variables;
ods graphics on;
proc reg data=Sampledata_adjvol_DumVar plots(maxpoints=none);
   model adjusted_volume = TRD_EVENT_ROUNDED_1 TRD_EVENT_ROUNDED_2 TRD_EVENT_ROUNDED_3 TRD_EVENT_ROUNDED_4 TRD_EVENT_ROUNDED_5 TRD_EVENT_ROUNDED_6 TRD_EVENT_ROUNDED_7;
run;
quit;
ods graphics off;

 

The results are attached to this post.

 

Why is the final dummy not estimated?

What is the problem?

How can I fix that?

 

Thanks in advance.

Accepted Solutions
collinelliot
Barite | Level 11

Look up "dummy variable trap." With dummy variables and an intercept, you need (number of levels - 1) dummies in the model. The one left out is the baseline against which the other parameters are measured. At the most basic level, with two levels you need one dummy, and its coefficient is the effect of being in the level coded one compared to the level coded zero.
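
As a rough sketch of what this could look like with the data set and variable names from the question, the PROC REG call can list only six of the seven dummies. This assumes the seven half-hour intervals are the only levels in the data; choosing the last interval as the baseline is an arbitrary choice made for illustration.

```sas
/* Sketch: keep 6 of the 7 dummies so the design matrix has full rank.      */
/* The omitted level (TRD_EVENT_ROUNDED = 45000, i.e. 12:30) is the baseline */
/* (an arbitrary choice for this example).                                   */
proc reg data=Sampledata_adjvol_DumVar plots=none;
   model adjusted_volume = TRD_EVENT_ROUNDED_1 TRD_EVENT_ROUNDED_2
                           TRD_EVENT_ROUNDED_3 TRD_EVENT_ROUNDED_4
                           TRD_EVENT_ROUNDED_5 TRD_EVENT_ROUNDED_6;
run;
quit;
```

Under that assumption, the intercept estimates the mean adjusted volume in the baseline interval, and each dummy coefficient estimates the difference between its interval's mean and the baseline mean.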

PaigeMiller
Diamond | Level 26

As explained above, if you have N levels, you can estimate only N-1 coefficients plus the intercept, because the N dummies always sum to 1 and therefore duplicate the intercept column. If you leave the intercept out of the model, then you can estimate all N levels. This is basic math.
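
A minimal sketch of the no-intercept alternative, reusing the data set and variable names from the question, might look like the following. The NOINT option on the MODEL statement drops the intercept in both PROC REG and PROC GLM; whether a no-intercept model suits the analysis is a separate judgment call.

```sas
/* Sketch: drop the intercept so that all 7 dummy coefficients are estimable. */
proc reg data=Sampledata_adjvol_DumVar plots=none;
   model adjusted_volume = TRD_EVENT_ROUNDED_1 TRD_EVENT_ROUNDED_2
                           TRD_EVENT_ROUNDED_3 TRD_EVENT_ROUNDED_4
                           TRD_EVENT_ROUNDED_5 TRD_EVENT_ROUNDED_6
                           TRD_EVENT_ROUNDED_7 / noint;
run;
quit;

/* The same idea with the CLASS variable in PROC GLM. */
proc glm data=Sampledata_adjvol;
   class TRD_EVENT_ROUFOR;
   model adjusted_volume = TRD_EVENT_ROUFOR / noint solution;
run;
quit;
```

Without an intercept, each coefficient estimates the mean of its own interval rather than a difference from a baseline. R-square is also computed differently for no-intercept models, so it is not comparable with the R-square from a model that includes an intercept.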

 

 

Also, you keep writing something like this, in this and other threads:

 

First half an hour: 9:00 << Dummy_1 < 9:30;

 

 

which makes absolutely no sense at all: Dummy_1 is either 0 or 1 (otherwise it's not a dummy variable), and a variable that takes only the values 0 or 1 cannot be between 9:00 and 9:30. You most likely mean

 

dummy1 = 9:00 <= time_1 < 9:30;

 (which might not be correct syntax, but you get the idea)

 

 

so I would hope that you will write more meaningful and understandable math and SAS code in the future.
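
For reference, the interval definition quoted above could be written in valid SAS roughly as follows. This is only a sketch: the data set names `have` and `want` are hypothetical, and `time_1` is assumed to hold a SAS time value (seconds since midnight).

```sas
/* Sketch: build a 0/1 dummy from a time interval using SAS time literals. */
/* Data set names HAVE and WANT are placeholders for illustration.         */
data want;
   set have;
   /* The comparison evaluates to 1 when 9:00 <= time_1 < 9:30, else 0. */
   dummy1 = ('9:00't <= time_1 < '9:30't);
run;
```

Here `'9:00't` and `'9:30't` are time literals equal to 32400 and 34200 seconds, and SAS accepts the chained comparison `a <= x < b` directly.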

--
Paige Miller
