Forecasting using SAS Forecast Server, SAS/ETS, and more

Regression with Several Dummies

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 142
Accepted Solution

Regression with Several Dummies

Hello everybody;

Here is a sample of my data:

 

TRD_EVENT_DTTRD_EVENT_TMTRD_STCK_CDTRD_EVENT_ROUFORCountedVOLUMEIntradayVolumeVolumeadjusted_volume
2008-03-2413:13:44TRIR113:001617016170190275196 
2008-03-2412:12:03ZMYD112:009950000 11882920 
2008-03-2412:13:08ZMYD112:0019128401186284011882920 
2008-03-2413:13:44ZMYD113:00200802008011882920 
2008-03-2909:29:44CRBN1 9:301987674 61880278 
2008-03-2909:39:36CRBN1 9:303108926 61880278 
2008-03-2909:39:36CRBN1 9:307950000 61880278 
2008-03-2909:39:36CRBN1 9:303519000 61880278 
2008-03-2909:39:58CRBN1 9:3042000016985600618802782.962628593
2008-03-2909:55:20CRBN110:0011115001111500618802780.193867846
2008-03-2910:21:54CRBN110:30613500613500618802780.107006679
2008-03-2911:04:26CRBN111:00740500740500618802780.129158021
2008-03-2911:32:37CRBN111:3023865002386500618802780.416253364
2008-03-2912:12:20CRBN112:001506000015060000618802782.626765414
2008-03-2912:22:39CRBN112:30544027 61880278 
2008-03-2912:27:56CRBN112:301989000 61880278 
2008-03-2912:28:47CRBN112:301509600 61880278 
2008-03-2912:29:01CRBN112:301099055115033178618802782.62208712
2008-03-2909:47:34CRBN110:0099495009949500618802781.735391931
2008-03-2913:09:22CRBN113:001484014840618802780.002588393
2008-03-2911:20:12DADE111:301883805 97539198 
2008-03-2911:23:22DADE111:304523760 97539198 
2008-03-2911:23:26DADE111:3014841000 97539198 
2008-03-2911:23:29DADE111:3016800000 97539198 
2008-03-2911:23:33DADE111:308400000 97539198 
2008-03-2911:23:36DADE111:3016810000 97539198 
2008-03-2911:23:39DADE111:3016810000 97539198 
2008-03-2911:23:43DADE111:301681000096878565975391984.158915378
2008-03-2911:46:16DADE112:00660633660633975391980.028360419
2008-03-2913:09:22DADE113:001667016670975391980.000715629
2008-03-2909:55:07DALZ110:0065838606583860599955400.591979248
2008-03-2911:44:29DALZ111:3075561607556160599955400.679402344
2008-03-2911:48:10DALZ112:003474352034743520599955403.123918619
2008-03-2910:35:22DALZ110:301111200011112000599955400.999121094
2008-03-2913:09:22DALZ113:009110091100599955400.008191139
2008-03-2909:56:18DJBR110:002487012 34608420 
2008-03-2909:57:11DJBR110:003977721 34608420 
2008-03-2909:57:27DJBR110:00397772110442454346084201.002876735
2008-03-2911:08:21DJBR111:0019776241977624346084200.189927875
2008-03-2911:38:53DJBR111:302218834222188342346084202.130933205
2008-03-2913:09:22DJBR113:007492074920346084200.007195198
2008-03-2913:09:22DLGM113:0037910379102434051587 
2008-03-2911:09:19DRZK111:001977036197703619770361.621039513
2008-03-2913:09:22DRZK113:0014753014753019770360.120964899
2008-03-2909:13:32DSIN1 9:001169900 2339800 
2008-03-2909:14:00DSIN1 9:00584950175485023398000.031078709
2008-03-2909:39:48DSIN1 9:3058495058495023398000.01035957
2008-03-2913:09:22DSIN113:0011471011471023398000.002031535
2008-03-2910:56:30DSOB111:00198800019880001240000000.020678414
2008-03-2911:15:33DSOB111:30447200044720001240000000.04651603
2008-03-2912:27:07DSOB112:3010000000 124000000 

 

 

 

The CONTENTS Procedure:
Alphabetic List of Variables and Attributes # Variable Type Len Format Informat Label 5 6 1 4 2 3 7 8
CountedVOLUMENum8   
IntradayVolumeNum8   
TRD_EVENT_DTChar10$10.$10.TRD_EVENT_DT
TRD_EVENT_ROUFORChar5   
TRD_EVENT_TMChar8$8.$8.TRD_EVENT_TM
TRD_STCK_CDChar5$5.$5.TRD_STCK_CD
VolumeNum8  Volume
adjusted_volumeNum8   

varaibles:

TRD_STCK_CD = name;

TRD_EVENT_TM = time;

TRD_EVENT_ROUFOR = The time variable (TRD_EVENT_TM) that has been rounded to half an hour periods.

I categorized this data in half an hour periods:

First half an hour: 9:00 <<CountedVOLUME < 9:30

Second half an hour: 9:30 << CountedVOLUME < 10:00

Third half an hour: 10:00 << CountedVOLUME < 10:30

Fourth half an hour: 10:30 << CountedVOLUME < 11

Fifth half an hour: 11:00 << CountedVOLUME< 11:30

Sixth half an hour: 11:30 << CountedVOLUME < 12

Seventh half an hour: 12:00 << CountedVOLUME << 12:30

Eights half an hour: 12:30 <<CountedVOLUME << 13:00.

 

I want to analyze this data by the regression specified below:

 

adjusted_volume = intercept + dummy var_1 [First half an hour] + dummy var_2 [Socend half an hour] + dummy var_3 [Third half an hour] + dummy var_4 [Fourth half an hour] + dummy var_5 [Fifth half an hour] + dummy var_6 [Sixth half an hour] + Residual

 

How can I run this regression using SAS?


Thanks.

 

 


Accepted Solutions
Solution
‎05-27-2017 04:33 PM
Trusted Advisor
Posts: 1,917

Re: Regression with Several Dummies

Posted in reply to aminkarimid

Again, you do not have to create the dummy variables yourself, PROC GLM will create the dummy variables for you. This is both easier and safer than creating the regression variables yourself. All you need is to put the variable TRD_EVENT_ROUFOR into a CLASS statement in PROC GLM.

 

Also, as someone stated above, you'd be wise to treat this as a continuous value and apply some sort of time series model, this would result in a superior model fit than if you use dummy variables.

 

Lastly, @aminkarimid, it is not a good idea, and not a good use of the forums, to create multiple threads on a single topic. One thread is better, because then all of the advice and suggestions are available in one place, rather than have it scattered in multiple threads.

 

 

View solution in original post


All Replies
Frequent Contributor
Posts: 142

Re: Regression with Several Dummies

Posted in reply to aminkarimid

I use the codes that has illustrated below:

 

DATA Sampledata87_02_Mer;
  SET Sampledata87_02_Mer ;
 
  	IF TRD_EVENT_ROUFOR = '9:00' THEN TRD_EVENT_ROUFOR_1 = 1; 
    ELSE TRD_EVENT_ROUFOR_1 = 0;
  	IF TRD_EVENT_ROUFOR = '9:30' THEN TRD_EVENT_ROUFOR_2 = 1; 
    ELSE TRD_EVENT_ROUFOR_2 = 0;
  	IF TRD_EVENT_ROUFOR = '10:00' THEN TRD_EVENT_ROUFOR_3 = 1; 
    ELSE TRD_EVENT_ROUFOR_3 = 0;
  	IF TRD_EVENT_ROUFOR = '10:30' THEN TRD_EVENT_ROUFOR_4 = 1; 
    ELSE TRD_EVENT_ROUFOR_4 = 0;
 	IF TRD_EVENT_ROUFOR = '11:00' THEN TRD_EVENT_ROUFOR_5 = 1; 
    ELSE TRD_EVENT_ROUFOR_5 = 0;
	IF TRD_EVENT_ROUFOR = '11:30' THEN TRD_EVENT_ROUFOR_6 = 1; 
    ELSE TRD_EVENT_ROUFOR_6 = 0;
	IF TRD_EVENT_ROUFOR = '12:00' THEN TRD_EVENT_ROUFOR_7 = 1; 
    ELSE TRD_EVENT_ROUFOR_7 = 0;
	IF TRD_EVENT_ROUFOR = '12:30' THEN TRD_EVENT_ROUFOR_8 = 1; 
    ELSE TRD_EVENT_ROUFOR_8 = 0;
	IF TRD_EVENT_ROUFOR = '13:00' THEN TRD_EVENT_ROUFOR_9 = 1; 
    ELSE TRD_EVENT_ROUFOR_9 = 0;
RUN;

But, it doesn't work!

Trusted Advisor
Posts: 1,917

Re: Regression with Several Dummies

Posted in reply to aminkarimid

By far the easiest thing to do would be to use PROC GLM to do this regression, if you make the time variables as CLASS variables, then GLM creates the DUMMY variables for you.

 

But ...

 

Do you really mean you want COUNTEDVOLUME by half hour time intervals, as shown in your pseudo-code? I don't understand that at all, it makes no sense in the context of your question. Could you explain that further?

Frequent Contributor
Posts: 142

Re: Regression with Several Dummies

[ Edited ]
Posted in reply to PaigeMiller

I have categorized COUNTEDVOLUME variable (= IntradayVolume variable). I just want to generate dummy varaibles based on time (TRD_EVENT_ROUFOR) and run the specified reggression that I have explained before.

Thanks.

Super User
Posts: 19,792

Re: Regression with Several Dummies

Posted in reply to aminkarimid

Why wouldn't use use one of the time series proc that could handle a time series data and deal with lags?

 

Try PROC ARIMA or PROC AUTOREG. 

 

You may need to change your data structure, but it may also be able to handle this.

 

https://support.sas.com/documentation/onlinedoc/ets/indexproc.html#ets142

Frequent Contributor
Posts: 142

Re: Regression with Several Dummies

 


Here is  the freq procedure:

Why are the  TRD_EVENT_ROUFOR_1 & TRD_EVENT_ROUFOR_2 columns totally zero?
Why hasn't it shown the other dummies (ex. TRD_EVENT_ROUFOR_8)?

What is the problem? Which part of my code is wrong?

 

SAS Output

 

TRD_EVENT_ROUFORTRD_EVENT_ROUFOR_1TRD_EVENT_ROUFOR_2TRD_EVENT_ROUFOR_3TRD_EVENT_ROUFOR_4TRD_EVENT_ROUFOR_5FrequencyPercentCumulativeCumulative
FrequencyPercent
9:0000000563613.12563613.12
9:3000000648115.091211728.21
10:0000100454610.581666338.8
10:3000010467010.872133349.67
11:0000001516412.022649761.7
11:3000000545012.693194774.39
12:0000000440210.253634984.63
12:3000000595513.874230498.5
13:00000006411.494294599.99
14:00000001042946100
14:30000001042947100
15:30000001042948100
Frequency Missing = 30691

 

 

 

DATA Sampledata87_02_Mer_DumVar;
  SET Sampledata87_02_Mer ;
 
  	IF TRD_EVENT_ROUFOR = '9:00' THEN TRD_EVENT_ROUFOR_1 = 1; 
    ELSE TRD_EVENT_ROUFOR_1 = 0;
  	IF TRD_EVENT_ROUFOR = '9:30' THEN TRD_EVENT_ROUFOR_2 = 1; 
    ELSE TRD_EVENT_ROUFOR_2 = 0;
  	IF TRD_EVENT_ROUFOR = '10:00' THEN TRD_EVENT_ROUFOR_3 = 1; 
    ELSE TRD_EVENT_ROUFOR_3 = 0;
  	IF TRD_EVENT_ROUFOR = '10:30' THEN TRD_EVENT_ROUFOR_4 = 1; 
    ELSE TRD_EVENT_ROUFOR_4 = 0;
 	IF TRD_EVENT_ROUFOR = '11:00' THEN TRD_EVENT_ROUFOR_5 = 1; 
    ELSE TRD_EVENT_ROUFOR_5 = 0;
	IF TRD_EVENT_ROUFOR = '11:30' THEN TRD_EVENT_ROUFOR_6 = 1; 
    ELSE TRD_EVENT_ROUFOR_6 = 0;
	IF TRD_EVENT_ROUFOR = '12:00' THEN TRD_EVENT_ROUFOR_7 = 1; 
    ELSE TRD_EVENT_ROUFOR_7 = 0;
	IF TRD_EVENT_ROUFOR = '12:30' THEN TRD_EVENT_ROUFOR_8 = 1; 
    ELSE TRD_EVENT_ROUFOR_8 = 0;
	IF TRD_EVENT_ROUFOR = '13:00' THEN TRD_EVENT_ROUFOR_9 = 1; 
    ELSE TRD_EVENT_ROUFOR_9 = 0;
RUN;

PROC FREQ DATA=Sampledata87_02_Mer_DumVar;
  TABLES TRD_EVENT_ROUFOR*TRD_EVENT_ROUFOR_1*TRD_EVENT_ROUFOR_2*TRD_EVENT_ROUFOR_3*TRD_EVENT_ROUFOR_4*TRD_EVENT_ROUFOR_5 / list ;
RUN;

 

 

 

 

Solution
‎05-27-2017 04:33 PM
Trusted Advisor
Posts: 1,917

Re: Regression with Several Dummies

Posted in reply to aminkarimid

Again, you do not have to create the dummy variables yourself, PROC GLM will create the dummy variables for you. This is both easier and safer than creating the regression variables yourself. All you need is to put the variable TRD_EVENT_ROUFOR into a CLASS statement in PROC GLM.

 

Also, as someone stated above, you'd be wise to treat this as a continuous value and apply some sort of time series model, this would result in a superior model fit than if you use dummy variables.

 

Lastly, @aminkarimid, it is not a good idea, and not a good use of the forums, to create multiple threads on a single topic. One thread is better, because then all of the advice and suggestions are available in one place, rather than have it scattered in multiple threads.

 

 

Frequent Contributor
Posts: 142

Re: Regression with Several Dummies

Posted in reply to PaigeMiller
Dear PaigeMiller;
Can you give me an example of how can I use PROC GLM for this regression?
Best regards.
Trusted Advisor
Posts: 1,917

Re: Regression with Several Dummies

[ Edited ]
Posted in reply to aminkarimid

aminkarimid wrote:
Dear PaigeMiller;
Can you give me an example of how can I use PROC GLM for this regression?
Best regards.

The last sentence of my first paragraph in message 7 explains how to do this.

Frequent Contributor
Posts: 142

Re: Regression with Several Dummies

Posted in reply to PaigeMiller
Super User
Posts: 19,792

Re: Regression with Several Dummies

[ Edited ]
Posted in reply to aminkarimid

aminkarimid wrote:
Here is a very nice tip:
http://blogs.sas.com/content/iml/2016/02/22/create-dummy-variables-in-sas.html
Thanks.

 

In general yes, but I don't think that post applies in your case. 

FYI - you should search before you post questions anyways. 

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-dummy-variables-Categorical-Var...

 

EDIT: You should note that SAS will drop any observations from a model if any of the variables is missing. The data you posted has missing values so be careful you understand your regression. 

 

 

Frequent Contributor
Posts: 142

Re: Regression with Several Dummies

[ Edited ]

Does it mean that I can not use GLM PROC for running regression?
Here is my code:

proc glm data=Sampledata87_02_mer;
   class TRD_EVENT_ROUFOR;              /* Generates dummy variables internally */
   model adjusted_volume = TRD_EVENT_ROUFOR / solution;
   ods select ParameterEstimates;
quit;


Sorry if these questions are obvious, I'm still learning how to navigate SAS documentation.

-Thanks

Trusted Advisor
Posts: 1,917

Re: Regression with Several Dummies

Posted in reply to aminkarimid

aminkarimid wrote:

Does it mean that I can not use GLM PROC for running regression?
Here is my code:

proc glm data=Sampledata87_02_mer;
   class TRD_EVENT_ROUFOR;              /* Generates dummy variables internally */
   model adjusted_volume = TRD_EVENT_ROUFOR / solution;
   ods select ParameterEstimates;
quit;


Sorry if these questions are obvious, I'm still learning how to navigate SAS documentation.

-Thanks


This seems to be the solution I was recommending. The ODS SELECT statement is optional.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 232 views
  • 3 likes
  • 3 in conversation