Hello everybody;
Here is a sample of my data:
TRD_EVENT_DT | TRD_EVENT_TM | TRD_STCK_CD | TRD_EVENT_ROUFOR | CountedVOLUME | IntradayVolume | Volume | adjusted_volume |
2008-03-24 | 13:13:44 | TRIR1 | 13:00 | 16170 | 16170 | 190275196 | |
2008-03-24 | 12:12:03 | ZMYD1 | 12:00 | 9950000 | 11882920 | ||
2008-03-24 | 12:13:08 | ZMYD1 | 12:00 | 1912840 | 11862840 | 11882920 | |
2008-03-24 | 13:13:44 | ZMYD1 | 13:00 | 20080 | 20080 | 11882920 | |
2008-03-29 | 09:29:44 | CRBN1 | 9:30 | 1987674 | 61880278 | ||
2008-03-29 | 09:39:36 | CRBN1 | 9:30 | 3108926 | 61880278 | ||
2008-03-29 | 09:39:36 | CRBN1 | 9:30 | 7950000 | 61880278 | ||
2008-03-29 | 09:39:36 | CRBN1 | 9:30 | 3519000 | 61880278 | ||
2008-03-29 | 09:39:58 | CRBN1 | 9:30 | 420000 | 16985600 | 61880278 | 2.962628593 |
2008-03-29 | 09:55:20 | CRBN1 | 10:00 | 1111500 | 1111500 | 61880278 | 0.193867846 |
2008-03-29 | 10:21:54 | CRBN1 | 10:30 | 613500 | 613500 | 61880278 | 0.107006679 |
2008-03-29 | 11:04:26 | CRBN1 | 11:00 | 740500 | 740500 | 61880278 | 0.129158021 |
2008-03-29 | 11:32:37 | CRBN1 | 11:30 | 2386500 | 2386500 | 61880278 | 0.416253364 |
2008-03-29 | 12:12:20 | CRBN1 | 12:00 | 15060000 | 15060000 | 61880278 | 2.626765414 |
2008-03-29 | 12:22:39 | CRBN1 | 12:30 | 544027 | 61880278 | ||
2008-03-29 | 12:27:56 | CRBN1 | 12:30 | 1989000 | 61880278 | ||
2008-03-29 | 12:28:47 | CRBN1 | 12:30 | 1509600 | 61880278 | ||
2008-03-29 | 12:29:01 | CRBN1 | 12:30 | 10990551 | 15033178 | 61880278 | 2.62208712 |
2008-03-29 | 09:47:34 | CRBN1 | 10:00 | 9949500 | 9949500 | 61880278 | 1.735391931 |
2008-03-29 | 13:09:22 | CRBN1 | 13:00 | 14840 | 14840 | 61880278 | 0.002588393 |
2008-03-29 | 11:20:12 | DADE1 | 11:30 | 1883805 | 97539198 | ||
2008-03-29 | 11:23:22 | DADE1 | 11:30 | 4523760 | 97539198 | ||
2008-03-29 | 11:23:26 | DADE1 | 11:30 | 14841000 | 97539198 | ||
2008-03-29 | 11:23:29 | DADE1 | 11:30 | 16800000 | 97539198 | ||
2008-03-29 | 11:23:33 | DADE1 | 11:30 | 8400000 | 97539198 | ||
2008-03-29 | 11:23:36 | DADE1 | 11:30 | 16810000 | 97539198 | ||
2008-03-29 | 11:23:39 | DADE1 | 11:30 | 16810000 | 97539198 | ||
2008-03-29 | 11:23:43 | DADE1 | 11:30 | 16810000 | 96878565 | 97539198 | 4.158915378 |
2008-03-29 | 11:46:16 | DADE1 | 12:00 | 660633 | 660633 | 97539198 | 0.028360419 |
2008-03-29 | 13:09:22 | DADE1 | 13:00 | 16670 | 16670 | 97539198 | 0.000715629 |
2008-03-29 | 09:55:07 | DALZ1 | 10:00 | 6583860 | 6583860 | 59995540 | 0.591979248 |
2008-03-29 | 11:44:29 | DALZ1 | 11:30 | 7556160 | 7556160 | 59995540 | 0.679402344 |
2008-03-29 | 11:48:10 | DALZ1 | 12:00 | 34743520 | 34743520 | 59995540 | 3.123918619 |
2008-03-29 | 10:35:22 | DALZ1 | 10:30 | 11112000 | 11112000 | 59995540 | 0.999121094 |
2008-03-29 | 13:09:22 | DALZ1 | 13:00 | 91100 | 91100 | 59995540 | 0.008191139 |
2008-03-29 | 09:56:18 | DJBR1 | 10:00 | 2487012 | 34608420 | ||
2008-03-29 | 09:57:11 | DJBR1 | 10:00 | 3977721 | 34608420 | ||
2008-03-29 | 09:57:27 | DJBR1 | 10:00 | 3977721 | 10442454 | 34608420 | 1.002876735 |
2008-03-29 | 11:08:21 | DJBR1 | 11:00 | 1977624 | 1977624 | 34608420 | 0.189927875 |
2008-03-29 | 11:38:53 | DJBR1 | 11:30 | 22188342 | 22188342 | 34608420 | 2.130933205 |
2008-03-29 | 13:09:22 | DJBR1 | 13:00 | 74920 | 74920 | 34608420 | 0.007195198 |
2008-03-29 | 13:09:22 | DLGM1 | 13:00 | 37910 | 37910 | 2434051587 | |
2008-03-29 | 11:09:19 | DRZK1 | 11:00 | 1977036 | 1977036 | 1977036 | 1.621039513 |
2008-03-29 | 13:09:22 | DRZK1 | 13:00 | 147530 | 147530 | 1977036 | 0.120964899 |
2008-03-29 | 09:13:32 | DSIN1 | 9:00 | 1169900 | 2339800 | ||
2008-03-29 | 09:14:00 | DSIN1 | 9:00 | 584950 | 1754850 | 2339800 | 0.031078709 |
2008-03-29 | 09:39:48 | DSIN1 | 9:30 | 584950 | 584950 | 2339800 | 0.01035957 |
2008-03-29 | 13:09:22 | DSIN1 | 13:00 | 114710 | 114710 | 2339800 | 0.002031535 |
2008-03-29 | 10:56:30 | DSOB1 | 11:00 | 1988000 | 1988000 | 124000000 | 0.020678414 |
2008-03-29 | 11:15:33 | DSOB1 | 11:30 | 4472000 | 4472000 | 124000000 | 0.04651603 |
2008-03-29 | 12:27:07 | DSOB1 | 12:30 | 10000000 | 124000000 |
CountedVOLUME | Num | 8 | |||
IntradayVolume | Num | 8 | |||
TRD_EVENT_DT | Char | 10 | $10. | $10. | TRD_EVENT_DT |
TRD_EVENT_ROUFOR | Char | 5 | |||
TRD_EVENT_TM | Char | 8 | $8. | $8. | TRD_EVENT_TM |
TRD_STCK_CD | Char | 5 | $5. | $5. | TRD_STCK_CD |
Volume | Num | 8 | Volume | ||
adjusted_volume | Num | 8 |
varaibles:
TRD_STCK_CD = name;
TRD_EVENT_TM = time;
TRD_EVENT_ROUFOR = The time variable (TRD_EVENT_TM) that has been rounded to half an hour periods.
I categorized this data in half an hour periods:
First half an hour: 9:00 <<CountedVOLUME < 9:30
Second half an hour: 9:30 << CountedVOLUME < 10:00
Third half an hour: 10:00 << CountedVOLUME < 10:30
Fourth half an hour: 10:30 << CountedVOLUME < 11
Fifth half an hour: 11:00 << CountedVOLUME< 11:30
Sixth half an hour: 11:30 << CountedVOLUME < 12
Seventh half an hour: 12:00 << CountedVOLUME << 12:30
Eights half an hour: 12:30 <<CountedVOLUME << 13:00.
I want to analyze this data by the regression specified below:
adjusted_volume = intercept + dummy var_1 [First half an hour] + dummy var_2 [Socend half an hour] + dummy var_3 [Third half an hour] + dummy var_4 [Fourth half an hour] + dummy var_5 [Fifth half an hour] + dummy var_6 [Sixth half an hour] + Residual
How can I run this regression using SAS?
Thanks.
Again, you do not have to create the dummy variables yourself, PROC GLM will create the dummy variables for you. This is both easier and safer than creating the regression variables yourself. All you need is to put the variable TRD_EVENT_ROUFOR into a CLASS statement in PROC GLM.
Also, as someone stated above, you'd be wise to treat this as a continuous value and apply some sort of time series model, this would result in a superior model fit than if you use dummy variables.
Lastly, @aminkarimid, it is not a good idea, and not a good use of the forums, to create multiple threads on a single topic. One thread is better, because then all of the advice and suggestions are available in one place, rather than have it scattered in multiple threads.
I use the codes that has illustrated below:
DATA Sampledata87_02_Mer;
SET Sampledata87_02_Mer ;
IF TRD_EVENT_ROUFOR = '9:00' THEN TRD_EVENT_ROUFOR_1 = 1;
ELSE TRD_EVENT_ROUFOR_1 = 0;
IF TRD_EVENT_ROUFOR = '9:30' THEN TRD_EVENT_ROUFOR_2 = 1;
ELSE TRD_EVENT_ROUFOR_2 = 0;
IF TRD_EVENT_ROUFOR = '10:00' THEN TRD_EVENT_ROUFOR_3 = 1;
ELSE TRD_EVENT_ROUFOR_3 = 0;
IF TRD_EVENT_ROUFOR = '10:30' THEN TRD_EVENT_ROUFOR_4 = 1;
ELSE TRD_EVENT_ROUFOR_4 = 0;
IF TRD_EVENT_ROUFOR = '11:00' THEN TRD_EVENT_ROUFOR_5 = 1;
ELSE TRD_EVENT_ROUFOR_5 = 0;
IF TRD_EVENT_ROUFOR = '11:30' THEN TRD_EVENT_ROUFOR_6 = 1;
ELSE TRD_EVENT_ROUFOR_6 = 0;
IF TRD_EVENT_ROUFOR = '12:00' THEN TRD_EVENT_ROUFOR_7 = 1;
ELSE TRD_EVENT_ROUFOR_7 = 0;
IF TRD_EVENT_ROUFOR = '12:30' THEN TRD_EVENT_ROUFOR_8 = 1;
ELSE TRD_EVENT_ROUFOR_8 = 0;
IF TRD_EVENT_ROUFOR = '13:00' THEN TRD_EVENT_ROUFOR_9 = 1;
ELSE TRD_EVENT_ROUFOR_9 = 0;
RUN;
But, it doesn't work!
By far the easiest thing to do would be to use PROC GLM to do this regression, if you make the time variables as CLASS variables, then GLM creates the DUMMY variables for you.
But ...
Do you really mean you want COUNTEDVOLUME by half hour time intervals, as shown in your pseudo-code? I don't understand that at all, it makes no sense in the context of your question. Could you explain that further?
I have categorized COUNTEDVOLUME variable (= IntradayVolume variable). I just want to generate dummy varaibles based on time (TRD_EVENT_ROUFOR) and run the specified reggression that I have explained before.
Thanks.
Why wouldn't use use one of the time series proc that could handle a time series data and deal with lags?
Try PROC ARIMA or PROC AUTOREG.
You may need to change your data structure, but it may also be able to handle this.
https://support.sas.com/documentation/onlinedoc/ets/indexproc.html#ets142
Here is the freq procedure:
Why are the TRD_EVENT_ROUFOR_1 & TRD_EVENT_ROUFOR_2 columns totally zero?
Why hasn't it shown the other dummies (ex. TRD_EVENT_ROUFOR_8)?
What is the problem? Which part of my code is wrong?
SAS Output
TRD_EVENT_ROUFOR | TRD_EVENT_ROUFOR_1 | TRD_EVENT_ROUFOR_2 | TRD_EVENT_ROUFOR_3 | TRD_EVENT_ROUFOR_4 | TRD_EVENT_ROUFOR_5 | Frequency | Percent | Cumulative | Cumulative |
Frequency | Percent | ||||||||
9:00 | 0 | 0 | 0 | 0 | 0 | 5636 | 13.12 | 5636 | 13.12 |
9:30 | 0 | 0 | 0 | 0 | 0 | 6481 | 15.09 | 12117 | 28.21 |
10:00 | 0 | 0 | 1 | 0 | 0 | 4546 | 10.58 | 16663 | 38.8 |
10:30 | 0 | 0 | 0 | 1 | 0 | 4670 | 10.87 | 21333 | 49.67 |
11:00 | 0 | 0 | 0 | 0 | 1 | 5164 | 12.02 | 26497 | 61.7 |
11:30 | 0 | 0 | 0 | 0 | 0 | 5450 | 12.69 | 31947 | 74.39 |
12:00 | 0 | 0 | 0 | 0 | 0 | 4402 | 10.25 | 36349 | 84.63 |
12:30 | 0 | 0 | 0 | 0 | 0 | 5955 | 13.87 | 42304 | 98.5 |
13:00 | 0 | 0 | 0 | 0 | 0 | 641 | 1.49 | 42945 | 99.99 |
14:00 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 42946 | 100 |
14:30 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 42947 | 100 |
15:30 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 42948 | 100 |
Frequency Missing = 30691 |
DATA Sampledata87_02_Mer_DumVar;
SET Sampledata87_02_Mer ;
IF TRD_EVENT_ROUFOR = '9:00' THEN TRD_EVENT_ROUFOR_1 = 1;
ELSE TRD_EVENT_ROUFOR_1 = 0;
IF TRD_EVENT_ROUFOR = '9:30' THEN TRD_EVENT_ROUFOR_2 = 1;
ELSE TRD_EVENT_ROUFOR_2 = 0;
IF TRD_EVENT_ROUFOR = '10:00' THEN TRD_EVENT_ROUFOR_3 = 1;
ELSE TRD_EVENT_ROUFOR_3 = 0;
IF TRD_EVENT_ROUFOR = '10:30' THEN TRD_EVENT_ROUFOR_4 = 1;
ELSE TRD_EVENT_ROUFOR_4 = 0;
IF TRD_EVENT_ROUFOR = '11:00' THEN TRD_EVENT_ROUFOR_5 = 1;
ELSE TRD_EVENT_ROUFOR_5 = 0;
IF TRD_EVENT_ROUFOR = '11:30' THEN TRD_EVENT_ROUFOR_6 = 1;
ELSE TRD_EVENT_ROUFOR_6 = 0;
IF TRD_EVENT_ROUFOR = '12:00' THEN TRD_EVENT_ROUFOR_7 = 1;
ELSE TRD_EVENT_ROUFOR_7 = 0;
IF TRD_EVENT_ROUFOR = '12:30' THEN TRD_EVENT_ROUFOR_8 = 1;
ELSE TRD_EVENT_ROUFOR_8 = 0;
IF TRD_EVENT_ROUFOR = '13:00' THEN TRD_EVENT_ROUFOR_9 = 1;
ELSE TRD_EVENT_ROUFOR_9 = 0;
RUN;
PROC FREQ DATA=Sampledata87_02_Mer_DumVar;
TABLES TRD_EVENT_ROUFOR*TRD_EVENT_ROUFOR_1*TRD_EVENT_ROUFOR_2*TRD_EVENT_ROUFOR_3*TRD_EVENT_ROUFOR_4*TRD_EVENT_ROUFOR_5 / list ;
RUN;
Again, you do not have to create the dummy variables yourself, PROC GLM will create the dummy variables for you. This is both easier and safer than creating the regression variables yourself. All you need is to put the variable TRD_EVENT_ROUFOR into a CLASS statement in PROC GLM.
Also, as someone stated above, you'd be wise to treat this as a continuous value and apply some sort of time series model, this would result in a superior model fit than if you use dummy variables.
Lastly, @aminkarimid, it is not a good idea, and not a good use of the forums, to create multiple threads on a single topic. One thread is better, because then all of the advice and suggestions are available in one place, rather than have it scattered in multiple threads.
@aminkarimid wrote:
Dear PaigeMiller;
Can you give me an example of how can I use PROC GLM for this regression?
Best regards.
The last sentence of my first paragraph in message 7 explains how to do this.
@aminkarimid wrote:
Here is a very nice tip:
http://blogs.sas.com/content/iml/2016/02/22/create-dummy-variables-in-sas.html
Thanks.
In general yes, but I don't think that post applies in your case.
FYI - you should search before you post questions anyways.
EDIT: You should note that SAS will drop any observations from a model if any of the variables is missing. The data you posted has missing values so be careful you understand your regression.
Does it mean that I can not use GLM PROC for running regression?
Here is my code:
proc glm data=Sampledata87_02_mer;
class TRD_EVENT_ROUFOR; /* Generates dummy variables internally */
model adjusted_volume = TRD_EVENT_ROUFOR / solution;
ods select ParameterEstimates;
quit;
Sorry if these questions are obvious, I'm still learning how to navigate SAS documentation.
-Thanks
@aminkarimid wrote:
Does it mean that I can not use GLM PROC for running regression?
Here is my code:proc glm data=Sampledata87_02_mer; class TRD_EVENT_ROUFOR; /* Generates dummy variables internally */ model adjusted_volume = TRD_EVENT_ROUFOR / solution; ods select ParameterEstimates; quit;
Sorry if these questions are obvious, I'm still learning how to navigate SAS documentation.
-Thanks
This seems to be the solution I was recommending. The ODS SELECT statement is optional.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.
Find more tutorials on the SAS Users YouTube channel.