## Regression with Several Dummies

Hello everybody;

Here is a sample of my data:

The CONTENTS Procedure

Alphabetic List of Variables and Attributes

| # | Variable | Type | Len | Format | Informat | Label |
|---|---|---|---|---|---|---|
| 5 | CountedVOLUME | Num | 8 | | | |
| 6 | IntradayVolume | Num | 8 | | | |
| 1 | TRD_EVENT_DT | Char | 10 | \$10. | \$10. | TRD_EVENT_DT |
| 4 | TRD_EVENT_ROUFOR | Char | 5 | | | |
| 2 | TRD_EVENT_TM | Char | 8 | \$8. | \$8. | TRD_EVENT_TM |
| 3 | TRD_STCK_CD | Char | 5 | \$5. | \$5. | TRD_STCK_CD |
| 7 | Volume | Num | 8 | | | Volume |
| 8 | adjusted_volume | Num | 8 | | | |

Variables:

TRD_STCK_CD = name;

TRD_EVENT_TM = time;

TRD_EVENT_ROUFOR = the time variable (TRD_EVENT_TM) rounded to half-hour periods.

I categorized this data into half-hour periods:

First half hour: 9:00 <= CountedVOLUME < 9:30

Second half hour: 9:30 <= CountedVOLUME < 10:00

Third half hour: 10:00 <= CountedVOLUME < 10:30

Fourth half hour: 10:30 <= CountedVOLUME < 11:00

Fifth half hour: 11:00 <= CountedVOLUME < 11:30

Sixth half hour: 11:30 <= CountedVOLUME < 12:00

Seventh half hour: 12:00 <= CountedVOLUME <= 12:30

Eighth half hour: 12:30 <= CountedVOLUME <= 13:00

I want to analyze this data by the regression specified below:

adjusted_volume = intercept + dummy var_1 [First half hour] + dummy var_2 [Second half hour] + dummy var_3 [Third half hour] + dummy var_4 [Fourth half hour] + dummy var_5 [Fifth half hour] + dummy var_6 [Sixth half hour] + Residual

How can I run this regression using SAS?

Thanks.


## Re: Regression with Several Dummies

I used the code shown below:

```sas
DATA Sampledata87_02_Mer;
    SET Sampledata87_02_Mer;

    IF TRD_EVENT_ROUFOR = '9:00' THEN TRD_EVENT_ROUFOR_1 = 1;
    ELSE TRD_EVENT_ROUFOR_1 = 0;
    IF TRD_EVENT_ROUFOR = '9:30' THEN TRD_EVENT_ROUFOR_2 = 1;
    ELSE TRD_EVENT_ROUFOR_2 = 0;
    IF TRD_EVENT_ROUFOR = '10:00' THEN TRD_EVENT_ROUFOR_3 = 1;
    ELSE TRD_EVENT_ROUFOR_3 = 0;
    IF TRD_EVENT_ROUFOR = '10:30' THEN TRD_EVENT_ROUFOR_4 = 1;
    ELSE TRD_EVENT_ROUFOR_4 = 0;
    IF TRD_EVENT_ROUFOR = '11:00' THEN TRD_EVENT_ROUFOR_5 = 1;
    ELSE TRD_EVENT_ROUFOR_5 = 0;
    IF TRD_EVENT_ROUFOR = '11:30' THEN TRD_EVENT_ROUFOR_6 = 1;
    ELSE TRD_EVENT_ROUFOR_6 = 0;
    IF TRD_EVENT_ROUFOR = '12:00' THEN TRD_EVENT_ROUFOR_7 = 1;
    ELSE TRD_EVENT_ROUFOR_7 = 0;
    IF TRD_EVENT_ROUFOR = '12:30' THEN TRD_EVENT_ROUFOR_8 = 1;
    ELSE TRD_EVENT_ROUFOR_8 = 0;
    IF TRD_EVENT_ROUFOR = '13:00' THEN TRD_EVENT_ROUFOR_9 = 1;
    ELSE TRD_EVENT_ROUFOR_9 = 0;
RUN;
```

But it doesn't work!

## Re: Regression with Several Dummies

By far the easiest thing to do would be to use PROC GLM for this regression. If you declare the time variable as a CLASS variable, PROC GLM creates the dummy variables for you.

But ...

Do you really mean you want COUNTEDVOLUME by half-hour time intervals, as shown in your pseudo-code? I don't understand that; it makes no sense in the context of your question. Could you explain further?

--
Paige Miller

## Re: Regression with Several Dummies

I have categorized the COUNTEDVOLUME variable (= the IntradayVolume variable). I just want to generate dummy variables based on time (TRD_EVENT_ROUFOR) and run the regression that I specified above.

Thanks.

## Re: Regression with Several Dummies

Why wouldn't you use one of the time series procedures, which can handle time series data and deal with lags?

Try PROC ARIMA or PROC AUTOREG.

You may need to restructure your data, although these procedures may be able to handle it as it is.

https://support.sas.com/documentation/onlinedoc/ets/indexproc.html#ets142

## Re: Regression with Several Dummies

Here is the PROC FREQ output:

Why are the TRD_EVENT_ROUFOR_1 and TRD_EVENT_ROUFOR_2 columns entirely zero?
Why doesn't it show the other dummies (e.g. TRD_EVENT_ROUFOR_8)?

What is the problem? Which part of my code is wrong?

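SAS Output

| TRD_EVENT_ROUFOR | TRD_EVENT_ROUFOR_1 | TRD_EVENT_ROUFOR_2 | TRD_EVENT_ROUFOR_3 | TRD_EVENT_ROUFOR_4 | TRD_EVENT_ROUFOR_5 | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
|---|---|---|---|---|---|---|---|---|---|
| 9:00 | 0 | 0 | 0 | 0 | 0 | 5636 | 13.12 | 5636 | 13.12 |
| 9:30 | 0 | 0 | 0 | 0 | 0 | 6481 | 15.09 | 12117 | 28.21 |
| 10:00 | 0 | 0 | 1 | 0 | 0 | 4546 | 10.58 | 16663 | 38.8 |
| 10:30 | 0 | 0 | 0 | 1 | 0 | 4670 | 10.87 | 21333 | 49.67 |
| 11:00 | 0 | 0 | 0 | 0 | 1 | 5164 | 12.02 | 26497 | 61.7 |
| 11:30 | 0 | 0 | 0 | 0 | 0 | 5450 | 12.69 | 31947 | 74.39 |
| 12:00 | 0 | 0 | 0 | 0 | 0 | 4402 | 10.25 | 36349 | 84.63 |
| 12:30 | 0 | 0 | 0 | 0 | 0 | 5955 | 13.87 | 42304 | 98.5 |
| 13:00 | 0 | 0 | 0 | 0 | 0 | 641 | 1.49 | 42945 | 99.99 |
| 14:00 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 42946 | 100 |
| 14:30 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 42947 | 100 |
| 15:30 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 42948 | 100 |

Frequency Missing = 30691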

```sas
DATA Sampledata87_02_Mer_DumVar;
    SET Sampledata87_02_Mer;

    IF TRD_EVENT_ROUFOR = '9:00' THEN TRD_EVENT_ROUFOR_1 = 1;
    ELSE TRD_EVENT_ROUFOR_1 = 0;
    IF TRD_EVENT_ROUFOR = '9:30' THEN TRD_EVENT_ROUFOR_2 = 1;
    ELSE TRD_EVENT_ROUFOR_2 = 0;
    IF TRD_EVENT_ROUFOR = '10:00' THEN TRD_EVENT_ROUFOR_3 = 1;
    ELSE TRD_EVENT_ROUFOR_3 = 0;
    IF TRD_EVENT_ROUFOR = '10:30' THEN TRD_EVENT_ROUFOR_4 = 1;
    ELSE TRD_EVENT_ROUFOR_4 = 0;
    IF TRD_EVENT_ROUFOR = '11:00' THEN TRD_EVENT_ROUFOR_5 = 1;
    ELSE TRD_EVENT_ROUFOR_5 = 0;
    IF TRD_EVENT_ROUFOR = '11:30' THEN TRD_EVENT_ROUFOR_6 = 1;
    ELSE TRD_EVENT_ROUFOR_6 = 0;
    IF TRD_EVENT_ROUFOR = '12:00' THEN TRD_EVENT_ROUFOR_7 = 1;
    ELSE TRD_EVENT_ROUFOR_7 = 0;
    IF TRD_EVENT_ROUFOR = '12:30' THEN TRD_EVENT_ROUFOR_8 = 1;
    ELSE TRD_EVENT_ROUFOR_8 = 0;
    IF TRD_EVENT_ROUFOR = '13:00' THEN TRD_EVENT_ROUFOR_9 = 1;
    ELSE TRD_EVENT_ROUFOR_9 = 0;
RUN;

PROC FREQ DATA=Sampledata87_02_Mer_DumVar;
    TABLES TRD_EVENT_ROUFOR*TRD_EVENT_ROUFOR_1*TRD_EVENT_ROUFOR_2*TRD_EVENT_ROUFOR_3*TRD_EVENT_ROUFOR_4*TRD_EVENT_ROUFOR_5 / list;
RUN;
```
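A possible explanation, offered as a hedged sketch rather than a confirmed diagnosis: `TRD_EVENT_ROUFOR` is a 5-character variable, so a value that prints as `9:00` may actually be stored with a leading blank (` 9:00`), for example if it was produced with a right-aligned format such as `TIME5.`. SAS ignores trailing blanks in character comparisons but not leading ones, so a test against `'9:00'` would never match while the five-character values such as `'10:00'` would, which is the pattern in the frequency output above. The dummies numbered 6 to 9 do not appear simply because the `TABLES` statement lists only the first five. The sketch below assumes the leading-blank cause; the output data set name `Sampledata87_02_Mer_DumVar2` is made up for illustration.

```sas
/* Sketch: assumes TRD_EVENT_ROUFOR may carry a leading blank (e.g. ' 9:00'). */
/* Sampledata87_02_Mer_DumVar2 is a hypothetical output data set name.        */
DATA Sampledata87_02_Mer_DumVar2;
    SET Sampledata87_02_Mer;

    /* The nine dummy variables and the period labels they correspond to */
    ARRAY dummy{9} TRD_EVENT_ROUFOR_1-TRD_EVENT_ROUFOR_9;
    ARRAY lvl{9} $5 _TEMPORARY_
        ('9:00' '9:30' '10:00' '10:30' '11:00' '11:30' '12:00' '12:30' '13:00');

    DO i = 1 TO 9;
        /* STRIP removes leading and trailing blanks before comparing;   */
        /* a comparison evaluates to 1 (true) or 0 (false).              */
        /* Rows with a missing TRD_EVENT_ROUFOR get 0 for every dummy,   */
        /* as in the original IF/ELSE code.                              */
        dummy{i} = (STRIP(TRD_EVENT_ROUFOR) = lvl{i});
    END;
    DROP i;
RUN;

/* Cross-check every dummy against the original variable, one table each */
PROC FREQ DATA=Sampledata87_02_Mer_DumVar2;
    TABLES TRD_EVENT_ROUFOR * (TRD_EVENT_ROUFOR_1-TRD_EVENT_ROUFOR_9) / LIST MISSING;
RUN;
```

If the first two dummies are still all zero after this, the stored values differ from the printed ones in some other way (a leading zero, for instance), and printing a few values with the `$HEX10.` format would show the exact bytes.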

## Re: Regression with Several Dummies

Again, you do not have to create the dummy variables yourself; PROC GLM will create them for you. This is both easier and safer than creating the regression variables yourself. All you need to do is put the variable TRD_EVENT_ROUFOR into a CLASS statement in PROC GLM.

Also, as someone stated above, you'd be wise to treat time as a continuous value and apply some sort of time series model. This would give a better model fit than dummy variables would.

Lastly, @aminkarimid, it is not a good idea, and not a good use of the forums, to create multiple threads on a single topic. One thread is better, because then all of the advice and suggestions are available in one place rather than scattered across multiple threads.

--
Paige Miller

## Re: Regression with Several Dummies

Dear PaigeMiller;
Can you give me an example of how I can use PROC GLM for this regression?
Best regards.

## Re: Regression with Several Dummies

> @aminkarimid wrote:
> Dear PaigeMiller;
> Can you give me an example of how I can use PROC GLM for this regression?
> Best regards.

The last sentence of my first paragraph in message 7 explains how to do this.

--
Paige Miller

## Re: Regression with Several Dummies

> @aminkarimid wrote:
> Here is a very nice tip:
> http://blogs.sas.com/content/iml/2016/02/22/create-dummy-variables-in-sas.html
> Thanks.

In general yes, but I don't think that post applies in your case.

FYI - you should search before you post questions anyway.

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-dummy-variables-Categorical-Var...

EDIT: Note that SAS drops an observation from a model if any of the model's variables is missing for that observation. The data you posted has missing values, so make sure you understand which observations your regression actually uses.
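To see how many observations would be excluded, one could count the missing values before fitting. This is a minimal sketch that assumes the data set and variable names used elsewhere in this thread; `NMISS` in PROC MEANS and the `MISSING` option in PROC FREQ are standard SAS features.

```sas
/* Count non-missing and missing values of the response variable */
PROC MEANS DATA=Sampledata87_02_Mer N NMISS;
    VAR adjusted_volume;
RUN;

/* MISSING makes a missing TRD_EVENT_ROUFOR appear as its own row */
/* instead of only being reported as "Frequency Missing"          */
PROC FREQ DATA=Sampledata87_02_Mer;
    TABLES TRD_EVENT_ROUFOR / MISSING;
RUN;
```

Comparing these counts with the "Number of Observations Used" line in the regression output shows how much data the model is actually based on.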

## Re: Regression with Several Dummies

Does that mean I cannot use PROC GLM to run the regression?
Here is my code:

```sas
proc glm data=Sampledata87_02_mer;
    class TRD_EVENT_ROUFOR;              /* Generates dummy variables internally */
    model adjusted_volume = TRD_EVENT_ROUFOR / solution;
    ods select ParameterEstimates;
quit;
```

Sorry if these questions are obvious, I'm still learning how to navigate SAS documentation.

-Thanks

## Re: Regression with Several Dummies

This seems to be the solution I was recommending. The ODS SELECT statement is optional.

--
Paige Miller
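
For readers comparing this with hand-built dummies: with the `SOLUTION` option, PROC GLM reports one estimate per level of `TRD_EVENT_ROUFOR` and sets the estimate for the last level (in sorted order) to zero, so the intercept is the mean of `adjusted_volume` for that reference level and the other estimates are differences from it. The frequency table earlier in the thread also shows single observations at 14:00, 14:30, and 15:30, each of which would become its own level. The sketch below is one possible variant, not the accepted solution itself, that restricts the fit to eight half-hour periods. The `WHERE` filter, the reading of the labels as period start times, and the use of `STRIP` (in case the stored values carry leading blanks) are all assumptions.

```sas
proc glm data=Sampledata87_02_mer;
    /* Assumption: keep only the periods labelled 9:00 through 12:30 */
    where strip(TRD_EVENT_ROUFOR) in
          ('9:00' '9:30' '10:00' '10:30' '11:00' '11:30' '12:00' '12:30');
    class TRD_EVENT_ROUFOR;              /* dummy variables are built internally */
    model adjusted_volume = TRD_EVENT_ROUFOR / solution;
run;
quit;
```

Observations with a missing `TRD_EVENT_ROUFOR` or `adjusted_volume` are excluded either way.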