turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- Dummy variables for years

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

12-10-2015 10:21 PM

Hi everyone,

I have a sas program that i have been using and now I need to add dummy variables for years and industry to my variables and then use these dummy variables in regressions. So I was thinking of first adding a column called year:

year=year(Andate);

run;

and then the dummy variables :

zero=0;

one=0;

two=0;

three=0;

four=0;

five=0;

six=0;

seven=0;

eight=0;

nine=0;

ten=0;

eleven=0;

twelve=0;

thirteen=0;

fourteen=0;

if year=2000 then zero=1;

if year=2001 then one=1;

if year=2002 then two=1;

if year=2003 then three=1;

if year=2004 then four=1;

if year=2005 then five=1;

if year=2006 then six=1;

if year=2007 then seven=1;

if year=2008 then eight=1;

if year=2009 then nine=1;

if year=2010 then ten=1;

if year=2011 then eleven=1;

if year=2012 then twelve=1;

if year=2013 then thirteen=1;

if year=2014 then fourteen=1;

Am I correct?Also my professor insists that if the number of dummies is equal to the number of treatments, the regression will suffer from exact multicolinearity, so I have to have 13 dummies for year. If I only create 13 dummies for year (for example from 2000 to 2013) how can I run a regression on my dependant variable (carwindow1) and year 2014?

Here is my other dummies that I have been using before: (only the dummy part )

data a;

set ev.final2CarAn;

cs=.;

if ConsiStr='SHARES' then CS=1;

if ConsiStr='CASHO' then CS=2;

if ConsiStr='HYBRID' then CS=3;

if ConsiStr='OTHER' then CS=4;

ratio=.;

if amval4wp ne 0 and amval4wp ne . then

ratio=tmval4wp/amval4wp;

proc sort;

by type;

run;

data ev.b;

set a;

h=0;

v=0;

c=0;

if type=1 then h=1;

if type=2 then v=1;

if type=3 then c=1;

ps=0;

pk=0;

ph=0;

po=0;

if cs=1 then ps=1;

if cs=2 then pk=1;

if cs=3 then ph=1;

if cs=4 then po=1;

la=0;

li=0;

if local=1 then la=1;

if local=0 then li=1;

lnTEq=log(EqValAn);

lnTEn=log(EnValAn);

lnTMV=log(TMVal4wp);

lnAMV=log(amval4wp);

run;

And then I used the following statements to run regressions on each variable and my main variable (carwindow1):

proc reg data=ev.b outest= estb tableout ADJRSQ;

model CARwindow1= h ;

run;

data est2 (keep= _type_ _ADJRSQ_ _RMSE_ intercept h v c ps pk ph po la li lnTMV lnTEq lnTEn ratio AEQW);

set estb;

run;

data _null_;

proc datasets force nolist;

append base=ev.result1

data=est2;

run;

Thank you,

Niloo

Accepted Solutions

Solution

12-13-2015
11:30 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to niloo

12-10-2015 11:27 PM

If you absolutely need to use **proc reg** then you could save some work by running **proc glmmod** to create the dummy variables. It would look like this:

```
data a;
set ev.final2CarAn;
if amval4wp ne 0 and amval4wp ne . then
ratio=tmval4wp/amval4wp;
lnTEq=log(EqValAn);
lnTEn=log(EnValAn);
lnTMV=log(TMVal4wp);
lnAMV=log(amval4wp);
year = year(Andate);
run;
proc glmmod data=a outdesign=b outparm=parm;
class ConsiStr type local year;
model carwindow1 = ratio lnTEq lnTEn lnTMV lnAMV ConsiStr type local year / noint;
run;
/* Check the meaning of col1, col2, etc. variables in the new dataset */
proc print data=parm; run;
/* If you want, rename the variables created by GLMMOD */
data c;
set b;
rename col1=ratio col2=lnTEq ....;
run;
proc reg data=c;
model carwindow1 = ....;
run;
```

PG

All Replies

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to niloo

12-10-2015 10:38 PM

You need N-1 indicators. You actually have 15 years so you need 14 indicator variables, the one you leave out is your reference level - what you compare things to.

Your sample code doesn't correctly account for the differing levels-rationally shares vs other having a difference of 3 doesn't make sense.

You may find it helpful to look at PROC GLM and the CLASS statement, noting the reference level options.

Your sample code doesn't correctly account for the differing levels-rationally shares vs other having a difference of 3 doesn't make sense.

You may find it helpful to look at PROC GLM and the CLASS statement, noting the reference level options.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

12-10-2015 10:46 PM

Hi Reeza,

Thank you for your response. I do not understand what you mean by having a difference of 3.

What about my older statements? are they wron too?

for type of payment( cash, stock, hybrid, other) we have 4 dummy variable:

ps=0;

pk=0;

ph=0;

po=0;

if cs=1 then ps=1;

if cs=2 then pk=1;

if cs=3 then ph=1;

if cs=4 then po=1;

Thanks,

Niloo

Solution

12-13-2015
11:30 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to niloo

12-10-2015 11:27 PM

If you absolutely need to use **proc reg** then you could save some work by running **proc glmmod** to create the dummy variables. It would look like this:

```
data a;
set ev.final2CarAn;
if amval4wp ne 0 and amval4wp ne . then
ratio=tmval4wp/amval4wp;
lnTEq=log(EqValAn);
lnTEn=log(EnValAn);
lnTMV=log(TMVal4wp);
lnAMV=log(amval4wp);
year = year(Andate);
run;
proc glmmod data=a outdesign=b outparm=parm;
class ConsiStr type local year;
model carwindow1 = ratio lnTEq lnTEn lnTMV lnAMV ConsiStr type local year / noint;
run;
/* Check the meaning of col1, col2, etc. variables in the new dataset */
proc print data=parm; run;
/* If you want, rename the variables created by GLMMOD */
data c;
set b;
rename col1=ratio col2=lnTEq ....;
run;
proc reg data=c;
model carwindow1 = ....;
run;
```

PG

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PGStats

12-11-2015 08:42 AM

PROC GLMMOD should be more widely known. Very useful procedure!

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PaigeMiller

12-11-2015 11:14 AM

Yes, I just wish it could generate more meaningful variable names, now that the limit has been increased to 32 characters.

PG

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PGStats

12-13-2015 11:54 PM

Hi PGStats,

Thank you very much for your solution. It is much more efficient than my previous statements. I revised my program and have 3 question:

1) Shall I add the variables that I want to include in the regression in this part?

proc glmmod data=a outdesign=b outparm=parm;

class ConsiStr type year;

model Acar1 = ratio lnTEq lnTEn lnTMV lnAMV lnSIZE ConsiStr type year (other variables) / noint;

run;

2) I have 15 years of data (2000-2014) and 15 dummy variables. Isn't this going to cause a multicolinearity problem?

3) If I want to creat another set of dummy variables for industry can I add this statement:

data a;

set ev.final2;

if amval4wp ne 0 and amval4wp ne . then

ratio=tmval4wp/amval4wp;

lnTEq=log(EqValAn);

lnTEn=log(EnValAn);

lnTMV=log(TMVal4wp);

lnAMV=log(amval4wp);

lnSIZE=log(HOSTATASS);

year = year(Andate);

Industry= TMaCode;

run;

proc glmmod data=a outdesign=b outparm=parm;

class ConsiStr type year Industry;

Here is how my statemets look like now:

libname ev 'C:\Regression';

run;

***Majority All data set- Regression for Acquirors, CAR (-1, +1), announcment date ********/;

data final2;

set ev.final2(keep= tdscd ADSCD Aname Tname AnDate Wdate Acar1 Acar5 Tcar1 Tcar5 TobinQ Debtratio HOSTATASS AMVal4wp TMVal4wp EqValAn EnValAn ConsiStr TNation type AEQW);

proc sort;

by adscd andate;

run;

data a;

set ev.final2;

if amval4wp ne 0 and amval4wp ne . then

ratio=tmval4wp/amval4wp;

lnTEq=log(EqValAn);

lnTEn=log(EnValAn);

lnTMV=log(TMVal4wp);

lnAMV=log(amval4wp);

lnSIZE=log(HOSTATASS);

year = year(Andate);

run;

proc glmmod data=a outdesign=b outparm=parm;

class ConsiStr type year;

model Acar1 = ratio lnTEq lnTEn lnTMV lnAMV lnSIZE ConsiStr type year / noint;

run;

proc print data=parm; run;

data c;

set b;

rename col1=ratio col2=lnTEq col3=lnTEn col4=lnTMV col5=lnAMV col6=lnSIZE col7=cash col8=hybrid col9=other col10=shares

col11=unknown col12=horizental col13=vertical col14=conglomerate col15=year0 col16=year1 col17=year2 col18=year3 col19=year4

col20=year5 col21=year6 col22=year7 col24=year9 col25=year10 col26=year11 col27=year12 col28=year13 col29=year14;

run;

proc reg data=c;

model Acar1 = lnSIZE;

run;

Thank you very much,

Niloo

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to niloo

12-14-2015 02:42 PM - edited 12-14-2015 02:43 PM

niloo wrote:

Hi PGStats,

Thank you very much for your solution. It is much more efficient than my previous statements. I revised my program and have 3 question:

1) Shall I add the variables that I want to include in the regression in this part?

proc glmmod data=a outdesign=b outparm=parm;

class ConsiStr type year;

model Acar1 = ratio lnTEq lnTEn lnTMV lnAMV lnSIZE ConsiStr type year (other variables) / noint;

run;

Yes, if you want the other variables in the regression. However, this seems needlessly complicated, you are planning to run PROC REG so you need DUMMY variables; but if you run the analysis in PROC GLM, you don't have to create the dummy variables yourself, PROC GLM does that for you internally.

2) I have 15 years of data (2000-2014) and 15 dummy variables. Isn't this going to cause a multicolinearity problem?

Again, this is all taken care of internally in PROC GLM, so why use PROC REG? Anyway, the NOINT option eliminates the multicollineary issue.

3) If I want to creat another set of dummy variables for industry can I add this statement:

data a;

set ev.final2;if amval4wp ne 0 and amval4wp ne . then

ratio=tmval4wp/amval4wp;lnTEq=log(EqValAn);

lnTEn=log(EnValAn);

lnTMV=log(TMVal4wp);

lnAMV=log(amval4wp);

lnSIZE=log(HOSTATASS);

year = year(Andate);Industry= TMaCode;

run;

proc glmmod data=a outdesign=b outparm=parm;

class ConsiStr type year Industry;

Here is how my statemets look like now:

libname ev 'C:\Regression';

run;***Majority All data set- Regression for Acquirors, CAR (-1, +1), announcment date ********/;

data final2;

set ev.final2(keep= tdscd ADSCD Aname Tname AnDate Wdate Acar1 Acar5 Tcar1 Tcar5 TobinQ Debtratio HOSTATASS AMVal4wp TMVal4wp EqValAn EnValAn ConsiStr TNation type AEQW);

proc sort;

by adscd andate;

run;

data a;

set ev.final2;if amval4wp ne 0 and amval4wp ne . then

ratio=tmval4wp/amval4wp;lnTEq=log(EqValAn);

lnTEn=log(EnValAn);

lnTMV=log(TMVal4wp);

lnAMV=log(amval4wp);

lnSIZE=log(HOSTATASS);

year = year(Andate);

run;

proc glmmod data=a outdesign=b outparm=parm;

class ConsiStr type year;

model Acar1 = ratio lnTEq lnTEn lnTMV lnAMV lnSIZE ConsiStr type year / noint;

run;

proc print data=parm; run;data c;

set b;

rename col1=ratio col2=lnTEq col3=lnTEn col4=lnTMV col5=lnAMV col6=lnSIZE col7=cash col8=hybrid col9=other col10=shares

col11=unknown col12=horizental col13=vertical col14=conglomerate col15=year0 col16=year1 col17=year2 col18=year3 col19=year4

col20=year5 col21=year6 col22=year7 col24=year9 col25=year10 col26=year11 col27=year12 col28=year13 col29=year14;

run;proc reg data=c;

model Acar1 = lnSIZE;

run;

PROC GLM, PROC GLM, PROC GLM! Stop going through all this effort to create dummy variables.

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PaigeMiller

12-15-2015 12:43 AM

Dear Paige Miller,

Thank you for your response. So I will not use a PROC REG. But my PROC GLM does not lead to an output page with regression results. I need to run a regression on my dependent variable (ACARs1) and each on my dependent variable and then in the next level on my dependent variable and combinations of my independent variables. Only when I delete the outdesign and outparm and limit the independent variables to one I get the output window with regression results. How can I fix my statements?

Here is how my statements look like:

libname ev 'C:\Regression';

run;

***Majority All data set- Regression for Acquirors, CAR (-1, +1), announcment date ********/;

data final2;

set ev.final2(keep= tdscd ADSCD Aname Tname AnDate Wdate Acar1 Acar5 Tcar1 Tcar5 TobinQ ADebtratio DEBTRT HOSTATASS AMVal4wp TMVal4wp

EqValAn EnValAn ConsiStr TNation type AEQW);

proc sort;

by adscd andate;

run;

data a;

set ev.final2;

if amval4wp ne 0 and amval4wp ne . then

ratio=tmval4wp/amval4wp;

lnTEq=log(EqValAn);

lnTEn=log(EnValAn);

lnTMV=log(TMVal4wp);

lnAMV=log(amval4wp);

lnSIZE=log(HOSTATASS);

year = year(Andate);

Industry= TMaCode;

run;

/*this does not give me an output window*/

proc glmmod data=a outdesign=b outparm=parm ;

class ConsiStr type year Industry;

model Acar1 = ratio lnTEn lnSIZE ConsiStr type year ADebtratio TobinQ DEBTRT Industry AEQW / noint;

run;

proc print data=parm; run;

/*this one gives me an output window*/

proc glm data=a ;

class year;

model Acar1 = year / noint;

run;

Thank you,

Niloo

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to niloo

12-15-2015 08:06 AM - edited 12-15-2015 08:12 AM

But my PROC GLM does not lead to an output page with regression results.

Sure it does. PROC GLM performs a regression on your data. (And you need a QUIT; statement at the end of PROC GLM)

The results are written to the output window (unless you have somehow turned that off).

And then you can take all the PROC GLMMOD statements out of your code.

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PaigeMiller

12-16-2015 05:30 PM

Dear PaigeMiller,

Thank you for your response. I chaned my statements to following and I am trying to export the results into an excel sheet but the ODS statement that I have exports each table to a seperate sheet. how can I have al the results in one sheet and not in table format?

data a;

set ev.final2;

if amval4wp ne 0 and amval4wp ne . then

ratio=tmval4wp/amval4wp;

lnTEq=log(EqValAn);

lnTEn=log(EnValAn);

lnTMV=log(TMVal4wp);

lnAMV=log(amval4wp);

lnSIZE=log(HOSTATASS);

year = year(Andate);

Industry= TMaCode;

run;

ODS TAGSETS.EXCELXP

file='C:\Users\esy\Desktop\niloofar\DePaul Research\M&AJAPAN\Majority\Results\regression.xlsx'

STYLE=minimal

OPTIONS ( Orientation = 'landscape'

FitToPage = 'yes'

Pages_FitWidth = '1'

Pages_FitHeight = '100' );

proc glm data=a;

class ConsiStr type year Industry;

model Acar1 = year / noint;

QUIT;

run;

proc glm data=a;

class ConsiStr type year Industry;

model Acar1 = ratio / noint;

QUIT;

run;

proc glm data=a;

class ConsiStr type year Industry;

model Acar1 = TobinQ/ noint;

QUIT;

run;

(I will run the same statment for all my variavles individually, and then for a combination of variables which had meaningful relationships with the dependent variable):

proc glm data=a;

class ConsiStr type year Industry;

model Acar1 = ADebtratio ratio / noint;

QUIT;

run;

Thank you,

Niloo

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to niloo

12-16-2015 05:32 PM

Look at the sheet_interval option - the value you're looking for is none.

I don't know what you mean by not in table format.

I don't know what you mean by not in table format.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to niloo

12-16-2015 05:34 PM

PS. I think your original question has been answered. You should mark this question as answered and start new ones if you have further questions.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to PaigeMiller

12-17-2015 12:50 PM

Hi again,

I have a question about the dummy variables created by Proc GLM. Variable "type" showes the type of payment (Cash, stock, Hybrid, Unknown) and variable "ConsiStr" discribes the type of deal(horizental, vertical, conglomerate). I need to run a regression on my dependent variable (ACAR1) and "cash" type of payment (not the whole "type" variable) and on ACAR1 and "conglomerate" dales( not the ConsiStr variable itself. But I cannot do it using my current PROC GLM statements. Is there a way I can do this using Proc GLM?

Here is what I currently have:

libname ev 'C:\Regression';

run;

data final2;

set ev.final2(keep= tdscd ADSCD Aname Tname AnDate Wdate Acar1 Acar5 Tcar1 Tcar5 TobinQ ADebtratio DEBTRT HOSTATASS AMVal4wp TMVal4wp

EqValAn EnValAn ConsiStr TNation type AEQW);

proc sort;

by adscd andate;

run;

data a;

set ev.final2;

if amval4wp ne 0 and amval4wp ne . then

ratio=tmval4wp/amval4wp;

lnTEq=log(EqValAn);

lnTEn=log(EnValAn);

lnTMV=log(TMVal4wp);

lnAMV=log(amval4wp);

lnSIZE=log(HOSTATASS);

year = year(Andate);

Industry= AMaCode;

run;

proc glm data=a;

class ConsiStr type year Industry;

model Acar1 = year / noint;

QUIT;

run;

proc glm data=a;

class ConsiStr type year Industry;

model Acar1 = Industry / noint;

QUIT;

run;

proc glm data=a;

class ConsiStr type year Industry;

model Acar1 = AEQW/ noint;

QUIT;

run;

.... and the rest of the variables

ODS TAGSETS.EXCELXP

file='C:\Users\esy\Desktop\niloofar\DePaul Research\M&AJAPAN\Majority\Results\regression.xls'

STYLE=minimal

OPTIONS ( Orientation = 'landscape'

FitToPage = 'yes'

Pages_FitWidth = '1'

Pages_FitHeight = '100' sheet_interval='NONE' );

Thanks,

Niloo

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to niloo

12-17-2015 01:55 PM

You can restrict your regression analysis to a subset of cases with the **where** statement

```
proc glm data=a;
where type="cash";
class ConsiStr year Industry;
model Acar1 = year / noint;
run;
QUIT;
```

Note: the **quit** statement goes after the **run** statement.

PG