DATA Step, Macro, Functions and more

Dummy variables for years

Accepted Solution Solved
Reply
Contributor
Posts: 51
Accepted Solution

Dummy variables for years

 Hi everyone,

 

I have a sas program that i have been using and now I need to add  dummy variables for years and industry to my variables and then use these dummy variables in regressions. So I was thinking of first adding a column called year:

year=year(Andate);
run;

and then the dummy variables :

zero=0;
one=0;
two=0;
three=0;
four=0;
five=0;
six=0;
seven=0;
eight=0;
nine=0;
ten=0;
eleven=0;
twelve=0;
thirteen=0;
fourteen=0;
if year=2000 then zero=1;
if year=2001 then one=1;
if year=2002 then two=1;
if year=2003 then three=1;
if year=2004 then four=1;
if year=2005 then five=1;
if year=2006 then six=1;
if year=2007 then seven=1;
if year=2008 then eight=1;
if year=2009 then nine=1;
if year=2010 then ten=1;
if year=2011 then eleven=1;
if year=2012 then twelve=1;
if year=2013 then thirteen=1;
if year=2014 then fourteen=1;

 

Am I correct?Also my professor insists that  if the number of dummies is equal to the number of treatments, the regression will suffer from exact multicolinearity, so I have to have 13 dummies for year. If I only create 13 dummies for year (for example from 2000 to 2013) how can I run a regression on my dependant variable (carwindow1) and year 2014?

 

Here is my other dummies that I have been using before: (only the dummy part )


data a;
set ev.final2CarAn;
cs=.;
if ConsiStr='SHARES' then CS=1;
if ConsiStr='CASHO' then CS=2;
if ConsiStr='HYBRID' then CS=3;
if ConsiStr='OTHER' then CS=4;


ratio=.;
if amval4wp ne 0 and amval4wp ne . then
ratio=tmval4wp/amval4wp;
proc sort;
by type;
run;

data ev.b;
set a;

h=0;
v=0;
c=0;

if type=1 then h=1;
if type=2 then v=1;
if type=3 then c=1;


ps=0;
pk=0;
ph=0;
po=0;

if cs=1 then ps=1;
if cs=2 then pk=1;
if cs=3 then ph=1;
if cs=4 then po=1;

la=0;
li=0;
if local=1 then la=1;
if local=0 then li=1;

lnTEq=log(EqValAn);
lnTEn=log(EnValAn);
lnTMV=log(TMVal4wp);
lnAMV=log(amval4wp);

run;

And then I used the following statements to run regressions on each variable and my main variable (carwindow1):

proc reg data=ev.b outest= estb tableout ADJRSQ;
model CARwindow1= h ;
run;
data est2 (keep= _type_ _ADJRSQ_ _RMSE_ intercept h v c ps pk ph po la li lnTMV lnTEq lnTEn ratio AEQW);
set estb;
run;
data _null_;
proc datasets force nolist;
append base=ev.result1
data=est2;
run;

 

Thank you,

Niloo


Accepted Solutions
Solution
‎12-13-2015 11:30 PM
Respected Advisor
Posts: 4,646

Re: Dummy variables for years

If you absolutely need to use proc reg then you could save some work by running proc glmmod to create the dummy variables. It would look like this:

 

data a;
set ev.final2CarAn;
if amval4wp ne 0 and amval4wp ne . then
	ratio=tmval4wp/amval4wp;
lnTEq=log(EqValAn);
lnTEn=log(EnValAn);
lnTMV=log(TMVal4wp);
lnAMV=log(amval4wp);
year = year(Andate);
run;

proc glmmod data=a outdesign=b outparm=parm;
class ConsiStr type local year;
model carwindow1 = ratio lnTEq lnTEn lnTMV lnAMV ConsiStr type local year / noint;
run;

/* Check the meaning of col1, col2, etc. variables in the new dataset */
proc print data=parm; run;

/* If you want, rename the variables created by GLMMOD */
data c;
set b;
rename col1=ratio col2=lnTEq ....;
run;

proc reg data=c;
model carwindow1 = ....;
run;
PG

View solution in original post


All Replies
Super User
Posts: 17,824

Re: Dummy variables for years

You need N-1 indicators. You actually have 15 years so you need 14 indicator variables, the one you leave out is your reference level - what you compare things to.

Your sample code doesn't correctly account for the differing levels-rationally shares vs other having a difference of 3 doesn't make sense.

You may find it helpful to look at PROC GLM and the CLASS statement, noting the reference level options.
Contributor
Posts: 51

Re: Dummy variables for years

Hi Reeza,

 

Thank you for your response. I do not understand what you mean by having a difference of 3.

What about my older statements? are they wron too?

for type of payment( cash, stock, hybrid, other) we have 4 dummy variable:
 
ps=0;
pk=0;
ph=0;
po=0;
 
if cs=1 then ps=1;
if cs=2 then pk=1;
if cs=3 then ph=1;
if cs=4 then po=1;

 

Thanks,

Niloo

Solution
‎12-13-2015 11:30 PM
Respected Advisor
Posts: 4,646

Re: Dummy variables for years

If you absolutely need to use proc reg then you could save some work by running proc glmmod to create the dummy variables. It would look like this:

 

data a;
set ev.final2CarAn;
if amval4wp ne 0 and amval4wp ne . then
	ratio=tmval4wp/amval4wp;
lnTEq=log(EqValAn);
lnTEn=log(EnValAn);
lnTMV=log(TMVal4wp);
lnAMV=log(amval4wp);
year = year(Andate);
run;

proc glmmod data=a outdesign=b outparm=parm;
class ConsiStr type local year;
model carwindow1 = ratio lnTEq lnTEn lnTMV lnAMV ConsiStr type local year / noint;
run;

/* Check the meaning of col1, col2, etc. variables in the new dataset */
proc print data=parm; run;

/* If you want, rename the variables created by GLMMOD */
data c;
set b;
rename col1=ratio col2=lnTEq ....;
run;

proc reg data=c;
model carwindow1 = ....;
run;
PG
Trusted Advisor
Posts: 1,614

Re: Dummy variables for years

PROC GLMMOD should be more widely known. Very useful procedure!

Respected Advisor
Posts: 4,646

Re: Dummy variables for years

Yes, I just wish it could generate more meaningful variable names, now that the limit has been increased to 32 characters.

PG
Contributor
Posts: 51

Re: Dummy variables for years

Hi PGStats,

 

Thank you very much for your solution. It is much more efficient than my previous statements. I revised my program and have 3 question:

1) Shall I add the variables that I want to include in the regression in this part?

proc glmmod data=a outdesign=b outparm=parm;
class ConsiStr type year;
model Acar1 = ratio lnTEq lnTEn lnTMV lnAMV lnSIZE ConsiStr type year (other variables) / noint;
run;

2) I have 15 years of data (2000-2014) and 15 dummy variables. Isn't this going to cause a multicolinearity problem?

3) If I want to creat another set of dummy variables for industry can I add this statement:

data a;
set ev.final2;

if amval4wp ne 0 and amval4wp ne . then
ratio=tmval4wp/amval4wp;

lnTEq=log(EqValAn);
lnTEn=log(EnValAn);
lnTMV=log(TMVal4wp);
lnAMV=log(amval4wp);
lnSIZE=log(HOSTATASS);
year = year(Andate);

Industry= TMaCode;
run;

 

proc glmmod data=a outdesign=b outparm=parm;
class ConsiStr type year Industry;

 

Here is how my statemets look like now:

 

libname ev 'C:\Regression';
run;

***Majority All data set- Regression for Acquirors, CAR (-1, +1), announcment date ********/;


data final2;
set ev.final2(keep= tdscd ADSCD Aname Tname AnDate Wdate Acar1 Acar5 Tcar1 Tcar5 TobinQ Debtratio HOSTATASS AMVal4wp TMVal4wp EqValAn EnValAn ConsiStr TNation type AEQW);
proc sort;
by adscd andate;
run;

 

data a;
set ev.final2;

if amval4wp ne 0 and amval4wp ne . then
ratio=tmval4wp/amval4wp;

lnTEq=log(EqValAn);
lnTEn=log(EnValAn);
lnTMV=log(TMVal4wp);
lnAMV=log(amval4wp);
lnSIZE=log(HOSTATASS);
year = year(Andate);
run;

 

proc glmmod data=a outdesign=b outparm=parm;
class ConsiStr type year;
model Acar1 = ratio lnTEq lnTEn lnTMV lnAMV lnSIZE ConsiStr type year / noint;
run;
proc print data=parm; run;

data c;
set b;
rename col1=ratio col2=lnTEq col3=lnTEn col4=lnTMV col5=lnAMV col6=lnSIZE col7=cash col8=hybrid col9=other col10=shares
col11=unknown col12=horizental col13=vertical col14=conglomerate col15=year0 col16=year1 col17=year2 col18=year3 col19=year4
col20=year5 col21=year6 col22=year7 col24=year9 col25=year10 col26=year11 col27=year12 col28=year13 col29=year14;
run;

proc reg data=c;
model Acar1 = lnSIZE;
run;

 

Thank you very much,

Niloo

Trusted Advisor
Posts: 1,614

Re: Dummy variables for years

[ Edited ]

niloo wrote:

Hi PGStats,

 

Thank you very much for your solution. It is much more efficient than my previous statements. I revised my program and have 3 question:

1) Shall I add the variables that I want to include in the regression in this part?

proc glmmod data=a outdesign=b outparm=parm;
class ConsiStr type year;
model Acar1 = ratio lnTEq lnTEn lnTMV lnAMV lnSIZE ConsiStr type year (other variables) / noint;
run;

 

Yes, if you want the other variables in the regression. However, this seems needlessly complicated, you are planning to run PROC REG so you need DUMMY variables; but if you run the analysis in PROC GLM, you don't have to create the dummy variables yourself, PROC GLM does that for you internally.

 

2) I have 15 years of data (2000-2014) and 15 dummy variables. Isn't this going to cause a multicolinearity problem?

 

Again, this is all taken care of internally in PROC GLM, so why use PROC REG? Anyway, the NOINT option eliminates the multicollineary issue.

 

3) If I want to creat another set of dummy variables for industry can I add this statement:

data a;
set ev.final2;

if amval4wp ne 0 and amval4wp ne . then
ratio=tmval4wp/amval4wp;

lnTEq=log(EqValAn);
lnTEn=log(EnValAn);
lnTMV=log(TMVal4wp);
lnAMV=log(amval4wp);
lnSIZE=log(HOSTATASS);
year = year(Andate);

Industry= TMaCode;
run;

 

proc glmmod data=a outdesign=b outparm=parm;
class ConsiStr type year Industry;

 

Here is how my statemets look like now:

 

libname ev 'C:\Regression';
run;

***Majority All data set- Regression for Acquirors, CAR (-1, +1), announcment date ********/;


data final2;
set ev.final2(keep= tdscd ADSCD Aname Tname AnDate Wdate Acar1 Acar5 Tcar1 Tcar5 TobinQ Debtratio HOSTATASS AMVal4wp TMVal4wp EqValAn EnValAn ConsiStr TNation type AEQW);
proc sort;
by adscd andate;
run;

 

data a;
set ev.final2;

if amval4wp ne 0 and amval4wp ne . then
ratio=tmval4wp/amval4wp;

lnTEq=log(EqValAn);
lnTEn=log(EnValAn);
lnTMV=log(TMVal4wp);
lnAMV=log(amval4wp);
lnSIZE=log(HOSTATASS);
year = year(Andate);
run;

 

proc glmmod data=a outdesign=b outparm=parm;
class ConsiStr type year;
model Acar1 = ratio lnTEq lnTEn lnTMV lnAMV lnSIZE ConsiStr type year / noint;
run;
proc print data=parm; run;

data c;
set b;
rename col1=ratio col2=lnTEq col3=lnTEn col4=lnTMV col5=lnAMV col6=lnSIZE col7=cash col8=hybrid col9=other col10=shares
col11=unknown col12=horizental col13=vertical col14=conglomerate col15=year0 col16=year1 col17=year2 col18=year3 col19=year4
col20=year5 col21=year6 col22=year7 col24=year9 col25=year10 col26=year11 col27=year12 col28=year13 col29=year14;
run;

proc reg data=c;
model Acar1 = lnSIZE;
run;

 

 

PROC GLM, PROC GLM, PROC GLM! Stop going through all this effort to create dummy variables.

Contributor
Posts: 51

Re: Dummy variables for years

Dear Paige Miller,

 

Thank you for your response. So I will not use a PROC REG. But my PROC GLM does not lead to an output page with regression results. I need to run a regression on my dependent variable (ACARs1) and each on my dependent variable and then in the next level on my dependent variable and  combinations of my independent variables. Only when I delete the outdesign and outparm and limit the independent variables to one I get the output window with regression results. How can I fix my statements?

Here is how my statements look like:

libname ev 'C:\Regression';
run;

***Majority All data set- Regression for Acquirors, CAR (-1, +1), announcment date ********/;
data final2;
set ev.final2(keep= tdscd ADSCD Aname Tname AnDate Wdate Acar1 Acar5 Tcar1 Tcar5 TobinQ ADebtratio DEBTRT HOSTATASS AMVal4wp TMVal4wp
EqValAn EnValAn ConsiStr TNation type AEQW);
proc sort;
by adscd andate;
run;

 

data a;
set ev.final2;

if amval4wp ne 0 and amval4wp ne . then
ratio=tmval4wp/amval4wp;

lnTEq=log(EqValAn);
lnTEn=log(EnValAn);
lnTMV=log(TMVal4wp);
lnAMV=log(amval4wp);
lnSIZE=log(HOSTATASS);
year = year(Andate);
Industry= TMaCode;
run;

/*this does not give me an output window*/

 

proc glmmod data=a outdesign=b outparm=parm ;
class ConsiStr type year Industry;
model Acar1 = ratio lnTEn lnSIZE ConsiStr type year ADebtratio TobinQ DEBTRT Industry AEQW / noint;
run;
proc print data=parm; run;

/*this one gives me an output window*/

proc glm data=a ;
class year;
model Acar1 = year / noint;
run;

 

Thank you,

Niloo

Trusted Advisor
Posts: 1,614

Re: Dummy variables for years

[ Edited ]

 But my PROC GLM does not lead to an output page with regression results.

Sure it does. PROC GLM performs a regression on your data. (And you need a QUIT; statement at the end of PROC GLM)

 

The results are written to the output window (unless you have somehow turned that off).

 

And then you can take all the PROC GLMMOD statements out of your code.

 

 

Contributor
Posts: 51

Re: Dummy variables for years

Dear PaigeMiller,

 

Thank you for your response. I chaned my statements to following and I am trying to export the results into an excel sheet but the ODS statement that I have exports each table to a seperate sheet. how can I have al the results in one sheet and not in table format?

 

data a;
set ev.final2;

if amval4wp ne 0 and amval4wp ne . then
ratio=tmval4wp/amval4wp;

lnTEq=log(EqValAn);
lnTEn=log(EnValAn);
lnTMV=log(TMVal4wp);
lnAMV=log(amval4wp);
lnSIZE=log(HOSTATASS);
year = year(Andate);
Industry= TMaCode;
run;

ODS TAGSETS.EXCELXP
file='C:\Users\esy\Desktop\niloofar\DePaul Research\M&AJAPAN\Majority\Results\regression.xlsx'
STYLE=minimal
OPTIONS ( Orientation = 'landscape'
FitToPage = 'yes'
Pages_FitWidth = '1'
Pages_FitHeight = '100' );

proc glm data=a;
class ConsiStr type year Industry;
model Acar1 = year / noint;
QUIT;
run;
proc glm data=a;
class ConsiStr type year Industry;
model Acar1 = ratio / noint;
QUIT;
run;
proc glm data=a;
class ConsiStr type year Industry;
model Acar1 = TobinQ/ noint;
QUIT;
run;
(I will run the same statment for all my variavles individually, and then for a combination of variables which had meaningful relationships with the dependent variable):
proc glm data=a;
class ConsiStr type year Industry;
model Acar1 = ADebtratio ratio / noint;
QUIT;
run;

 

Thank you,

Niloo

Super User
Posts: 17,824

Re: Dummy variables for years

Look at the sheet_interval option - the value you're looking for is none.

I don't know what you mean by not in table format.
Super User
Posts: 17,824

Re: Dummy variables for years

PS. I think your original question has been answered. You should mark this question as answered and start new ones if you have further questions.
Contributor
Posts: 51

Re: Dummy variables for year

Hi again,

 

I have a question about the dummy variables created by Proc GLM. Variable "type" showes the type of payment (Cash, stock, Hybrid, Unknown) and variable "ConsiStr" discribes the type of deal(horizental, vertical, conglomerate). I need to run a regression on my dependent variable (ACAR1) and "cash" type of payment (not the whole "type" variable) and on ACAR1 and "conglomerate" dales( not the ConsiStr variable itself. But I cannot do it using my current PROC GLM statements. Is there a way I can do this using Proc GLM?

Here is what I currently have:

libname ev 'C:\Regression';
run;

data final2;
set ev.final2(keep= tdscd ADSCD Aname Tname AnDate Wdate Acar1 Acar5 Tcar1 Tcar5 TobinQ ADebtratio DEBTRT HOSTATASS AMVal4wp TMVal4wp
EqValAn EnValAn ConsiStr TNation type AEQW);
proc sort;
by adscd andate;
run;

 

data a;
set ev.final2;

if amval4wp ne 0 and amval4wp ne . then
ratio=tmval4wp/amval4wp;

lnTEq=log(EqValAn);
lnTEn=log(EnValAn);
lnTMV=log(TMVal4wp);
lnAMV=log(amval4wp);
lnSIZE=log(HOSTATASS);
year = year(Andate);
Industry= AMaCode;
run;
proc glm data=a;
class ConsiStr type year Industry;
model Acar1 = year / noint;
QUIT;
run;
proc glm data=a;
class ConsiStr type year Industry;
model Acar1 = Industry / noint;
QUIT;
run;
proc glm data=a;
class ConsiStr type year Industry;
model Acar1 = AEQW/ noint;
QUIT;
run;

.... and the rest of the variables

ODS TAGSETS.EXCELXP
file='C:\Users\esy\Desktop\niloofar\DePaul Research\M&AJAPAN\Majority\Results\regression.xls'
STYLE=minimal
OPTIONS ( Orientation = 'landscape'
FitToPage = 'yes'
Pages_FitWidth = '1'
Pages_FitHeight = '100' sheet_interval='NONE' );

 

Thanks,

Niloo

Respected Advisor
Posts: 4,646

Re: Dummy variables for year

You can restrict your regression analysis to a subset of cases with the where statement

 

proc glm data=a;
where type="cash";
class ConsiStr year Industry;
model Acar1 = year / noint;
run;
QUIT;

Note: the quit statement goes after the run statement.

PG
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 24 replies
  • 684 views
  • 7 likes
  • 4 in conversation