Hi everyone,
I have a sas program that i have been using and now I need to add dummy variables for years and industry to my variables and then use these dummy variables in regressions. So I was thinking of first adding a column called year:
year=year(Andate);
run;
and then the dummy variables :
zero=0;
one=0;
two=0;
three=0;
four=0;
five=0;
six=0;
seven=0;
eight=0;
nine=0;
ten=0;
eleven=0;
twelve=0;
thirteen=0;
fourteen=0;
if year=2000 then zero=1;
if year=2001 then one=1;
if year=2002 then two=1;
if year=2003 then three=1;
if year=2004 then four=1;
if year=2005 then five=1;
if year=2006 then six=1;
if year=2007 then seven=1;
if year=2008 then eight=1;
if year=2009 then nine=1;
if year=2010 then ten=1;
if year=2011 then eleven=1;
if year=2012 then twelve=1;
if year=2013 then thirteen=1;
if year=2014 then fourteen=1;
Am I correct?Also my professor insists that if the number of dummies is equal to the number of treatments, the regression will suffer from exact multicolinearity, so I have to have 13 dummies for year. If I only create 13 dummies for year (for example from 2000 to 2013) how can I run a regression on my dependant variable (carwindow1) and year 2014?
Here is my other dummies that I have been using before: (only the dummy part )
data a;
set ev.final2CarAn;
cs=.;
if ConsiStr='SHARES' then CS=1;
if ConsiStr='CASHO' then CS=2;
if ConsiStr='HYBRID' then CS=3;
if ConsiStr='OTHER' then CS=4;
ratio=.;
if amval4wp ne 0 and amval4wp ne . then
ratio=tmval4wp/amval4wp;
proc sort;
by type;
run;
data ev.b;
set a;
h=0;
v=0;
c=0;
if type=1 then h=1;
if type=2 then v=1;
if type=3 then c=1;
ps=0;
pk=0;
ph=0;
po=0;
if cs=1 then ps=1;
if cs=2 then pk=1;
if cs=3 then ph=1;
if cs=4 then po=1;
la=0;
li=0;
if local=1 then la=1;
if local=0 then li=1;
lnTEq=log(EqValAn);
lnTEn=log(EnValAn);
lnTMV=log(TMVal4wp);
lnAMV=log(amval4wp);
run;
And then I used the following statements to run regressions on each variable and my main variable (carwindow1):
proc reg data=ev.b outest= estb tableout ADJRSQ;
model CARwindow1= h ;
run;
data est2 (keep= _type_ _ADJRSQ_ _RMSE_ intercept h v c ps pk ph po la li lnTMV lnTEq lnTEn ratio AEQW);
set estb;
run;
data _null_;
proc datasets force nolist;
append base=ev.result1
data=est2;
run;
Thank you,
Niloo
If you absolutely need to use proc reg then you could save some work by running proc glmmod to create the dummy variables. It would look like this:
data a;
set ev.final2CarAn;
if amval4wp ne 0 and amval4wp ne . then
ratio=tmval4wp/amval4wp;
lnTEq=log(EqValAn);
lnTEn=log(EnValAn);
lnTMV=log(TMVal4wp);
lnAMV=log(amval4wp);
year = year(Andate);
run;
proc glmmod data=a outdesign=b outparm=parm;
class ConsiStr type local year;
model carwindow1 = ratio lnTEq lnTEn lnTMV lnAMV ConsiStr type local year / noint;
run;
/* Check the meaning of col1, col2, etc. variables in the new dataset */
proc print data=parm; run;
/* If you want, rename the variables created by GLMMOD */
data c;
set b;
rename col1=ratio col2=lnTEq ....;
run;
proc reg data=c;
model carwindow1 = ....;
run;
Hi Reeza,
Thank you for your response. I do not understand what you mean by having a difference of 3.
What about my older statements? are they wron too?
Thanks,
Niloo
If you absolutely need to use proc reg then you could save some work by running proc glmmod to create the dummy variables. It would look like this:
data a;
set ev.final2CarAn;
if amval4wp ne 0 and amval4wp ne . then
ratio=tmval4wp/amval4wp;
lnTEq=log(EqValAn);
lnTEn=log(EnValAn);
lnTMV=log(TMVal4wp);
lnAMV=log(amval4wp);
year = year(Andate);
run;
proc glmmod data=a outdesign=b outparm=parm;
class ConsiStr type local year;
model carwindow1 = ratio lnTEq lnTEn lnTMV lnAMV ConsiStr type local year / noint;
run;
/* Check the meaning of col1, col2, etc. variables in the new dataset */
proc print data=parm; run;
/* If you want, rename the variables created by GLMMOD */
data c;
set b;
rename col1=ratio col2=lnTEq ....;
run;
proc reg data=c;
model carwindow1 = ....;
run;
PROC GLMMOD should be more widely known. Very useful procedure!
Yes, I just wish it could generate more meaningful variable names, now that the limit has been increased to 32 characters.
Hi PGStats,
Thank you very much for your solution. It is much more efficient than my previous statements. I revised my program and have 3 question:
1) Shall I add the variables that I want to include in the regression in this part?
proc glmmod data=a outdesign=b outparm=parm;
class ConsiStr type year;
model Acar1 = ratio lnTEq lnTEn lnTMV lnAMV lnSIZE ConsiStr type year (other variables) / noint;
run;
2) I have 15 years of data (2000-2014) and 15 dummy variables. Isn't this going to cause a multicolinearity problem?
3) If I want to creat another set of dummy variables for industry can I add this statement:
data a;
set ev.final2;
if amval4wp ne 0 and amval4wp ne . then
ratio=tmval4wp/amval4wp;
lnTEq=log(EqValAn);
lnTEn=log(EnValAn);
lnTMV=log(TMVal4wp);
lnAMV=log(amval4wp);
lnSIZE=log(HOSTATASS);
year = year(Andate);
Industry= TMaCode;
run;
proc glmmod data=a outdesign=b outparm=parm;
class ConsiStr type year Industry;
Here is how my statemets look like now:
libname ev 'C:\Regression';
run;
***Majority All data set- Regression for Acquirors, CAR (-1, +1), announcment date ********/;
data final2;
set ev.final2(keep= tdscd ADSCD Aname Tname AnDate Wdate Acar1 Acar5 Tcar1 Tcar5 TobinQ Debtratio HOSTATASS AMVal4wp TMVal4wp EqValAn EnValAn ConsiStr TNation type AEQW);
proc sort;
by adscd andate;
run;
data a;
set ev.final2;
if amval4wp ne 0 and amval4wp ne . then
ratio=tmval4wp/amval4wp;
lnTEq=log(EqValAn);
lnTEn=log(EnValAn);
lnTMV=log(TMVal4wp);
lnAMV=log(amval4wp);
lnSIZE=log(HOSTATASS);
year = year(Andate);
run;
proc glmmod data=a outdesign=b outparm=parm;
class ConsiStr type year;
model Acar1 = ratio lnTEq lnTEn lnTMV lnAMV lnSIZE ConsiStr type year / noint;
run;
proc print data=parm; run;
data c;
set b;
rename col1=ratio col2=lnTEq col3=lnTEn col4=lnTMV col5=lnAMV col6=lnSIZE col7=cash col8=hybrid col9=other col10=shares
col11=unknown col12=horizental col13=vertical col14=conglomerate col15=year0 col16=year1 col17=year2 col18=year3 col19=year4
col20=year5 col21=year6 col22=year7 col24=year9 col25=year10 col26=year11 col27=year12 col28=year13 col29=year14;
run;
proc reg data=c;
model Acar1 = lnSIZE;
run;
Thank you very much,
Niloo
@niloo wrote:
Hi PGStats,
Thank you very much for your solution. It is much more efficient than my previous statements. I revised my program and have 3 question:
1) Shall I add the variables that I want to include in the regression in this part?
proc glmmod data=a outdesign=b outparm=parm;
class ConsiStr type year;
model Acar1 = ratio lnTEq lnTEn lnTMV lnAMV lnSIZE ConsiStr type year (other variables) / noint;
run;
Yes, if you want the other variables in the regression. However, this seems needlessly complicated, you are planning to run PROC REG so you need DUMMY variables; but if you run the analysis in PROC GLM, you don't have to create the dummy variables yourself, PROC GLM does that for you internally.
2) I have 15 years of data (2000-2014) and 15 dummy variables. Isn't this going to cause a multicolinearity problem?
Again, this is all taken care of internally in PROC GLM, so why use PROC REG? Anyway, the NOINT option eliminates the multicollineary issue.
3) If I want to creat another set of dummy variables for industry can I add this statement:
data a;
set ev.final2;if amval4wp ne 0 and amval4wp ne . then
ratio=tmval4wp/amval4wp;lnTEq=log(EqValAn);
lnTEn=log(EnValAn);
lnTMV=log(TMVal4wp);
lnAMV=log(amval4wp);
lnSIZE=log(HOSTATASS);
year = year(Andate);Industry= TMaCode;
run;
proc glmmod data=a outdesign=b outparm=parm;
class ConsiStr type year Industry;
Here is how my statemets look like now:
libname ev 'C:\Regression';
run;***Majority All data set- Regression for Acquirors, CAR (-1, +1), announcment date ********/;
data final2;
set ev.final2(keep= tdscd ADSCD Aname Tname AnDate Wdate Acar1 Acar5 Tcar1 Tcar5 TobinQ Debtratio HOSTATASS AMVal4wp TMVal4wp EqValAn EnValAn ConsiStr TNation type AEQW);
proc sort;
by adscd andate;
run;
data a;
set ev.final2;if amval4wp ne 0 and amval4wp ne . then
ratio=tmval4wp/amval4wp;lnTEq=log(EqValAn);
lnTEn=log(EnValAn);
lnTMV=log(TMVal4wp);
lnAMV=log(amval4wp);
lnSIZE=log(HOSTATASS);
year = year(Andate);
run;
proc glmmod data=a outdesign=b outparm=parm;
class ConsiStr type year;
model Acar1 = ratio lnTEq lnTEn lnTMV lnAMV lnSIZE ConsiStr type year / noint;
run;
proc print data=parm; run;data c;
set b;
rename col1=ratio col2=lnTEq col3=lnTEn col4=lnTMV col5=lnAMV col6=lnSIZE col7=cash col8=hybrid col9=other col10=shares
col11=unknown col12=horizental col13=vertical col14=conglomerate col15=year0 col16=year1 col17=year2 col18=year3 col19=year4
col20=year5 col21=year6 col22=year7 col24=year9 col25=year10 col26=year11 col27=year12 col28=year13 col29=year14;
run;proc reg data=c;
model Acar1 = lnSIZE;
run;
PROC GLM, PROC GLM, PROC GLM! Stop going through all this effort to create dummy variables.
Dear Paige Miller,
Thank you for your response. So I will not use a PROC REG. But my PROC GLM does not lead to an output page with regression results. I need to run a regression on my dependent variable (ACARs1) and each on my dependent variable and then in the next level on my dependent variable and combinations of my independent variables. Only when I delete the outdesign and outparm and limit the independent variables to one I get the output window with regression results. How can I fix my statements?
Here is how my statements look like:
libname ev 'C:\Regression';
run;
***Majority All data set- Regression for Acquirors, CAR (-1, +1), announcment date ********/;
data final2;
set ev.final2(keep= tdscd ADSCD Aname Tname AnDate Wdate Acar1 Acar5 Tcar1 Tcar5 TobinQ ADebtratio DEBTRT HOSTATASS AMVal4wp TMVal4wp
EqValAn EnValAn ConsiStr TNation type AEQW);
proc sort;
by adscd andate;
run;
data a;
set ev.final2;
if amval4wp ne 0 and amval4wp ne . then
ratio=tmval4wp/amval4wp;
lnTEq=log(EqValAn);
lnTEn=log(EnValAn);
lnTMV=log(TMVal4wp);
lnAMV=log(amval4wp);
lnSIZE=log(HOSTATASS);
year = year(Andate);
Industry= TMaCode;
run;
/*this does not give me an output window*/
proc glmmod data=a outdesign=b outparm=parm ;
class ConsiStr type year Industry;
model Acar1 = ratio lnTEn lnSIZE ConsiStr type year ADebtratio TobinQ DEBTRT Industry AEQW / noint;
run;
proc print data=parm; run;
/*this one gives me an output window*/
proc glm data=a ;
class year;
model Acar1 = year / noint;
run;
Thank you,
Niloo
But my PROC GLM does not lead to an output page with regression results.
Sure it does. PROC GLM performs a regression on your data. (And you need a QUIT; statement at the end of PROC GLM)
The results are written to the output window (unless you have somehow turned that off).
And then you can take all the PROC GLMMOD statements out of your code.
Dear PaigeMiller,
Thank you for your response. I chaned my statements to following and I am trying to export the results into an excel sheet but the ODS statement that I have exports each table to a seperate sheet. how can I have al the results in one sheet and not in table format?
data a;
set ev.final2;
if amval4wp ne 0 and amval4wp ne . then
ratio=tmval4wp/amval4wp;
lnTEq=log(EqValAn);
lnTEn=log(EnValAn);
lnTMV=log(TMVal4wp);
lnAMV=log(amval4wp);
lnSIZE=log(HOSTATASS);
year = year(Andate);
Industry= TMaCode;
run;
ODS TAGSETS.EXCELXP
file='C:\Users\esy\Desktop\niloofar\DePaul Research\M&AJAPAN\Majority\Results\regression.xlsx'
STYLE=minimal
OPTIONS ( Orientation = 'landscape'
FitToPage = 'yes'
Pages_FitWidth = '1'
Pages_FitHeight = '100' );
proc glm data=a;
class ConsiStr type year Industry;
model Acar1 = year / noint;
QUIT;
run;
proc glm data=a;
class ConsiStr type year Industry;
model Acar1 = ratio / noint;
QUIT;
run;
proc glm data=a;
class ConsiStr type year Industry;
model Acar1 = TobinQ/ noint;
QUIT;
run;
(I will run the same statment for all my variavles individually, and then for a combination of variables which had meaningful relationships with the dependent variable):
proc glm data=a;
class ConsiStr type year Industry;
model Acar1 = ADebtratio ratio / noint;
QUIT;
run;
Thank you,
Niloo
Hi again,
I have a question about the dummy variables created by Proc GLM. Variable "type" showes the type of payment (Cash, stock, Hybrid, Unknown) and variable "ConsiStr" discribes the type of deal(horizental, vertical, conglomerate). I need to run a regression on my dependent variable (ACAR1) and "cash" type of payment (not the whole "type" variable) and on ACAR1 and "conglomerate" dales( not the ConsiStr variable itself. But I cannot do it using my current PROC GLM statements. Is there a way I can do this using Proc GLM?
Here is what I currently have:
libname ev 'C:\Regression';
run;
data final2;
set ev.final2(keep= tdscd ADSCD Aname Tname AnDate Wdate Acar1 Acar5 Tcar1 Tcar5 TobinQ ADebtratio DEBTRT HOSTATASS AMVal4wp TMVal4wp
EqValAn EnValAn ConsiStr TNation type AEQW);
proc sort;
by adscd andate;
run;
data a;
set ev.final2;
if amval4wp ne 0 and amval4wp ne . then
ratio=tmval4wp/amval4wp;
lnTEq=log(EqValAn);
lnTEn=log(EnValAn);
lnTMV=log(TMVal4wp);
lnAMV=log(amval4wp);
lnSIZE=log(HOSTATASS);
year = year(Andate);
Industry= AMaCode;
run;
proc glm data=a;
class ConsiStr type year Industry;
model Acar1 = year / noint;
QUIT;
run;
proc glm data=a;
class ConsiStr type year Industry;
model Acar1 = Industry / noint;
QUIT;
run;
proc glm data=a;
class ConsiStr type year Industry;
model Acar1 = AEQW/ noint;
QUIT;
run;
.... and the rest of the variables
ODS TAGSETS.EXCELXP
file='C:\Users\esy\Desktop\niloofar\DePaul Research\M&AJAPAN\Majority\Results\regression.xls'
STYLE=minimal
OPTIONS ( Orientation = 'landscape'
FitToPage = 'yes'
Pages_FitWidth = '1'
Pages_FitHeight = '100' sheet_interval='NONE' );
Thanks,
Niloo
You can restrict your regression analysis to a subset of cases with the where statement
proc glm data=a;
where type="cash";
class ConsiStr year Industry;
model Acar1 = year / noint;
run;
QUIT;
Note: the quit statement goes after the run statement.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.