Hi I am new to sas and currently wrote some very simple code to do probit/lpogit model with the data I downloaded. I did the 11 years (2006 to 2016) one by one to get the result, which is very time consuming. So I am wondering is there any way to repeat the steps from import file til the end by code? I always feel my code is pretty redundant so any suggestion for simplifying would be appreciated. Thank you!
Here is my code:
PROC IMPORT OUT= work.Ebsd
DATAFILE= "C:\Users\Xusheng\Desktop\OneDrive\MSCM5P04\A2\Original_Div\Div_2006.csv"
DBMS=CSV REPLACE;*signify currency, year and month. if cite a word as the name, should be end up with '.', check ¤cy.,&year.,&i.;
GETNAMES=YES;
DATAROW=2;
RUN;
data work.EBSd;
set work.Ebsd;
rename at=TotalAsset;
rename bkvlps=BookValue;
rename capx=CapitalExpenditure;
rename dpc=Depreciation;
rename dt=TotalDebt;
rename dv=Dividend;
rename ebit=EBIT;
rename gp=GrossProfit;
rename ni=NetIncome;
rename seq=ShareholderEquity;
rename txt=IncomeTax;
rename wcapch=WorkingCapitalChange;
rename xint=InterestExpense;
rename xopr=OperatingExpense;
drop costat tie dp curcd tic datafmt popsrc consol indfmt fyear datadate gvkey;
run;
data work.EBSd;
set work.Ebsd;
if TotalAsset=. then delete;
if BookValue=. then BookValue=0;
if CapitalExpenditure=. then CapitalExpenditure=0;
if Depreciation=. then Depreciation=0;
if TotalDebt=. then TotalDebt=0;
if EBIT=. then EBIT=0;
if GrossProfit=. then GrossProfit=0;
if NetIncome=. then NetIncome=0;
if OperatingExpense=. then OperatingExpense=0;
if InterestExpense=. then InterestExpense=0;
if WorkingCapitalChange=. then WorkingCapitalChange=0;
if Dividend=0 then Dividend=0;
if Dividend>0 then Dividend=1;
if Dividend=. then Dividend=0;
run;
data work.EBSd;
set work.Ebsd;
CFO= EBIT + Depreciation - IncomeTax - WorkingCapitalChange;
FCF= CFO - CapitalExpenditure;
D_E= TotalDebt/ShareholderEquity;
ROE= NetIncome/ShareholderEquity;
drop TotalAsset CapitalExpenditure Depreciation TotalDebt EBIT GrossProfit NetIncome ShareholderEquity IncomeTax WorkingCapitalChange InterestExpense OperatingExpense CFO;
run;
Proc Export data=Work.Ebsd
Outfile= "C:\Users\Xusheng\Desktop\OneDrive\MSCM5P04\A2\Ready-For-Probit-Logit\Div_2006.csv"
DBMS=csv replace;
Putname=yes;
run;
*Logit;
proc logistic data=work.Ebsd descending;
model Dividend = BookValue FCF D_E ROE/ ctable pprob=0.5;
output out = lout predicted=plogit;
run;
proc qlim data=work.Ebsd;
model Dividend = BookValue FCF D_E ROE/ discrete (dist=logit);
output out=logmargin marginal;
run;
proc means data=logmargin mean std;
var Meff_l_BookValue Meff_l_FCF Meff_l_D_E Meff_l_ROE;
run;
proc means data=lout;
var Dividend plogit;
run;
*Probit;
proc logistic data=work.Ebsd descending;
model Dividend = BookValue FCF D_E ROE/ link=probit ctable pprob=0.5;
output out=Pout predicted=pprobit;
run;
proc qlim data=work.Ebsd;
model Dividend = BookValue FCF D_E ROE/ discrete (dist=normal);
output out=probmargin marginal;
run;
proc means data=probmargin mean std;
var Meff_l_BookValue Meff_l_FCF Meff_l_D_E Meff_l_ROE;
run;
proc means data=Pout;
var Dividend pprobit;
run;
*Create PDF;
filename sas "C:\Users\Xusheng\Desktop\OneDrive\MSCM5P04\A2\Div_Probit_Logit.sas";
ods listing close;
ODS pdf file="C:\Users\Xusheng\Desktop\OneDrive\MSCM5P04\A2\Div_Probit_Logit.pdf";
data _null_;
infile sas ;
input;
temp=_infile_;
file print ods;
put _ods_;
run;
ODS pdf CLOSE;
ods listing;
One word: macros
I don't know if your pdf code is part of the processing for each of the 11 files, but if so, you would be overwriting it each time. If it needs to run for each of the 11 files, then you can include it at the end of the macro. I've left it out. But for running the rest of code repeatedly for the 11 files, all you need to do is wrap it in a macro, and since your years are sequential, a simple macro %do loop is used to loop through each of the 11 files.
%macro processyr;
%do yr=2006 %to 2016;
PROC IMPORT OUT= work.Ebsd
DATAFILE= "C:\Users\Xusheng\Desktop\OneDrive\MSCM5P04\A2\Original_Div\Div_&yr..csv"
DBMS=CSV REPLACE;*signify currency, year and month. if cite a word as the name, should be end up with '.', check ¤cy.,&year.,&i.;
GETNAMES=YES;
DATAROW=2;
RUN;
data work.EBSd;
set work.Ebsd;
rename at=TotalAsset;
rename bkvlps=BookValue;
rename capx=CapitalExpenditure;
rename dpc=Depreciation;
rename dt=TotalDebt;
rename dv=Dividend;
rename ebit=EBIT;
rename gp=GrossProfit;
rename ni=NetIncome;
rename seq=ShareholderEquity;
rename txt=IncomeTax;
rename wcapch=WorkingCapitalChange;
rename xint=InterestExpense;
rename xopr=OperatingExpense;
drop costat tie dp curcd tic datafmt popsrc consol indfmt fyear datadate gvkey;
run;
data work.EBSd;
set work.Ebsd;
if TotalAsset=. then delete;
if BookValue=. then BookValue=0;
if CapitalExpenditure=. then CapitalExpenditure=0;
if Depreciation=. then Depreciation=0;
if TotalDebt=. then TotalDebt=0;
if EBIT=. then EBIT=0;
if GrossProfit=. then GrossProfit=0;
if NetIncome=. then NetIncome=0;
if OperatingExpense=. then OperatingExpense=0;
if InterestExpense=. then InterestExpense=0;
if WorkingCapitalChange=. then WorkingCapitalChange=0;
if Dividend=0 then Dividend=0;
if Dividend>0 then Dividend=1;
if Dividend=. then Dividend=0;
run;
data work.EBSd;
set work.Ebsd;
CFO= EBIT + Depreciation - IncomeTax - WorkingCapitalChange;
FCF= CFO - CapitalExpenditure;
D_E= TotalDebt/ShareholderEquity;
ROE= NetIncome/ShareholderEquity;
drop TotalAsset CapitalExpenditure Depreciation TotalDebt EBIT GrossProfit NetIncome ShareholderEquity IncomeTax WorkingCapitalChange InterestExpense OperatingExpense CFO;
run;
Proc Export data=Work.Ebsd
Outfile= "C:\Users\Xusheng\Desktop\OneDrive\MSCM5P04\A2\Ready-For-Probit-Logit\Div_&yr..csv"
DBMS=csv replace;
Putname=yes;
run;
*Logit;
proc logistic data=work.Ebsd descending;
model Dividend = BookValue FCF D_E ROE/ ctable pprob=0.5;
output out = lout predicted=plogit;
run;
proc qlim data=work.Ebsd;
model Dividend = BookValue FCF D_E ROE/ discrete (dist=logit);
output out=logmargin marginal;
run;
proc means data=logmargin mean std;
var Meff_l_BookValue Meff_l_FCF Meff_l_D_E Meff_l_ROE;
run;
proc means data=lout;
var Dividend plogit;
run;
*Probit;
proc logistic data=work.Ebsd descending;
model Dividend = BookValue FCF D_E ROE/ link=probit ctable pprob=0.5;
output out=Pout predicted=pprobit;
run;
proc qlim data=work.Ebsd;
model Dividend = BookValue FCF D_E ROE/ discrete (dist=normal);
output out=probmargin marginal;
run;
proc means data=probmargin mean std;
var Meff_l_BookValue Meff_l_FCF Meff_l_D_E Meff_l_ROE;
run;
proc means data=Pout;
var Dividend pprobit;
run;
%end;
%mend;
%processyr;
One word: macros
I don't know if your pdf code is part of the processing for each of the 11 files, but if so, you would be overwriting it each time. If it needs to run for each of the 11 files, then you can include it at the end of the macro. I've left it out. But for running the rest of code repeatedly for the 11 files, all you need to do is wrap it in a macro, and since your years are sequential, a simple macro %do loop is used to loop through each of the 11 files.
%macro processyr;
%do yr=2006 %to 2016;
PROC IMPORT OUT= work.Ebsd
DATAFILE= "C:\Users\Xusheng\Desktop\OneDrive\MSCM5P04\A2\Original_Div\Div_&yr..csv"
DBMS=CSV REPLACE;*signify currency, year and month. if cite a word as the name, should be end up with '.', check ¤cy.,&year.,&i.;
GETNAMES=YES;
DATAROW=2;
RUN;
data work.EBSd;
set work.Ebsd;
rename at=TotalAsset;
rename bkvlps=BookValue;
rename capx=CapitalExpenditure;
rename dpc=Depreciation;
rename dt=TotalDebt;
rename dv=Dividend;
rename ebit=EBIT;
rename gp=GrossProfit;
rename ni=NetIncome;
rename seq=ShareholderEquity;
rename txt=IncomeTax;
rename wcapch=WorkingCapitalChange;
rename xint=InterestExpense;
rename xopr=OperatingExpense;
drop costat tie dp curcd tic datafmt popsrc consol indfmt fyear datadate gvkey;
run;
data work.EBSd;
set work.Ebsd;
if TotalAsset=. then delete;
if BookValue=. then BookValue=0;
if CapitalExpenditure=. then CapitalExpenditure=0;
if Depreciation=. then Depreciation=0;
if TotalDebt=. then TotalDebt=0;
if EBIT=. then EBIT=0;
if GrossProfit=. then GrossProfit=0;
if NetIncome=. then NetIncome=0;
if OperatingExpense=. then OperatingExpense=0;
if InterestExpense=. then InterestExpense=0;
if WorkingCapitalChange=. then WorkingCapitalChange=0;
if Dividend=0 then Dividend=0;
if Dividend>0 then Dividend=1;
if Dividend=. then Dividend=0;
run;
data work.EBSd;
set work.Ebsd;
CFO= EBIT + Depreciation - IncomeTax - WorkingCapitalChange;
FCF= CFO - CapitalExpenditure;
D_E= TotalDebt/ShareholderEquity;
ROE= NetIncome/ShareholderEquity;
drop TotalAsset CapitalExpenditure Depreciation TotalDebt EBIT GrossProfit NetIncome ShareholderEquity IncomeTax WorkingCapitalChange InterestExpense OperatingExpense CFO;
run;
Proc Export data=Work.Ebsd
Outfile= "C:\Users\Xusheng\Desktop\OneDrive\MSCM5P04\A2\Ready-For-Probit-Logit\Div_&yr..csv"
DBMS=csv replace;
Putname=yes;
run;
*Logit;
proc logistic data=work.Ebsd descending;
model Dividend = BookValue FCF D_E ROE/ ctable pprob=0.5;
output out = lout predicted=plogit;
run;
proc qlim data=work.Ebsd;
model Dividend = BookValue FCF D_E ROE/ discrete (dist=logit);
output out=logmargin marginal;
run;
proc means data=logmargin mean std;
var Meff_l_BookValue Meff_l_FCF Meff_l_D_E Meff_l_ROE;
run;
proc means data=lout;
var Dividend plogit;
run;
*Probit;
proc logistic data=work.Ebsd descending;
model Dividend = BookValue FCF D_E ROE/ link=probit ctable pprob=0.5;
output out=Pout predicted=pprobit;
run;
proc qlim data=work.Ebsd;
model Dividend = BookValue FCF D_E ROE/ discrete (dist=normal);
output out=probmargin marginal;
run;
proc means data=probmargin mean std;
var Meff_l_BookValue Meff_l_FCF Meff_l_D_E Meff_l_ROE;
run;
proc means data=Pout;
var Dividend pprobit;
run;
%end;
%mend;
%processyr;
Thank you nehalsanghvi! It works.
Just for curious, if my same data is discrete years say missing 2007 and 2008. How could I create macro based on this if I still want 2006 - 2016? Thank you.
Just add a line right after your do loop:
%if &i. ne 2007 and &i. ne 2008 %then %do;
and, of course, add another %end before the %mend statement.
Art, CEO, AnalystFinder.com
There can be many approaches based on what is known and what is unknown, but if you know for sure your start year is 2006 and end year is 2016, then within the same do loop, you can add a check to see if the file exists before running the rest of the code. If it exists, then proceed with the rest of the code. If it doesn't exist, print a message to the log.
Here's how:
%macro processyr;
%do yr=2006 %to 2016;
%let myfileref = "C:\Users\Xusheng\Desktop\OneDrive\MSCM5P04\A2\Original_Div\Div_&yr..csv";
%if %sysfunc(fileexist(&myfileref.)) %then %do;
PROC IMPORT OUT= work.Ebsd
DATAFILE= &myfileref.
DBMS=CSV REPLACE;*signify currency, year and month. if cite a word as the name, should be end up with '.', check ¤cy.,&year.,&i.;
GETNAMES=YES;
DATAROW=2;
RUN;
/* THE REST OF YOUR CODE GOES HERE */
%end;
%else %put The external file &myfileref does not exist;
%end;
%mend;
%processyr;
This code will skip over any missing files for years between 2006 and 2016.
I don't have time to fully review your code, but did notice the following:
(1) you do multiple datasteps on the same file. Why not combine them into one datasetp.
(2) when you have what are really if then else types, add the else conditions. e.g.:
/*have*/ if Dividend=0 then Dividend=0; if Dividend>0 then Dividend=1; if Dividend=. then Dividend=0; /*should have:*/ if Dividend>0 then Dividend=1; else Dividend=0;
(3) You could concatenate all of the files into one file, using the indsname option to identify the year if it doesn't already exist in the data. Then you could do all/most of your work by including a BY YEAR; statement in your code for each analysis.
HTH,
Art, CEO, AnalystFinder.com
Right after your proc import you have 3 data steps .. all on the same dataset.
SAS has to re-read the data 3 times rather than once. I'd write it like:
data work.EBSd (drop TotalAsset CapitalExpenditure Depreciation TotalDebt EBIT GrossProfit NetIncome ShareholderEquity IncomeTax WorkingCapitalChange InterestExpense OperatingExpense CFO); set work.Ebsd (drop=costat tie dp curcd tic datafmt popsrc consol indfmt fyear datadate gvkey rename=( at=TotalAsset bkvlps=BookValue capx=CapitalExpenditure dpc=Depreciation dt=TotalDebt dv=Dividend ebit=EBIT gp=GrossProfit ni=NetIncome seq=ShareholderEquity txt=IncomeTax wcapch=WorkingCapitalChange xint=InterestExpense xopr=OperatingExpense)); if TotalAsset=. then delete; if BookValue=. then BookValue=0; if CapitalExpenditure=. then CapitalExpenditure=0; if Depreciation=. then Depreciation=0; if TotalDebt=. then TotalDebt=0; if EBIT=. then EBIT=0; if GrossProfit=. then GrossProfit=0; if NetIncome=. then NetIncome=0; if OperatingExpense=. then OperatingExpense=0; if InterestExpense=. then InterestExpense=0; if WorkingCapitalChange=. then WorkingCapitalChange=0; if Dividend>0 then Dividend=1; else Dividend=0; CFO= EBIT + Depreciation - IncomeTax - WorkingCapitalChange; FCF= CFO - CapitalExpenditure; D_E= TotalDebt/ShareholderEquity; ROE= NetIncome/ShareholderEquity; run;
Also, you reset a number of missing values to 0, but don't do that for IncomeTax. As such, if there are any records that have a missing value for IncomeTax, the CFO value will be missing. Unless you want that, either do the same as you did for the other variables or change the statement to:
CFO= sum(EBIT, Depreciation, (- IncomeTax), (- WorkingCapitalChange));
Art, CEO, AnalystFinder.com
Review your log after you run the code. It usually shows you what you have to correct.
Art, CEO, AnalystFinder.com
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.