I want to do operation year by year on a database. The original database is large and that's why I have to slice it. I know how to loop in macro:
%macro a;
%let i=1;
%do %while(&i<3);
%put How are you &i;
%let i=%eval(&i+1);
%end;
%mend;
%a;
Now I want to insert data extraction in the middle. Something like this but it does not work. Something like this(I deliberately am doing it for one year here):
%macro a;
%let i=1;
%put Hello &i;
%do %while(&i<2);
%put How are you &i;
%let i=%eval(&i+1);
proc sql;
create table crsp_indx as
select vwretd,date,spindx From crsp.dsi
where date>='01DEC%eval(&i+2001)'d and date<'31DEC2002'd ;
*quit;
RUN;
%end;
%mend;
%a;
Thanks.
%macro a;
%let i=1;
%put Hello &i;
%do %while(&i<2);
%put How are you &i;
%let i=%eval(&i+1);
proc sql;
create table crsp_indx_&I as
select vwretd, date, spindx From crsp.dsi
where date >=mdy(1, 12, &i+2001) and date<'31DEC2002'd ;
quit;
%end;
%mend;
%a;
Something like that maybe? The macro appendix (third link below) includes some other examples of looping through dates.
UCLA introductory tutorial on macro variables and macros
https://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/
Tutorial on converting a working program to a macro
This method is pretty robust and helps prevent errors and makes it much easier to debug your code. Obviously biased, because I wrote it 🙂 https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md
Examples of common macro usage
https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...
%macro a;
%let i=1;
%put Hello &i;
%do %while(&i<2);
%put How are you &i;
%let i=%eval(&i+1);
proc sql;
create table crsp_indx_&I as
select vwretd, date, spindx From crsp.dsi
where date >=mdy(1, 12, &i+2001) and date<'31DEC2002'd ;
quit;
%end;
%mend;
%a;
Something like that maybe? The macro appendix (third link below) includes some other examples of looping through dates.
UCLA introductory tutorial on macro variables and macros
https://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/
Tutorial on converting a working program to a macro
This method is pretty robust and helps prevent errors and makes it much easier to debug your code. Obviously biased, because I wrote it 🙂 https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md
Examples of common macro usage
https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...
Thanks. This works very well. I will read the tutorial by today or tomorrow as well!
I want to do operation year by year on a database. The original database is large and that's why I have to slice it.
Unless this is so large that your computer can't handle it all in one operation, then you want to use a BY statement and not split the data set — this is MUCH MUCH MUCH easier to program and will also execute faster.
I should learn by statement more. Obviously I've used it to generate columns by group. Just for clarification the code I sent is a sample code. There will be many lines in between the while statement. For example this is some of the other parts of the code all of them should be put inside the do while (and years be changed to the loop variable):
proc sql;
create table hld as select eff_dt,coupon,ticker,nbr_shares,report_dt,
percent_tna,market_val,cusip,crsp_portno,crsp_company_key,permno,permco from crsp.holdings
where eff_dt>='01DEC2002'd and eff_dt<'31DEC2020'd ;
*quit;
RUN;
data hld ;
set hld;
if missing(cusip) then delete;
y=year(eff_dt);
q=qtr(eff_dt);
m=month(eff_dt);
run;
*crspd is the daily prices CRSP dataset;
proc sql;
create table crspd2 as
select cusip,shrout,ret,date,permno,permco,prc From crsp.dsf
where date>='01DEC2002'd and date<'31DEC2020'd ;
*quit;
RUN;
1) You are right. The reason is that currently the code does not have any loops and the date is random date just to do testing. I will put all of them inside loop (or just do by as others suggested but should think about it more). Because I'm doing it cloud (SAS studio) the organization does not give me much space (overall something like 5GB+scratch shared folder).
2)I agree. I should combine them. Part of that is lack of knowledge.
3)correct. All of those will be linked to loop's changing variable
4) By date conversions you mean generating year variables? That's for merging purpose with other datasets and aggregating later
It may help to learn some date functions.
If you have a SAS date value and want values from a specific calendar year use: Where year(date) = 2001 instead of trying to sandwich between specific dates.
Macro variables do not evaluate when placed inside single quotes.
If you are going to do the same thing to each "year" then likely either by group processing would be better. So instead of creating multiple data sets add a Year variable and make sure the data is ordered by year (or other values plus year) .
If you provide examples of what you are actually going to do with those "year" data sets there may be other ways to avoid creating data sets. Some of them are as simple:
Proc whatever data=thefulldataset;
where year(date)=2002;
run;
No need to create a different data set, just filter at use.
Thanks. This is very insightful. This is part of my code (the dates are there just for test. The actual dates will be year by year for the reason I mentioned). Basically each year I will get data from different tables (the tables are in the cloud of data provider) some of them are big (like 10 GB each year), then clean,merge,sort, aggregate them. At the end for each year one table will be the final output. If I do it year by year I should kind of put the final output tables on top of each other. Each final output table has small size (~50MB).
This is a sample of first lines of the code:
proc sql;
create table hld as select eff_dt,coupon,ticker,nbr_shares,report_dt,
percent_tna,market_val,cusip,crsp_portno,crsp_company_key,permno,permco from crsp.holdings
where eff_dt>='01DEC2002'd and eff_dt<'31DEC2020'd ;
*quit;
RUN;
data hld ;
set hld;
if missing(cusip) then delete;
y=year(eff_dt);
q=qtr(eff_dt);
m=month(eff_dt);
run;
*crspd is the daily prices CRSP dataset;
proc sql;
create table crspd2 as
select cusip,shrout,ret,date,permno,permco,prc From crsp.dsf
where date>='01DEC2002'd and date<'31DEC2020'd ;
*quit;
RUN;
proc sql;
create table crsp_indx as
select vwretd,date,spindx From crsp.dsi
where date>='01DEC2002'd and date<'31DEC2020'd ;
*quit;
RUN;
proc sql;
create table crspd as select *
From crspd2 inner join crsp_indx on
(crspd2.date = crsp_indx.date) ;
*quit;
RUN;
proc datasets ;
delete crspd2 crsp_indx;
run;
data crspd ;
set crspd;
if missing(cusip) then delete;
y=year(date);
q=qtr(date);
m=month(date);
relP=PRC/spindx;
run;
proc sort data= crspd out= crspd ;
by cusip date ;
run;
data crspd;
set crspd;
by cusip;
lagW_relP=lag7(relP);
RETw=(relP-lagW_relP)/lagW_relP;
run;
Proc sql;
create table crspd as
Select *, std(RETw) as std1, mean(RETw) as mean1
From crspd
GROUP BY cusip,y
ORDER BY cusip,y;
*quit;
RUN;
and
proc sql;
create table cmpstat as
select datadate,fyearq,fqtr,fyr,cusip,gvkey From compd.fundq
where datadate>='01DEC2010'd and datadate<'31DEC2020'd ;
*quit;
RUN;
data cmpstat;
set cmpstat;
by gvkey ;
if first.gvkey then output;
run;
proc sort data= cmpstat ;
by cusip fyearq fqtr;
run;
data cmpstat_q;
set cmpstat;
by cusip fyearq fqtr;
if first.fqtr then output;
run;
proc sql;
create table sicx as
select sic,gvkey From compa.company;
*quit;
RUN;
proc sql;
create table cmp_sic_q as select *
From cmpstat_q inner join sicx on
( cmpstat_q.gvkey = sicx.gvkey) ;
*quit;
RUN;
proc sql;
create table cmp_sic as select *
From cmpstat inner join sicx on
( cmpstat.gvkey = sicx.gvkey) ;
*quit;
RUN;
*map contains fund names and so on.The holding databse contains
only portfo numbers and not names. Most importantly the map
let's us aggregate everything on fund level;
proc sql;
create table map as
select crsp_fundno,crsp_portno,fund_name,mgmt_name,mgmt_cd,ncusip From crsp.portnomap;
*quit;
RUN;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.