BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
austin_sas
Fluorite | Level 6

I want to do operation year by year on a database. The original database is large and that's why I have to slice it. I know how to loop in macro:

%macro a;
  %let i=1;
  %do %while(&i<3);
    %put How are you &i;
    %let i=%eval(&i+1);
  %end;
%mend;
%a;

Now I want to insert data extraction in the middle. Something like this but it does not work. Something like this(I deliberately am doing it for one year here):

%macro a;
  %let i=1;
      %put Hello &i;
  %do %while(&i<2);
    %put How are you &i;
    %let i=%eval(&i+1);
        proc sql;
	create table crsp_indx as
	select vwretd,date,spindx From crsp.dsi
	where date>='01DEC%eval(&i+2001)'d and date<'31DEC2002'd ;
	*quit;
	RUN;
  %end;
%mend;
%a;

Thanks.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
%macro a;
  %let i=1;
      %put Hello &i;
  %do %while(&i<2);
    %put How are you &i;
    %let i=%eval(&i+1);
        proc sql;
	create table crsp_indx_&I as
	select vwretd, date, spindx From crsp.dsi
	where date >=mdy(1, 12, &i+2001) and date<'31DEC2002'd ;
	quit;
  %end;
%mend;
%a;

Something like that maybe? The macro appendix (third link below) includes some other examples of looping through dates.

 

UCLA introductory tutorial on macro variables and macros

https://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/

Tutorial on converting a working program to a macro

This method is pretty robust and helps prevent errors and makes it much easier to debug your code. Obviously biased, because I wrote it 🙂 https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md

Examples of common macro usage

https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...

View solution in original post

8 REPLIES 8
Reeza
Super User
%macro a;
  %let i=1;
      %put Hello &i;
  %do %while(&i<2);
    %put How are you &i;
    %let i=%eval(&i+1);
        proc sql;
	create table crsp_indx_&I as
	select vwretd, date, spindx From crsp.dsi
	where date >=mdy(1, 12, &i+2001) and date<'31DEC2002'd ;
	quit;
  %end;
%mend;
%a;

Something like that maybe? The macro appendix (third link below) includes some other examples of looping through dates.

 

UCLA introductory tutorial on macro variables and macros

https://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/

Tutorial on converting a working program to a macro

This method is pretty robust and helps prevent errors and makes it much easier to debug your code. Obviously biased, because I wrote it 🙂 https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md

Examples of common macro usage

https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...

austin_sas
Fluorite | Level 6

Thanks. This works very well. I will read the tutorial by today or tomorrow as well!

PaigeMiller
Diamond | Level 26

I want to do operation year by year on a database. The original database is large and that's why I have to slice it.

Unless this is so large that your computer can't handle it all in one operation, then you want to use a BY statement and not split the data set — this is MUCH MUCH MUCH easier to program and will also execute faster.

--
Paige Miller
austin_sas
Fluorite | Level 6

I should learn by statement more. Obviously I've used it to generate columns by group. Just for clarification the code I sent is a sample code. There will be many lines in between the while statement. For example this is some of the other parts of the code all of them should be put inside the do while (and years be changed to the loop variable):

proc sql;
create table hld as select eff_dt,coupon,ticker,nbr_shares,report_dt,
percent_tna,market_val,cusip,crsp_portno,crsp_company_key,permno,permco from crsp.holdings
where eff_dt>='01DEC2002'd and eff_dt<'31DEC2020'd ;
*quit;
RUN;


data hld ;
	set hld;
	if missing(cusip) then delete;
	y=year(eff_dt);
	q=qtr(eff_dt);
	m=month(eff_dt);
	run;

*crspd is the daily prices CRSP dataset;
proc sql;
create table crspd2 as
select cusip,shrout,ret,date,permno,permco,prc From crsp.dsf
where date>='01DEC2002'd and date<'31DEC2020'd ;
*quit;
RUN;

 

Reeza
Super User
1. You're not currently factoring into your code how to unique identify input/output data sets for each loop.
2. You have multiple steps that are all wrangling - combine them into a single steps wherever possible
3. If you're using the same hardcoded values in multiple places those are the values you should be changing somewhere.
4. Not sure why you're using date conversions but know that you can use formats to avoid the need to add these extra variables.
austin_sas
Fluorite | Level 6

1) You are right. The reason is that currently the code does not have any loops and the date is random date just to do testing. I will put all of them inside loop (or just do by as others suggested but should think about it more). Because I'm doing it cloud (SAS studio) the organization does not give me much space (overall something like 5GB+scratch shared folder).

2)I agree. I should combine them. Part of that is lack of knowledge.

3)correct. All of those will be linked to loop's changing variable

4) By date conversions you mean generating year variables? That's for merging purpose with other datasets and  aggregating later

ballardw
Super User

It may help to learn some date functions.

If you have a SAS date value and want values from a specific calendar year use: Where year(date) = 2001 instead of trying to sandwich between specific dates.

 

Macro variables do not evaluate when placed inside single quotes.

 

If you are going to do the same thing to each "year" then likely either by group processing would be better. So instead of creating multiple data sets add a Year variable and make sure the data is ordered by year (or other values plus year) .

 

If you provide examples of what you are actually going to do with those "year" data sets there may be other ways to avoid creating data sets. Some of them are as simple:

 

Proc whatever data=thefulldataset;

   where year(date)=2002;

run;

 

No need to create a different data set, just filter at use.

austin_sas
Fluorite | Level 6

Thanks. This is very insightful. This is part of my code (the dates are there just for test. The actual dates will be year by year for the reason I mentioned). Basically each year I will get data from different tables (the tables are in the cloud of data provider) some of them are big (like 10 GB each year), then clean,merge,sort, aggregate them. At the end for each year one table will be the final output. If I do it year by year I should kind of put the final output tables on top of each other. Each final output table has small size (~50MB).

This is a sample of first lines of the code:

proc sql;
create table hld as select eff_dt,coupon,ticker,nbr_shares,report_dt,
percent_tna,market_val,cusip,crsp_portno,crsp_company_key,permno,permco from crsp.holdings
where eff_dt>='01DEC2002'd and eff_dt<'31DEC2020'd ;
*quit;
RUN;


data hld ;
	set hld;
	if missing(cusip) then delete;
	y=year(eff_dt);
	q=qtr(eff_dt);
	m=month(eff_dt);
	run;

*crspd is the daily prices CRSP dataset;
proc sql;
create table crspd2 as
select cusip,shrout,ret,date,permno,permco,prc From crsp.dsf
where date>='01DEC2002'd and date<'31DEC2020'd ;
*quit;
RUN;


proc sql;
create table crsp_indx as
select vwretd,date,spindx From crsp.dsi
where date>='01DEC2002'd and date<'31DEC2020'd ;
*quit;
RUN;

proc sql;
create table crspd as select *
From crspd2 inner join crsp_indx  on
 (crspd2.date = crsp_indx.date) ;
*quit;
RUN;

proc datasets ;
   delete crspd2 crsp_indx;
run;


data crspd ;
	set crspd;
	if missing(cusip) then delete;
	y=year(date);
	q=qtr(date);
	m=month(date);
	relP=PRC/spindx;
	run;

proc sort data= crspd out= crspd ;
	by cusip date ;
run;

data crspd;
	set crspd;
	by cusip;
	   lagW_relP=lag7(relP); 
	   RETw=(relP-lagW_relP)/lagW_relP;
run;


Proc sql;
create table crspd as
Select *, std(RETw) as std1, mean(RETw) as mean1
From crspd
GROUP BY cusip,y
ORDER BY cusip,y;
*quit; 
RUN;

and

proc sql;
create table cmpstat as
select datadate,fyearq,fqtr,fyr,cusip,gvkey From compd.fundq
where datadate>='01DEC2010'd and datadate<'31DEC2020'd ;
*quit;
RUN;


data cmpstat;
	set cmpstat;
	by gvkey ;
	if first.gvkey  then output;
run;


proc sort data= cmpstat  ;
	by cusip fyearq fqtr;
run;

data cmpstat_q;
	set cmpstat;
	by cusip fyearq fqtr;
	if first.fqtr  then output;
run;

proc sql;
create table sicx as
select sic,gvkey From compa.company;
*quit;
RUN;

proc sql;
create table cmp_sic_q as select *
From cmpstat_q inner join sicx  on
( cmpstat_q.gvkey = sicx.gvkey) ;
*quit;
RUN;


proc sql;
create table cmp_sic as select *
From cmpstat inner join sicx  on
( cmpstat.gvkey = sicx.gvkey) ;
*quit;
RUN;

*map contains fund names and so on.The holding databse contains
only portfo numbers and not names. Most importantly the map
let's us aggregate everything on fund level;
proc sql;
create table map as
select crsp_fundno,crsp_portno,fund_name,mgmt_name,mgmt_cd,ncusip From crsp.portnomap;
*quit;
RUN;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 8 replies
  • 893 views
  • 4 likes
  • 4 in conversation