BookmarkSubscribeRSS Feed
Time_Looper47
Obsidian | Level 7

Greetings,

 

I've developed a macro to help produce data sets pulling from different tables and date ranges, however I believe I'm not doing this in the most efficient way. Btw I'm working with Teradata so it is passing through Teradata. Here is my code:

 

data test_data;
infile datalines;
format cust_id 3. depo_dt yymmddd10. deposit_amt dollar6.2 year_month source;
input cust_id 3. depo_dt :yymmdd10. deposit_amt dollar6.2 year_month source $16.;
datalines;
871 2019-02-21 $20.00 201902 USBANK
888 2019-05-13 $90.00 201905 WELLSFARGO
847 2019-03-25 $60.00 201903 CHASE
821 2019-09-18 $30.00 201909 CITIGROUP
;;
run;


%macro td_sql_macro (tbl_nme,yyyymm,mnth_1,mnth2);
proc sql noerrorstop;
connect to teradata (database=BANK_DATA user="user" password="pw" tdpid="tdpid" mode=teradata);
create table work.&tbl_nme as select * from connection to teradata (
  select cust_id,
         depo_dt,
	     deposit_amt,
	     year_month,
	     source
  from BANK_DATA.source_data_&yyyymm._hist
  where depo_dt >= date &mnth_1
  and depo_dt < date &mnth2
  and prc_dt = &yyyymm
  and cust_id <> 0);
disconnect from teradata;
quit;
%mend;

%td_sql_macro(tbl_nme=bank_data_201809,yyyymm=201809,mnth_1='2018-09-01',mnth2='2018-10-01')
%td_sql_macro(tbl_nme=bank_data_201810,yyyymm=201810,mnth_1='2018-10-01',mnth2='2018-11-01');
%td_sql_macro(tbl_nme=bank_data_201811,yyyymm=201811,mnth_1='2018-11-01',mnth2='2018-12-01');
%td_sql_macro(tbl_nme=bank_data_201812,yyyymm=201812,mnth_1='2018-12-01',mnth2='2019-01-01');
%td_sql_macro(tbl_nme=bank_data_201901,yyyymm=201901,mnth_1='2019-01-01',mnth2='2019-02-01');
%td_sql_macro(tbl_nme=bank_data_201902,yyyymm=201902,mnth_1='2019-02-01',mnth2='2019-03-01');
%td_sql_macro(tbl_nme=bank_data_201903,yyyymm=201903,mnth_1='2019-03-01',mnth2='2019-04-01');
%td_sql_macro(tbl_nme=bank_data_201904,yyyymm=201904,mnth_1='2019-04-01',mnth2='2019-05-01');
%td_sql_macro(tbl_nme=bank_data_201905,yyyymm=201905,mnth_1='2019-05-01',mnth2='2019-06-01');
%td_sql_macro(tbl_nme=bank_data_201906,yyyymm=201906,mnth_1='2019-06-01',mnth2='2019-07-01');
%td_sql_macro(tbl_nme=bank_data_201907,yyyymm=201907,mnth_1='2019-07-01',mnth2='2019-08-01');
%td_sql_macro(tbl_nme=bank_data_201908,yyyymm=201908,mnth_1='2019-08-01',mnth2='2019-09-01');

libname BNK_DPST teradata tdpid="tdpid" database=bnk user="user" password="pw" bulkload=yes;

data BNK_DPST.deposits_all;
  set bank_data_201908
      bank_data_201907
	  bank_data_201906
	  bank_data_201905
	  bank_data_201904
	  bank_data_201903
	  bank_data_201902
	  bank_data_201901
      bank_data_201812
      bank_data_201811
	  bank_data_201810
	  bank_data_201809;
run;

The beginning data step is just to see what the data is like, just pretend it is from one of the BANK_DATA.source_data_ tables. So this code works, however I want to make it more efficient/automated in the sense that instead of creating lines calling the macro with different values, I want to automate the macro to just loop through the different values, rather than hard-coding them. Looping through the different tables as well as date filters. The ultimate goal as you can see on the bottom is to combine all the created tables and then create the table in Teradata. I want the data to be a 12-month lookback period. Starting from todays month - 1. So if I was running it for September, I would want the date loops to start at August, going back 12 months.

 

Can anyone help? See opportunities for improvement on how I can automatically loop through the different values without hard-coding table names and dates.

 

Thanks

11 REPLIES 11
tomrvincent
Rhodochrosite | Level 12
Any indeces in BANK_DATA.source_data_&yyyymm._hist?
Reeza
Super User

Examples of common macro usage

https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...

 

The macro appendix has an example of looping through dates. My question would be do you actually need to loop? Can you not just do one pass to filter the entire period at once?

 

I would add a PROC APPEND at the bottom of the macro and remove intermediary data sets as well, this way your data is stacked automatically rather than hard coding any dates. 

 

Use PROC DATASETS to drop tables. 

 


@Time_Looper47 wrote:

Greetings,

 

I've developed a macro to help produce data sets pulling from different tables and date ranges, however I believe I'm not doing this in the most efficient way. Btw I'm working with Teradata so it is passing through Teradata. Here is my code:

 

data test_data;
infile datalines;
format cust_id 3. depo_dt yymmddd10. deposit_amt dollar6.2 year_month source;
input cust_id 3. depo_dt :yymmdd10. deposit_amt dollar6.2 year_month source $16.;
datalines;
871 2019-02-21 $20.00 201902 USBANK
888 2019-05-13 $90.00 201905 WELLSFARGO
847 2019-03-25 $60.00 201903 CHASE
821 2019-09-18 $30.00 201909 CITIGROUP
;;
run;


%macro td_sql_macro (tbl_nme,yyyymm,mnth_1,mnth2);
proc sql noerrorstop;
connect to teradata (database=BANK_DATA user="user" password="pw" tdpid="tdpid" mode=teradata);
create table work.&tbl_nme as select * from connection to teradata (
  select cust_id,
         depo_dt,
	     deposit_amt,
	     year_month,
	     source
  from BANK_DATA.source_data_&yyyymm._hist
  where depo_dt >= date &mnth_1
  and depo_dt < date &mnth2
  and prc_dt = &yyyymm
  and cust_id <> 0);
disconnect from teradata;
quit;
%mend;

%td_sql_macro(tbl_nme=bank_data_201809,yyyymm=201809,mnth_1='2018-09-01',mnth2='2018-10-01')
%td_sql_macro(tbl_nme=bank_data_201810,yyyymm=201810,mnth_1='2018-10-01',mnth2='2018-11-01');
%td_sql_macro(tbl_nme=bank_data_201811,yyyymm=201811,mnth_1='2018-11-01',mnth2='2018-12-01');
%td_sql_macro(tbl_nme=bank_data_201812,yyyymm=201812,mnth_1='2018-12-01',mnth2='2019-01-01');
%td_sql_macro(tbl_nme=bank_data_201901,yyyymm=201901,mnth_1='2019-01-01',mnth2='2019-02-01');
%td_sql_macro(tbl_nme=bank_data_201902,yyyymm=201902,mnth_1='2019-02-01',mnth2='2019-03-01');
%td_sql_macro(tbl_nme=bank_data_201903,yyyymm=201903,mnth_1='2019-03-01',mnth2='2019-04-01');
%td_sql_macro(tbl_nme=bank_data_201904,yyyymm=201904,mnth_1='2019-04-01',mnth2='2019-05-01');
%td_sql_macro(tbl_nme=bank_data_201905,yyyymm=201905,mnth_1='2019-05-01',mnth2='2019-06-01');
%td_sql_macro(tbl_nme=bank_data_201906,yyyymm=201906,mnth_1='2019-06-01',mnth2='2019-07-01');
%td_sql_macro(tbl_nme=bank_data_201907,yyyymm=201907,mnth_1='2019-07-01',mnth2='2019-08-01');
%td_sql_macro(tbl_nme=bank_data_201908,yyyymm=201908,mnth_1='2019-08-01',mnth2='2019-09-01');

libname BNK_DPST teradata tdpid="tdpid" database=bnk user="user" password="pw" bulkload=yes;

data BNK_DPST.deposits_all;
  set bank_data_201908
      bank_data_201907
	  bank_data_201906
	  bank_data_201905
	  bank_data_201904
	  bank_data_201903
	  bank_data_201902
	  bank_data_201901
      bank_data_201812
      bank_data_201811
	  bank_data_201810
	  bank_data_201809;
run;

The beginning data step is just to see what the data is like, just pretend it is from one of the BANK_DATA.source_data_ tables. So this code works, however I want to make it more efficient/automated in the sense that instead of creating lines calling the macro with different values, I want to automate the macro to just loop through the different values, rather than hard-coding them. Looping through the different tables as well as date filters. The ultimate goal as you can see on the bottom is to combine all the created tables and then create the table in Teradata. I want the data to be a 12-month lookback period. Starting from todays month - 1. So if I was running it for September, I would want the date loops to start at August, going back 12 months.

 

Can anyone help? See opportunities for improvement on how I can automatically loop through the different values without hard-coding table names and dates.

 

Thanks



 

 

 

Time_Looper47
Obsidian | Level 7

Thank you for the resource, I think this will be very helpful. The reason I can't do it with one date range is because the source tables are broken out by month - look at the FROM. The table changes.

 

So the query will run for BANK_DATA.source_data_201908_hist with August in the WHERE filter.

Then BANK_DATA.source_data_201907_hist with the July filter in the WHERE.

Then BANK_DATA.source_data_201906_hist with the June filter in the WHERE. And so on.

Reeza
Super User
I can't speak for everywhere, but in most places there is a table with a year or two of history in addition to the monthly tables. It may be worth asking your DB. If you're certain that's the limit then using a data step to loop is an option.
Reeza
Super User
%macro td_sql_macro (tbl_nme,yyyymm,mnth_1,mnth2);
 *Change to have your date start and end or date start and number of intervals;

[insert do loop for dates here according to macro link]

proc sql noerrorstop;
connect to teradata (database=BANK_DATA user="user" password="pw" tdpid="tdpid" mode=teradata);
create table work.&tbl_nme as select * from connection to teradata (
  select cust_id,
         depo_dt,
	     deposit_amt,
	     year_month,
	     source
  from BANK_DATA.source_data_&yyyymm._hist
  where depo_dt >= date &mnth_1
  and depo_dt < date &mnth2
  and prc_dt = &yyyymm
  and cust_id <> 0);
disconnect from teradata;
quit;

*Append to main data table;
proc append base = final_data data=work.&tbl_nme force;
run;

*delete temporary data set;
proc datasets lib=work nodetails nolist;
delete work.&tbl_nme;
run;quit;

[END DO LOOP HERE]


%mend;

My thoughts on how I would do this. There are other ways of course. 

One would be to have it driven by a data null step + CALL EXECUTE instead. 

 

Here's my full references for Macros that may be helpful:

 

UCLA introductory tutorial on macro variables and macros

https://stats.idre.ucla.edu/sas/seminars/sas-macros-introduction/

Tutorial on converting a working program to a macro

This method is pretty robust and helps prevent errors and makes it much easier to debug your code. Obviously biased, because I wrote it 🙂 https://github.com/statgeek/SAS-Tutorials/blob/master/Turning%20a%20program%20into%20a%20macro.md

Examples of common macro usage

https://communities.sas.com/t5/SAS-Communities-Library/SAS-9-4-Macro-Language-Reference-Has-a-New-Ap...

Time_Looper47
Obsidian | Level 7

@Reeza 

 

I've made things a bit simpler for the purpose of experimentation and so you can see better what I'm trying to do, however I am stuck as to how to dynamically move through macro variable values - I need to loop through the dates as well as table name and year/month. See the following simplified code:

 

data source_data_201907_hist;
infile datalines;
format cust_id 3. depo_dt yymmddd10. deposit_amt dollar6.2 year_month source;
input cust_id 3. depo_dt :yymmdd10. deposit_amt dollar6.2 year_month source $16.;
datalines;
822 2019-07-27 $25.00 201907 PNCFS
268 2019-07-07 $68.00 201907 JPMORGAN
878 2019-07-18 $50.00 201907 STATESTREET
790 2019-07-26 $26.00 201907 USBANKCORP
;;
run;

data source_data_201908_hist;
infile datalines;
format cust_id 3. depo_dt yymmddd10. deposit_amt dollar6.2 year_month source;
input cust_id 3. depo_dt :yymmdd10. deposit_amt dollar6.2 year_month source $16.;
datalines;
822 2019-08-23 $30.00 201908 CHASE
268 2019-08-09 $70.00 201908 PNCFS
878 2019-08-22 $80.00 201908 USBANK
790 2019-08-19 $20.00 201908 BOA
;;
run;

data source_data_201909_hist;
infile datalines;
format cust_id 3. depo_dt yymmddd10. deposit_amt dollar6.2 year_month source;
input cust_id 3. depo_dt :yymmdd10. deposit_amt dollar6.2 year_month source $16.;
datalines;
871 2019-09-21 $20.00 201909 USBANK
888 2019-09-13 $90.00 201909 WELLSFARGO
847 2019-09-25 $60.00 201909 CHASE
821 2019-09-18 $30.00 201909 CITIGROUP
;;
run;

/* Hard-coded versions */ proc sql; create table work.test1 as select * from work.source_data_201907_hist where depo_dt >= '01jul2019'd and depo_dt < '01aug2019'd and year_month = 201907; quit; proc sql; create table work.test2 as select * from work.source_data_201908_hist where depo_dt >= '01aug2019'd and depo_dt < '01sep2019'd and year_month = 201908; quit; proc sql; create table work.test3 as select * from work.source_data_201909_hist where depo_dt >= '01sep2019'd and depo_dt < '01oct2019'd and year_month = 201909; quit;
/* Macro version */ %macro sql_loop(tbl,start,end,ym); proc sql; create table work.&tbl as select * from work.source_data_&ym._hist where depo_dt >= &start and depo_dt < &end and year_month = &ym; quit; proc append base=final_data data=work.&tbl force; run; proc datasets lib=work nodetails nolist; delete work.&tbl; run; quit; %mend;

%sql_loop;

 

The idea for me is I don't want to make any explicit calls at the end, just call the macro at the end like I have above. I found the code below that loops through dates but am confused about how to work the do loop into my query and define the macro variable within the macro. And again I need to loop through not just dates but the table name and yearmonth.

 

%macro date_loop(start,end);
   %let start=%sysfunc(inputn(&start,anydtdte9.));
   %let end=%sysfunc(inputn(&end,anydtdte9.));
   %let dif=%sysfunc(intck(month,&start,&end));
     %do i=0 %to &dif;
      %let date=%sysfunc(intnx(month,&start,&i,b),date9.);
      %put &date;
     %end;
   %mend date_loop;

   %date_loop(01jul2015,01feb2016)

 

 

The documentation you provided has helped me think about it and understand the examples a bit but I'm just stuck on implemented it within mine. Any help would be greatly appreciated.

Tom
Super User Tom
Super User

Nothing is stopping you from making as many macro variables as you need to help make your logic clear and your code generation easy.

%let n_months=%sysfunc(intck(month,&start,&end));
%do index=0 to &n_months;
   %let start_of_month = %sysfunc(intnx(month,&start,&index,s));
   %let part_of_table_name = %sysfunc(putn(&start_of_month,yymmn6.));
   %let td_start=DATE(%str(%')%sysfunc(putn(&start_of_month,yymmddd10.))%str('));
   %let td_end=DATE(%str(%')%sysfunc(intnx(month(&start_of_month,0,e),yymmddd10.)%str('));
  ...
  from xxxx&part_of_table_name.
  where datevar between &td_start and &td_end 
  ...
%end;
SASKiwi
PROC Star

Why are you running 12 separate queries to get a year's-worth of data? It would be likely more efficient to do this in one query.

Time_Looper47
Obsidian | Level 7

Because - look at the from:

 

from BANK_DATA.source_data_&yyyymm._hist

it changes so I can't just do it all in one. I pull from source_data_201908_hist with the August date range in the WHERE. Then pull from souce_data_201907_hist with the July date range in the WHERE. An so on and so forth.

SASKiwi
PROC Star

In that case you could do a UNION of the 12 tables in Teradata and probably still be more efficient than doing one at a time.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1506 views
  • 5 likes
  • 5 in conversation