Hi everyone - am trying to design a loop to append a variable number of data tables together - there's a financial metric in monthly tables that I need to add up on, say a YTD basis.. so in March would need to add the value of money_value for any given account in tables TABLE_2018_01, TABLE_2018_02 and TABLE_2018_03... but in September would need to also add TABLE_2018_04, 2018_05, etc through to TABLE_2018_09. I can manually append these tables together, total the value by account of money_value, but want to automate this for flexible time periods.
I've set up prompts to allow a user to select their end month, and have set up a macro variable to calculate the number of months between the start of year and whatever the end up is of the user - this all works for the purposes of various other data elements that I don't have to pull from individual tables,
However, having trouble creating a loop that could allow appending a non-static number of tables together based on the interval between the user defined end date and the start date.
<prompts runs, get user to select a month saved to %extract_month_end>
/* generate macro variables */
%let interval = %sysfunc(intck(month,"31DEC2017"d,"&extract_month_end"d));
run;
%macro generate_data;
data first_table;
set table_2018_01;
run;
%let i=0;
%DO %UNTIL (&i=&interval)
%let i=i&+1
%let ym_seq = %sysfunc(intnx(month,"&extract_month_end"d,+i,e),yymmn6.);
%let year_month_seq = %sysfunc(substr(&ym_seq,1,4))_%sysfunc(substr(ym_seq,5));
proc append base=first_table
data=TABLE_year_month_seq;
%end;
<numerous other things which are working fine>
%mend;
Any advice here would be appreciated!
Instead of building N proc appends, make a space-separated list of N dataset names to put in a SET statement, producing something like. If N=9 it would be:
data want;
set
table_2018_01
table_2018_02
...
table_2018_09
;
run;
  %let interval = %sysfunc(intck(month,"31DEC2017"d,"&extract_month_end"d));
%macro generate_data;
  %let dslist=;
  %do i=1 %to &interval;
    %let yyyy_mm=%sysfunc(intnx(month,"31dec2017"d,&i,e),yymm7.);
    %let yyyy_mm=%sysfunc(tranwrd(&yyyy_mm,M,_)); /*Change 2018M02 to 2018_02*/
    %let dslist=&dslist TABLE_&yyyy_mm;
  %end;
 
  data want;
    set &dslist ;
  run;
     <numerous other things which are working fine>
%mend;
options mprint;
%generate_data;
  
Instead of building N proc appends, make a space-separated list of N dataset names to put in a SET statement, producing something like. If N=9 it would be:
data want;
set
table_2018_01
table_2018_02
...
table_2018_09
;
run;
  %let interval = %sysfunc(intck(month,"31DEC2017"d,"&extract_month_end"d));
%macro generate_data;
  %let dslist=;
  %do i=1 %to &interval;
    %let yyyy_mm=%sysfunc(intnx(month,"31dec2017"d,&i,e),yymm7.);
    %let yyyy_mm=%sysfunc(tranwrd(&yyyy_mm,M,_)); /*Change 2018M02 to 2018_02*/
    %let dslist=&dslist TABLE_&yyyy_mm;
  %end;
 
  data want;
    set &dslist ;
  run;
     <numerous other things which are working fine>
%mend;
options mprint;
%generate_data;
  
Hi you could use the SAS metadata to Loop over the requested datasets.
%macro append(dsn);
proc append base=alldata data=&dsn;
run;
%mend append;
proc format; 
  picture x_yyyy_mm (default=7)
    low - high = '%Y_%0m' (datatype=date)
  ;
run; 
%let start=30NOV2017;
%let end=28FEB2018;
data _NULL_;
   set sashelp.vtable;
   where libname eq 'WORK' 
   and index(memname,'TABLE_') 
   and substr(memname,7) gt put("&start."d,x_yyyy_mm.) 
   and substr(memname,7) le put("&end."d,x_yyyy_mm.);
   call execute('%nrstr(%append('||strip(libname)||'.'||strip(memname)||'))');
run;- Cheers -
Thanks - worked like a charm.
The problem I see with your 'interval' method is that you append as many tables as you need without checking how many tables you actually have in your 'interval'.
If you ever miss a table in your interval, you will erroneously append the table of the following month(!)
One other thing is that it crashes if you ever try to calculate within years. (ie. from Aug2016 to Feb2017)
And one last thing is that proc append was more efficient and warns in case one of your table do not match the database structure of the first table.
I would recommend a defensive approach especially in the financial sector.
- Cheers -
@Oligolas wrote:
The problem I see with your 'interval' method is that you append as many tables as you need without checking how many tables you actually have in your 'interval'.
If you ever miss a table in your interval, you will erroneously append the table of the following month(!)
One other thing is that it crashes if you ever try to calculate within years. (ie. from Aug2016 to Feb2017)
And one last thing is that proc append was more efficient and warns in case one of your table do not match the database structure of the first table.
I would recommend a defensive approach especially in the financial sector.
Good point on defending against missing tables, but that can be easily addressed within the interval approach, by testing for existence while building the DSLIST macrovar. This code skips absent months:
%macro generate_data;
  %let dslist=;
  %do i=1 %to &interval;
    %let yyyy_mm=%sysfunc(intnx(month,"31dec2017"d,&i,e),yymm7.);
    %let yyyy_mm=%sysfunc(tranwrd(&yyyy_mm,M,_)); /*Change 2018M02 to 2018_02*/
    %if %sysfunc(exist(work.TABLE_&yyyy_mm)) %then
    %let dslist=&dslist TABLE_&yyyy_mm;
  %end;
 
  data want;
    set &dslist ;
  run;
     <numerous other things which are working fine>
%mend;
options mprint;
%let interval=5;
%generate_data;
And if the users want to terminate the macro whenever a month is missing, then following the "%if sysfunc..." statement with
%else %return;
I don't understand the crash prediction "if you every try to calculate within years" (actually I think you mean across years, but I still don't understand the crash prediction).
In my experience I haven't encounter a situation in which multiple proc appends is a more efficient approach than multiple arguments to a single SET statement.
As to the protection append offers against mixed database structures, if the OP's production environment is like those I have seen, a collection of dataset names like TABLE_2018_01 TABLE_2018_02, etc. means identical structures.
However, the SET statement will stop the data step when a variable is numeric in some datasets and character in others. And in addition you can add the "open=defer" option to the SET statement, as in:
set &dslist open=defer;
which will (1) force the output dataset to only get variables in the first name in &dslist, and (2) issue warnings on the log when any dataset has a differing structure. It also saves memory, since it tells SAS to re-user the same input buffer for each dataset rather than set up a buffer for each dataset.
Yes, I agree with the idea of a defensive approach, but I'm not sure the 'interval' method is intrinsically less defensive.
Hi,
maybe the approaches and experiences between the finance and pharmaceutical sector are different.
With 'calculation within years' I meant starting from a month other than december 2017 let's say November 2017 (it raises an intnx error) but maybe it is not relevant because calculations are only done within a year. nvm.
- Cheers -
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
