BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
nvd
Calcite | Level 5 nvd
Calcite | Level 5

Hi everyone - am trying to design a loop to append a variable number of data tables together - there's a financial metric in monthly tables that I need to add up on, say a YTD basis.. so in March would need to add the value of money_value for any given account in tables TABLE_2018_01, TABLE_2018_02 and TABLE_2018_03... but in September would need to also add TABLE_2018_04, 2018_05, etc through to TABLE_2018_09.  I can manually append these tables together, total the value by account of money_value, but want to automate this for flexible time periods. 

 

I've set up prompts to allow a user to select their end month, and have set up a macro variable to calculate the number of months between the start of year and whatever the end up is of the user - this all works for the purposes of various other data elements that I don't have to pull from individual tables, 

 

However, having trouble creating a loop that could allow appending a non-static number of tables together based on the interval between the user defined end date and the start date. 

 

<prompts runs, get user to select a month saved to %extract_month_end>

 

/* generate macro variables */

%let interval                   = %sysfunc(intck(month,"31DEC2017"d,"&extract_month_end"d));

run;

  

%macro generate_data;

  

data first_table;

set table_2018_01;

run;

 

%let i=0;

%DO %UNTIL (&i=&interval)

     %let i=i&+1

     %let ym_seq = %sysfunc(intnx(month,"&extract_month_end"d,+i,e),yymmn6.);

     %let year_month_seq = %sysfunc(substr(&ym_seq,1,4))_%sysfunc(substr(ym_seq,5));

     proc append base=first_table

                     data=TABLE_year_month_seq;

 %end;

 

<numerous other things which are working fine>

%mend;

 

Any advice here would be appreciated!

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Instead of building N proc appends, make a space-separated list of N dataset names to put in a SET statement, producing something like.  If N=9 it would be:

 

data want;

  set

       table_2018_01

       table_2018_02

       ...

       table_2018_09

      ;

run;

 

  %let interval = %sysfunc(intck(month,"31DEC2017"d,"&extract_month_end"d));

%macro generate_data;
  %let dslist=;
  %do i=1 %to &interval;
    %let yyyy_mm=%sysfunc(intnx(month,"31dec2017"d,&i,e),yymm7.);
    %let yyyy_mm=%sysfunc(tranwrd(&yyyy_mm,M,_)); /*Change 2018M02 to 2018_02*/
    %let dslist=&dslist TABLE_&yyyy_mm;
  %end;
 
  data want;
    set &dslist ;
  run;
     <numerous other things which are working fine>
%mend;


options mprint;
%generate_data;

  
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

6 REPLIES 6
mkeintz
PROC Star

Instead of building N proc appends, make a space-separated list of N dataset names to put in a SET statement, producing something like.  If N=9 it would be:

 

data want;

  set

       table_2018_01

       table_2018_02

       ...

       table_2018_09

      ;

run;

 

  %let interval = %sysfunc(intck(month,"31DEC2017"d,"&extract_month_end"d));

%macro generate_data;
  %let dslist=;
  %do i=1 %to &interval;
    %let yyyy_mm=%sysfunc(intnx(month,"31dec2017"d,&i,e),yymm7.);
    %let yyyy_mm=%sysfunc(tranwrd(&yyyy_mm,M,_)); /*Change 2018M02 to 2018_02*/
    %let dslist=&dslist TABLE_&yyyy_mm;
  %end;
 
  data want;
    set &dslist ;
  run;
     <numerous other things which are working fine>
%mend;


options mprint;
%generate_data;

  
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Oligolas
Barite | Level 11

Hi you could use the SAS metadata to Loop over the requested datasets.

%macro append(dsn);
proc append base=alldata data=&dsn;
run;
%mend append;

proc format; 
  picture x_yyyy_mm (default=7)
    low - high = '%Y_%0m' (datatype=date)
  ;
run; 

%let start=30NOV2017;
%let end=28FEB2018;

data _NULL_;
   set sashelp.vtable;
   where libname eq 'WORK' 
   and index(memname,'TABLE_') 
   and substr(memname,7) gt put("&start."d,x_yyyy_mm.) 
   and substr(memname,7) le put("&end."d,x_yyyy_mm.);
   call execute('%nrstr(%append('||strip(libname)||'.'||strip(memname)||'))');
run;
________________________

- Cheers -

nvd
Calcite | Level 5 nvd
Calcite | Level 5

Thanks - worked like a charm. 

Oligolas
Barite | Level 11

The problem I see with your 'interval' method is that you append as many tables as you need without checking how many tables you actually have in your 'interval'.
If you ever miss a table in your interval, you will erroneously append the table of the following month(!)
One other thing is that it crashes if you ever try to calculate within years. (ie. from Aug2016 to Feb2017)
And one last thing is that proc append was more efficient and warns in case one of your table do not match the database structure of the first table.
I would recommend a defensive approach especially in the financial sector.

________________________

- Cheers -

mkeintz
PROC Star

@Oligolas wrote:

The problem I see with your 'interval' method is that you append as many tables as you need without checking how many tables you actually have in your 'interval'.
If you ever miss a table in your interval, you will erroneously append the table of the following month(!)
One other thing is that it crashes if you ever try to calculate within years. (ie. from Aug2016 to Feb2017)
And one last thing is that proc append was more efficient and warns in case one of your table do not match the database structure of the first table.
I would recommend a defensive approach especially in the financial sector.


 

Good point on defending against missing tables, but that can be easily addressed within the interval approach, by testing for existence while building the DSLIST macrovar.  This code skips absent months:

 

%macro generate_data;
  %let dslist=;
  %do i=1 %to &interval;
    %let yyyy_mm=%sysfunc(intnx(month,"31dec2017"d,&i,e),yymm7.);
    %let yyyy_mm=%sysfunc(tranwrd(&yyyy_mm,M,_)); /*Change 2018M02 to 2018_02*/
    %if %sysfunc(exist(work.TABLE_&yyyy_mm)) %then
    %let dslist=&dslist TABLE_&yyyy_mm;
  %end;
 
  data want;
    set &dslist ;
  run;
     <numerous other things which are working fine>

%mend;

options mprint;
%let interval=5;
%generate_data;

 

And if the users want to terminate the macro whenever a month is missing, then following the "%if sysfunc..." statement with

   %else %return;

 

I don't understand the crash prediction "if you every try to calculate within years"   (actually I think you mean across years, but I still don't understand the crash prediction).

 

In my experience I haven't encounter a situation in which multiple proc appends is a more efficient approach than multiple arguments to a single SET statement.

 

As to the protection append offers against mixed database structures, if the OP's production environment is like those I have seen, a collection of dataset names like TABLE_2018_01 TABLE_2018_02, etc. means identical structures. 

 

However, the SET statement will stop the data step when a variable is numeric in some datasets and character in others.  And in addition you can add the "open=defer" option to the SET statement, as in:

    set &dslist  open=defer;

which will (1) force the output dataset to only get variables in the first name in &dslist, and (2) issue warnings on the log when any dataset has a differing structure.   It also saves memory, since it tells SAS to re-user the same input buffer for each dataset rather than set up a buffer for each dataset.

 

Yes, I agree with the idea of a defensive approach, but I'm not sure the 'interval' method is intrinsically less defensive.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Oligolas
Barite | Level 11

Hi,

maybe the approaches and experiences between the finance and pharmaceutical sector are different.
With 'calculation within years' I meant starting from a month other than december 2017 let's say November 2017 (it raises an intnx error) but maybe it is not relevant because calculations are only done within a year. nvm.

________________________

- Cheers -

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1629 views
  • 1 like
  • 3 in conversation