DATA Step, Macro, Functions and more

Calling multiple PROCs from inside DO loops

Accepted Solution Solved
Reply
User
Posts: 1
Accepted Solution

Calling multiple PROCs from inside DO loops

Dear SAS Community, 

 

I'm trying to compile multiple datasets into single superset - the same data for different months for many years (jan2001_data .... dec2016). Multiple SET doesn't work here because I only need subset of variables from original datasets. Thus I created a dataset which I expand using sql union with every dataset:

%macro appendData(month=, year=);
proc sql;
     create table anto.temp1 as
     select
           a.*
     from
           anto.dataaccum a
     union
     select
           b.var1,
           b.var2,
           b.var3
     from
           data&year..&month.&year._data b;
 
data anto.dataaccum;
     set anto.temp1;
run;
%mend;
 
%appendData(month=jan, year=2001);
......
%appendData(month=dec, year=2016);

That would have been much more efficient if I could call %appendData from inside %do loop. However, any use of ARRAY, DO or DO_OVER outside of PROC causes error. Is it possible at all? Or do I call it incorrectly? 

%do_over(values=jan feb mar apr may jun jul aug sep oct nov dec, phrase=%appendData(month=?,year=2016));

WARNING: Apparent invocation of macro DO_OVER not resolved.

ERROR 180-322: Statement is not valid or it is used out of proper order.

Accepted Solutions
Solution
‎05-26-2017 04:18 AM
Respected Advisor
Posts: 3,890

Re: Calling multiple PROCs from inside DO loops

[ Edited ]

@pAckmAn

Instead of using macro coding you could also query the SAS dictionary tables and create and populate a macro variable with all the source datasets you need (and which exist).

 

"Multiple SET doesn't work here because I only need subset of variables from original datasets."

That works: You can use a KEEP option on source datasets: SET mysource(keep=<variable list>);

And to read from multiple datasets only use a single SET statement and just list all your datasets like:

SET mysource1(keep=<variable list>) mysource2(keep=<variable list>);

 

Here a code option which allows you to create the SET statement dynamically using a start and end date.

The naming convention of your source datasets made things a bit more complicated than necessary - that's the reason for the regular expression and prxmatch() - but things work.

options symbolgen;

/* create sample source data */
data data2017_jan2017_data data2018_feb2018_data; 
  set sashelp.class;
run;


/* set start and end date for monthly datasets */
%let startdt=jan2017;
%let enddt=mar2018;


/* create macro variable with value containing all source datasets in date range */
/* - only datasets which actually exists will be added                           */
%let ds_list=_dummy_;
proc sql noprint;
  select cats(libname,'.',memname,"(keep=name age)") into :ds_list separated by ' '
  from dictionary.tables
  where 
    libname="WORK" 
    and prxmatch('/^data\d{4}_\w{3}\d{4}_data\s*$/oi',memname)=1
    and input(scan(memname,2,'_'),monyy7.) between input("&startdt",monyy7.) and input("&enddt",monyy7.)
    ;
quit;

/* create table want with all selected source datasets */
data want;
  length sourceDS _sourceDS $41;
  set &ds_list indsname=_sourceDS;
  sourceDS=_sourceDS;
run;

 

...  
where libname="WORK"
...

Change libref "WORK" to the libref where your source datasets reside (i.e. ANTO) and things should work (name of libref must be in upper case).

View solution in original post


All Replies
Solution
‎05-26-2017 04:18 AM
Respected Advisor
Posts: 3,890

Re: Calling multiple PROCs from inside DO loops

[ Edited ]

@pAckmAn

Instead of using macro coding you could also query the SAS dictionary tables and create and populate a macro variable with all the source datasets you need (and which exist).

 

"Multiple SET doesn't work here because I only need subset of variables from original datasets."

That works: You can use a KEEP option on source datasets: SET mysource(keep=<variable list>);

And to read from multiple datasets only use a single SET statement and just list all your datasets like:

SET mysource1(keep=<variable list>) mysource2(keep=<variable list>);

 

Here a code option which allows you to create the SET statement dynamically using a start and end date.

The naming convention of your source datasets made things a bit more complicated than necessary - that's the reason for the regular expression and prxmatch() - but things work.

options symbolgen;

/* create sample source data */
data data2017_jan2017_data data2018_feb2018_data; 
  set sashelp.class;
run;


/* set start and end date for monthly datasets */
%let startdt=jan2017;
%let enddt=mar2018;


/* create macro variable with value containing all source datasets in date range */
/* - only datasets which actually exists will be added                           */
%let ds_list=_dummy_;
proc sql noprint;
  select cats(libname,'.',memname,"(keep=name age)") into :ds_list separated by ' '
  from dictionary.tables
  where 
    libname="WORK" 
    and prxmatch('/^data\d{4}_\w{3}\d{4}_data\s*$/oi',memname)=1
    and input(scan(memname,2,'_'),monyy7.) between input("&startdt",monyy7.) and input("&enddt",monyy7.)
    ;
quit;

/* create table want with all selected source datasets */
data want;
  length sourceDS _sourceDS $41;
  set &ds_list indsname=_sourceDS;
  sourceDS=_sourceDS;
run;

 

...  
where libname="WORK"
...

Change libref "WORK" to the libref where your source datasets reside (i.e. ANTO) and things should work (name of libref must be in upper case).

Trusted Advisor
Posts: 1,128

Re: Calling multiple PROCs from inside DO loops

This is an untested code but could work if you try

 

I am in the assumption to use the temp: the dataset name followed with colon in the below step and the same could be kept within the macro.

 

data anto.dataaccum;
     set anto.temp:;
run;

 

could you please test and let me know

Thanks,
Jag
Super User
Posts: 5,082

Re: Calling multiple PROCs from inside DO loops

It's extremely tricky to navigate, if you are trying to pull different months for different years.  But if you are willing to give a single macro call for each year, here is a possibility.  This may not produce what your original program is intended to produce, but it is a good starting point for discussion.  The idea of selecting a.* in your current program worries me so I want to simplify it here:

 

%macro appendData (year=, month_list=);

   %local i next_month;

   %do i=1 %to %sysfunc(countw(&month_list));

      %let next_month = scan(&month_list, &i);

      proc sql;

      select var1 var2 var3 from data&year..&month.&year._data;

      quit;

      proc append data=temp base=auto.dataaccum;

      run;

   %end;

%mend appendData;

 

%appendData (year=2017, month_list=jan feb mar apr may jun)

 

This still doesn't rule out using a SET statement, but I'm trying to change as little as possible in your original program at this point.  You could try to use macro language to generate this final program as well:

 

data auto.dataaccum;

  do until (done1);

      set data2017.jan2017_data (keep=var1 var2 var3) end=done1;

      output;

   end;

  do until (done2);

      set data2017.feb2017_data (keep=var1 var2 var3) end=done2;

      output;

   end;

  do until (done3);

      set data2017.mar2017_data (keep=var1 var2 var3) end=done3;

      output;

   end;

run;

 

It's just not clear at this point whether this produces the outcome you are looking for.

PROC Star
Posts: 7,363

Re: Calling multiple PROCs from inside DO loops

Here is a datastep approach using the call execute function:

data _null_;
  length sendtox $255;
  call execute('data anto.dataaccum; set ');
  year=2000;
  do i=1 to 192; /*total number of months*/
    if mod(i,12) eq 1 then year+1;
    sendtox=catt('data',year,'.',put(mdy(i,1,year),monname3.),year,'_data (keep=var1 var2 var3)');
    call execute(sendtox);
  end;
  call execute(';run;');
run;

Art, CEO, AnalystFinder.com

 

Respected Advisor
Posts: 3,890

Re: Calling multiple PROCs from inside DO loops

[ Edited ]

And just for fun here yet another a bit unusual approach:

 

 

/* create sample source data */
data data2017_jan2017_data data2018_feb2018_data;
  set sashelp.class;
run;

/* set start and end date for monthly datasets */
%let startdt=jan2017;
%let enddt=mar2018;

/* define vars to keep */
%let keepVar=name age height;


/* create DS with source datasets in date range with required naming pattern */
data work.ds(keep=sourceDS &keepVar);
  length sourceDS _sourceDS $41;
  set 
    data:(obs=1) indsname=_sourceDS;

  if
    prxmatch('/^data\d{4}_\w{3}\d{4}_data\s*$/oi',scan(_sourceDS,-1,'.'))=1
    and input("&startdt",monyy7.) <= input(scan(_sourceDS,3,'._'),monyy7.)  <= input("&enddt",monyy7.)
    ;

  sourceDS=_sourceDS;
run;

/* create table want with all selected source datasets */
data want(keep=sourceDS &keepVar);
  set work.ds;

  _dsid=open(catx(' ',sourceDS,"(keep=&keepVar)"),"I");
  call set(_dsid);

  do _nobs=1 by 1;
    _rc=fetchobs(_dsid,_nobs);
    if _rc = 0 then output;
    else leave;
  end;

  _rc=close(_dsid);
run;

 

 

Super User
Posts: 6,936

Re: Calling multiple PROCs from inside DO loops

[ Edited ]

First, make a dataset with dataset names.

Then do

data _null_;
set dataset_names end=done;
if _n_ = 1 then call execute('data want; set ';);
call execute(trim(ds_name) !! ' (keep=var1 var2 var3) ');
if done then call execute('; run;');
run;

 

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 6,936

Re: Calling multiple PROCs from inside DO loops

Of course there is a catch to this: with a sufficient number of datasets to concatenate, one could exceed the maximum length of a single SAS statement (32767 characters).

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Valued Guide
Posts: 797

Re: Calling multiple PROCs from inside DO loops

@pAckmAn

 

In addition to the useful comments sent by others, consider appending the "open=defer" option to the SET statement.  I suspect, that all the data sets you are concatenatiing have the same variables.  If so (and only if so), then instead of SAS setting up a separate input buffer for each incoming data set (the default behavior), you can tell SAS to reuse the same buffer for each data set in turn.  Saves a lot of memory, and probably some cpu time and clock time.

 

I.e. the SET command would look like

    set

       ... list of data sets here ....

      open=defer;

 

 

The tradeoff - you can't have a BY statement accompany the SET statement - because that would require SAS to "look ahead" to the next record (to determine last.by status) - which in turn would require more than one buffer. 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 354 views
  • 5 likes
  • 7 in conversation