DATA Step, Macro, Functions and more

do loop for datasets

Reply
Contributor
Posts: 65

do loop for datasets

I have a number of datasets (400 of them) which I need to perform some manipulation and calculation. I cannot merge them because each dataset is quite big. how can I efficiently loop through these datasets? Is there a way to create a list, or a dataset that contains these datasets and loop through them?

Super User
Posts: 10,623

Re: do loop for datasets

Posted in reply to ducman1611

Wrap your analysis into a macro with the dataset name as parameter, put your dataset names into a control dataset, and use call execute() in a data step you run from the control dataset to call the macro:

%macro mymac(dataset);
/* your analysis code, use &dataset */
%mend;

data _null_;
set my_datasets;
call execute('%nrstr(%mymac(' !! strip(dataset_name) !! '))');
run;

dataset_name being a variable in my_datasets that contains the dataset names.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Contributor
Posts: 65

Re: do loop for datasets

[ Edited ]
Posted in reply to KurtBremser

Thanks. My macro that do the manipulation and calculation takes 2 parameters: library and dataset. How do I edit the following code to do so? I tried but does not work

data _null_;
set my_datasets;
call execute('%nrstr(%mymac('lib_name, !! strip(dataset_name) !! '))');
run;

 

Super User
Posts: 10,623

Re: do loop for datasets

Posted in reply to ducman1611

Use the strip function to get rid of unwanted blanks, and the concatenate operator !!:

call execute('%nrstr(%mymac(' !! strip(lib_name) !! ',' !! strip(dataset_name) !! '))');

You can also build the string for call execute with the cats() function.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
SAS Super FREQ
Posts: 9,434

Re: do loop for datasets

Posted in reply to ducman1611
Hi:
I'm sure there have been many previous postings about using DICTIONARY.TABLES and macro processing to generate a list and then cycle through the list to do a PROC PRINT on each table.

cynthia
Trusted Advisor
Posts: 1,400

Re: do loop for datasets

Posted in reply to ducman1611

Do your datasets:

  1. Share a name structure  (like trades_20170115   trades_20170116  .....,   or  sales_ibm, sales_msft, sales_apple)?
  2. Have identical variables in them?

 

If so, and if you would prefer to process all of the datasets at one time (say you want to do a proc means on all trades by date), you can

  1. Utiliize a data set view   (as opposed to a data set file), in which:
    1. You can conveniently refer to all (or a major subgroup) of the datasets
    2. Avoid the excessive memory demands of reading in dozens or hundreds of datasets

 

For instance:

data vneed / view=vneed;
  set mylib.trades_2017:   open=defer;
  trade_value=price*shares;
run;
proc means data=vneed n min max mean nway sum;
  var volume price trade_value;
  class date stock;
run;

 

The data step creates a data set view named VNEED.  It's a view, not a file, so it doesn't write data to disk, and it takes only a second to compile.  Instead the data step gets activated only when VNEED is referred to later, and then the generated data is streamed directly to the calling procedure in memory, saving disk activity and time.

 

So the above is a compact way to get data for each date*stock combination from all the datasets for 2017.  Note the trailing colon in ":set mylib.trades_2017:", which means all the datasets whose names start with trades_2017.  The "open=defer" option tells SAS to avoid making an input buffer for each of the incoming datasets.  Instead SAS will re-use the same buffer for each dataset in sequence, saving memory.  The downsides: (1) each of the datasets should have the same variables, and (2) you can't use a BY statement (which would need to look ahead in each dataset, thereby requiring buffers for each).

 

 

Ask a Question
Discussion stats
  • 5 replies
  • 97 views
  • 5 likes
  • 4 in conversation