BookmarkSubscribeRSS Feed
somebody
Lapis Lazuli | Level 10

I have a number of datasets (400 of them) which I need to perform some manipulation and calculation. I cannot merge them because each dataset is quite big. how can I efficiently loop through these datasets? Is there a way to create a list, or a dataset that contains these datasets and loop through them?

5 REPLIES 5
Kurt_Bremser
Super User

Wrap your analysis into a macro with the dataset name as parameter, put your dataset names into a control dataset, and use call execute() in a data step you run from the control dataset to call the macro:

%macro mymac(dataset);
/* your analysis code, use &dataset */
%mend;

data _null_;
set my_datasets;
call execute('%nrstr(%mymac(' !! strip(dataset_name) !! '))');
run;

dataset_name being a variable in my_datasets that contains the dataset names.

somebody
Lapis Lazuli | Level 10

Thanks. My macro that do the manipulation and calculation takes 2 parameters: library and dataset. How do I edit the following code to do so? I tried but does not work

data _null_;
set my_datasets;
call execute('%nrstr(%mymac('lib_name, !! strip(dataset_name) !! '))');
run;

 

Kurt_Bremser
Super User

Use the strip function to get rid of unwanted blanks, and the concatenate operator !!:

call execute('%nrstr(%mymac(' !! strip(lib_name) !! ',' !! strip(dataset_name) !! '))');

You can also build the string for call execute with the cats() function.

Cynthia_sas
SAS Super FREQ
Hi:
I'm sure there have been many previous postings about using DICTIONARY.TABLES and macro processing to generate a list and then cycle through the list to do a PROC PRINT on each table.

cynthia
mkeintz
PROC Star

Do your datasets:

  1. Share a name structure  (like trades_20170115   trades_20170116  .....,   or  sales_ibm, sales_msft, sales_apple)?
  2. Have identical variables in them?

 

If so, and if you would prefer to process all of the datasets at one time (say you want to do a proc means on all trades by date), you can

  1. Utiliize a data set view   (as opposed to a data set file), in which:
    1. You can conveniently refer to all (or a major subgroup) of the datasets
    2. Avoid the excessive memory demands of reading in dozens or hundreds of datasets

 

For instance:

data vneed / view=vneed;
  set mylib.trades_2017:   open=defer;
  trade_value=price*shares;
run;
proc means data=vneed n min max mean nway sum;
  var volume price trade_value;
  class date stock;
run;

 

The data step creates a data set view named VNEED.  It's a view, not a file, so it doesn't write data to disk, and it takes only a second to compile.  Instead the data step gets activated only when VNEED is referred to later, and then the generated data is streamed directly to the calling procedure in memory, saving disk activity and time.

 

So the above is a compact way to get data for each date*stock combination from all the datasets for 2017.  Note the trailing colon in ":set mylib.trades_2017:", which means all the datasets whose names start with trades_2017.  The "open=defer" option tells SAS to avoid making an input buffer for each of the incoming datasets.  Instead SAS will re-use the same buffer for each dataset in sequence, saving memory.  The downsides: (1) each of the datasets should have the same variables, and (2) you can't use a BY statement (which would need to look ahead in each dataset, thereby requiring buffers for each).

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 805 views
  • 5 likes
  • 4 in conversation