I have a number of datasets (400 of them) which I need to perform some manipulation and calculation. I cannot merge them because each dataset is quite big. how can I efficiently loop through these datasets? Is there a way to create a list, or a dataset that contains these datasets and loop through them?
Wrap your analysis into a macro with the dataset name as parameter, put your dataset names into a control dataset, and use call execute() in a data step you run from the control dataset to call the macro:
%macro mymac(dataset);
/* your analysis code, use &dataset */
%mend;
data _null_;
set my_datasets;
call execute('%nrstr(%mymac(' !! strip(dataset_name) !! '))');
run;
dataset_name being a variable in my_datasets that contains the dataset names.
Thanks. My macro that do the manipulation and calculation takes 2 parameters: library and dataset. How do I edit the following code to do so? I tried but does not work
data _null_;
set my_datasets;
call execute('%nrstr(%mymac('lib_name, !! strip(dataset_name) !! '))');
run;
Use the strip function to get rid of unwanted blanks, and the concatenate operator !!:
call execute('%nrstr(%mymac(' !! strip(lib_name) !! ',' !! strip(dataset_name) !! '))');
You can also build the string for call execute with the cats() function.
Do your datasets:
If so, and if you would prefer to process all of the datasets at one time (say you want to do a proc means on all trades by date), you can
For instance:
data vneed / view=vneed;
set mylib.trades_2017: open=defer;
trade_value=price*shares;
run;
proc means data=vneed n min max mean nway sum;
var volume price trade_value;
class date stock;
run;
The data step creates a data set view named VNEED. It's a view, not a file, so it doesn't write data to disk, and it takes only a second to compile. Instead the data step gets activated only when VNEED is referred to later, and then the generated data is streamed directly to the calling procedure in memory, saving disk activity and time.
So the above is a compact way to get data for each date*stock combination from all the datasets for 2017. Note the trailing colon in ":set mylib.trades_2017:", which means all the datasets whose names start with trades_2017. The "open=defer" option tells SAS to avoid making an input buffer for each of the incoming datasets. Instead SAS will re-use the same buffer for each dataset in sequence, saving memory. The downsides: (1) each of the datasets should have the same variables, and (2) you can't use a BY statement (which would need to look ahead in each dataset, thereby requiring buffers for each).
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.