I have 400 datasets on which I need to perform some manipulation and calculation. I cannot merge them because each dataset is quite big. How can I efficiently loop through these datasets? Is there a way to create a list, or a dataset that contains the dataset names, and loop through them?
Wrap your analysis in a macro that takes the dataset name as a parameter, put your dataset names into a control dataset, and use CALL EXECUTE in a data step that reads the control dataset to call the macro:
%macro mymac(dataset);
/* your analysis code, use &dataset */
%mend;

data _null_;
  set my_datasets;
  call execute('%nrstr(%mymac(' !! strip(dataset_name) !! '))');
run;
Here, dataset_name is a variable in my_datasets that contains the dataset names.
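A minimal sketch of such a control dataset (the dataset name my_datasets, the variable dataset_name, and the member names listed are all just illustrative):

```sas
/* Build a control dataset with one row per dataset to process */
data my_datasets;
  length dataset_name $32;   /* SAS member names are at most 32 characters */
  input dataset_name $;
  datalines;
sales_2017q1
sales_2017q2
sales_2017q3
;
run;
```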
Thanks. My macro that does the manipulation and calculation takes two parameters: library and dataset. How do I edit the following code to pass both? I tried this, but it does not work:
data _null_;
set my_datasets;
call execute('%nrstr(%mymac('lib_name, !! strip(dataset_name) !! '))');
run;
Use the STRIP function to get rid of unwanted blanks, and the concatenation operator !!:
call execute('%nrstr(%mymac(' !! strip(lib_name) !! ',' !! strip(dataset_name) !! '))');
You can also build the string for call execute with the cats() function.
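For example, the same CALL EXECUTE line built with CATS(), which strips leading and trailing blanks from each argument as it concatenates (a sketch assuming the same lib_name and dataset_name variables as above):

```sas
/* CATS() strips blanks, so no explicit STRIP() calls are needed */
call execute(cats('%nrstr(%mymac(', lib_name, ',', dataset_name, '))'));
```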
I'm sure there have been many previous postings about using DICTIONARY.TABLES and macro processing to generate a list of tables and then cycle through that list, for example to run a PROC PRINT on each table.
cynthia
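One common shape of that pattern (a sketch, assuming an illustrative library MYLIB, a common name prefix TRADES, and the %mymac macro from earlier in this thread) reads the member names from DICTIONARY.TABLES and feeds them to CALL EXECUTE:

```sas
/* Collect matching member names from the metadata view DICTIONARY.TABLES */
proc sql noprint;
  create table my_datasets as
  select memname as dataset_name
  from dictionary.tables
  where libname = 'MYLIB'          /* libname and memname are stored in uppercase */
    and memname like 'TRADES%';
quit;

/* Call the macro once per dataset found */
data _null_;
  set my_datasets;
  call execute('%nrstr(%mymac(mylib,' !! strip(dataset_name) !! '))');
run;
```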
Do your datasets:
- Share a name structure (like trades_20170115, trades_20170116, ..., or sales_ibm, sales_msft, sales_apple)?
- Have identical variables in them?
If so, and if you would prefer to process all of the datasets at one time (say you want to run a PROC MEANS on all trades by date), you can use a data set view (as opposed to a data set file), which lets you:
- Conveniently refer to all (or a major subgroup) of the datasets
- Avoid the excessive memory demands of reading in dozens or hundreds of datasets
For instance:
data vneed / view=vneed;
  set mylib.trades_2017: open=defer;
  trade_value = price * shares;
run;

proc means data=vneed n min max mean nway sum;
  var volume price trade_value;
  class date stock;
run;
The data step creates a data set view named VNEED. It's a view, not a file, so it doesn't write data to disk, and it takes only a second to compile. Instead the data step gets activated only when VNEED is referred to later, and then the generated data is streamed directly to the calling procedure in memory, saving disk activity and time.
So the above is a compact way to get data for each date*stock combination from all the datasets for 2017. Note the trailing colon in "set mylib.trades_2017:", which means all the datasets whose names start with trades_2017. The "open=defer" option tells SAS to avoid making an input buffer for each of the incoming datasets. Instead SAS will re-use the same buffer for each dataset in sequence, saving memory. The downsides: (1) each of the datasets should have the same variables, and (2) you can't use a BY statement (which would need to look ahead in each dataset, thereby requiring buffers for each).
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for "Add a HASH object method which would append a hash object to an existing SAS data set".
Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for "Allow PROC SORT to output multiple datasets".
--------------------------