Hello,
I have dataset like below. I would like to create a macro which can loop from 1 to _n_ to create a serial of new datasets with name "output_x" where x is the row number.
For example :
data output_1;
set raw.ds1;
keep var1;
run;
data output_2;
set raw.ds2;
keep var2;
run;
...
Could anyone guide me on this? Feels like array might be the way to do. I am very new to array concept and could not figure out by myself. Please help.
If there is a better/smarter way other than using array, please also share your thoughts. Thanks.
data have; input LIB $ DS $ VAR $; datalines; RAW DS1 VAR1 RAW DS2 VAR2 WORK DS1 VAR2 WORK DS1 VAR3 WORK TF1 VAR4 RAW FT15 VAR5 ; run;
Seems like a silly thing to do, but here goes...
First step is do figure out what SAS code you want to generate. Your example is very inefficient since it requires re-reading the dataset multiple times. Instead make the data step like this:
data
output_1(keep=var1)
output_2(keep=var2)
....
;
set raw.ds1;
run;
Then figure out what part varies (and where you can get the information needed to generate the varying part).
It sounds like you need the name of the source dataset and the a list of output dataset name and variable name pairs. So perhaps a dataset like:
data driver ;
input source :$41. target :$41. varname :$32. ;
cards;
raw.ds1 work.output_1 var1
raw.ds1 work.output_2 var2
;
Once you have that you can use it to generate the code. Not sure there is any need to use macro code at all in this case as you have all of the information already in data. So just use a data step.
For example by using CALL EXECUTE().
data _null_;
set driver;
by source;
if first.source then call execute('data ');
call execute(cats(target,'(keep=',varname,')'));
if last.source then call execute(catx(' ',';set',source,';run;'));
run;
Which for the driver data above would generate this code:
NOTE: CALL EXECUTE generated line. 1 + data 2 + work.output_1(keep=var1) 3 + work.output_2(keep=var2) 4 + ;set raw.ds1 ;run;
Seems like a silly thing to do, but here goes...
First step is do figure out what SAS code you want to generate. Your example is very inefficient since it requires re-reading the dataset multiple times. Instead make the data step like this:
data
output_1(keep=var1)
output_2(keep=var2)
....
;
set raw.ds1;
run;
Then figure out what part varies (and where you can get the information needed to generate the varying part).
It sounds like you need the name of the source dataset and the a list of output dataset name and variable name pairs. So perhaps a dataset like:
data driver ;
input source :$41. target :$41. varname :$32. ;
cards;
raw.ds1 work.output_1 var1
raw.ds1 work.output_2 var2
;
Once you have that you can use it to generate the code. Not sure there is any need to use macro code at all in this case as you have all of the information already in data. So just use a data step.
For example by using CALL EXECUTE().
data _null_;
set driver;
by source;
if first.source then call execute('data ');
call execute(cats(target,'(keep=',varname,')'));
if last.source then call execute(catx(' ',';set',source,';run;'));
run;
Which for the driver data above would generate this code:
NOTE: CALL EXECUTE generated line. 1 + data 2 + work.output_1(keep=var1) 3 + work.output_2(keep=var2) 4 + ;set raw.ds1 ;run;
@Tom Wow, what a clever way. 👍
Back to my silly way:
if I know I have 6 rows in have, I want to do sth like below:
do i=1 to length(have);
data output_i;
set lib{i}.ds{i};
keep var{i};
run;
How can I do it? I feel confused on how to construct a array and then apply the looping logic. try to get some sense through this exercise.
@stataq wrote:
@Tom Wow, what a clever way. 👍
Back to my silly way:
if I know I have 6 rows in have, I want to do sth like below:
do i=1 to length(have);
data output_i;
set lib{i}.ds{i};
keep var{i};
run;
How can I do it? I feel confused on how to construct a array and then apply the looping logic. try to get some sense through this exercise.
You are confused because what you seem to want to do has nothing to do with how SAS works. In SAS you run data steps to read and write data. You run procedure steps to read in data and produce reports and/or datasets.
There is no ARRAY statement outside of a data step. The code for a data step is fixed (compiled) before it executes. So the idea of doing something like KEEP VAR[i] makes no sense. The KEEP statement wants the literal name of the variable so it knows which of the variables in the data step should be written to the output dataset(s). It needs to know that before hand.
The code I showed use CALL EXECUTE() to generate code that can be stacked up to run after the current data step has run.
You could also use the macro processor to generate code. Say for example you took your list of variable names and put them into a macro variable. You could then write macro logic that pulled out the individual names from the list and used them to generate code which will then be passed onto to SAS to actually run.
So perhaps something like this:
%macro split(source,target,varlist);
%local n i var;
%let n = %sysfunc(countw(&varlist,%str( )));
data
%do i=1 %to &n;
%let var=%scan(&varlist,&i,%str( ));
&target._&i(keep=&var)
%end;
;
set &source;
run;
%mend split;
That you could run like this:
%split(source=raw.ds1,target=output,varlist=var1 var2);
So that it generates code like this:
MPRINT(SPLIT): data output_1(keep=var1) output_2(keep=var2) ; MPRINT(SPLIT): set raw.ds1; MPRINT(SPLIT): run;
@stataq wrote:
@Tom Wow, what a clever way. 👍
Back to my silly way:
if I know I have 6 rows in have, I want to do sth like below:
do i=1 to length(have);
data output_i;
set lib{i}.ds{i};
keep var{i};
run;
How can I do it? I feel confused on how to construct a array and then apply the looping logic. try to get some sense through this exercise.
Remember that a data step is normally already "looping" over the set of observations in the data set. So there is no need for an "array" or a "do loop". Just a simple data step like this:
data new;
set old;
run;
Will loop N times if the the source dataset has N observations. (Actually it will loop N+1 times but the last one stops at the SET statement when it notices that there are no more observations to be read and so never gets to the end where the implicit OUTPUT statement executes so only N observations will be written.)
@stataq wrote:
@Tom Wow, what a clever way. 👍
Back to my silly way:
if I know I have 6 rows in have, I want to do sth like below:
do i=1 to length(have);
data output_i;
set lib{i}.ds{i};
keep var{i};
run;
How can I do it? I feel confused on how to construct a array and then apply the looping logic. try to get some sense through this exercise.
No array.
A data step automatically incrementing means you don't NEED to count or provide a loop.
This writes the code to a SAS program file and then uses %include to execute it:
data _null_; set have; file "path_and_name_of_the_code_file.sas"; length dsn outn $ 41 ; dsn= catx('.',lib,ds); outn = cats('output_',_n_); put 'data ' outn ';' ; put 'set ' dsn '(keep = ' var ');'; put 'run;'; run; %include "path_and_name_of_the_code_file.sas";
Basically a manual "call execute".
This has the advantage of writing the code to a location to document what happened, or to examine before executing.
SAS data step provides an automatic variable _n_ for the number of times the set executes. So with much code it effectively "counts" the rows of data processed. So that can make the sequential numbered output sets.
You can easily test this code for desired output by replacing the File "nameoffile" with File Print; which directs the output the active results destination.
Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.
Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.