Dear All,
I have to pull data from several datasets into one. The following code works perfectly:
Data Final;
  Set D3 D4 D5 D6 D7 D8 D9 D10;
Run;
As the number of datasets changes each time, I wonder if there is a way to call them in one step (similar to Array D3-D10).
If it is not possible, is there a way to use a macro...?
Regards,
JAR
I think I read that this is possible in 9.3.
Of course, even without 9.3, you could always use something like:
data d1;
x=1;
output;
run;
data d2;
x=2;
output;
run;
data all;
set d:;
run;
And, while I had never tried it, it works in 9.2 as well:
data all;
set d1-d2;
run;
What I must have read is the new ability to do the same thing in the data statement itself.
I am using the learner's edition of Enterprise Guide. The engine is still 9.1, and your code does not work in it:
data all;
set d1-d2;
run;
Regards,
JAR
You could always approximate it with a combination of PROC SQL and a DATA step, e.g.:
proc sql noprint;
  select memname into :files separated by " "
    from dictionary.tables
    where libname="WORK"
      and memname like 'D%';
quit;
data want;
set &files.;
run;
This should work as well:
%macro combine;
  data final;
    set
    %do i=3 %to 10;
      d&i
    %end;
    ;
  run;
%mend combine;
%combine
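Since the original poster said the number of data sets changes each time, the same macro can take the range as parameters. A minimal sketch (the macro name and parameter names here are my own, not from the thread):

```sas
%macro combine(first=3, last=10, out=final);
  /* Concatenate WORK.D&first through WORK.D&last into &out */
  data &out;
    set
    %do i=&first %to &last;
      d&i
    %end;
    ;
  run;
%mend combine;

/* Example call: combine D3 through D10 into FINAL */
%combine(first=3, last=10, out=final)
```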
Hi ... as more and more data sets get added, would PROC APPEND be faster for concatenating data sets ...
%macro fakedata;
  %do j=1 %to 10;
    data d&j;
      do j=1 to 1e6;
        output;
      end;
    run;
  %end;
%mend fakedata;
* make 10 data sets ... d1 through d10;
%fakedata;
data _null_;
  do ds=1 to 10;
    call execute(catt('proc append base=final data=d', ds, ';run;'));
  end;
run;
Interestingly, yes, PROC APPEND (and, presumably, APPEND in PROC DATASETS) is quite a bit more efficient. I wonder why the same operation uses a different algorithm in a DATA step. There shouldn't be any need to re-read each file when appending additional files, but the processing time suggests otherwise.
APPEND is a specialized tool, and that allows a degree of optimization (block operations, etc.). The DATA step is a very flexible thing, but at a cost. It drags all of the data, an observation at a time, through the program data vector. That adds overhead.
I'm pretty sure there's no re-reading. That would have to be deliberately contrived.
Also: OPEN=DEFER may help in the DATA step, if the data sets meet the requirements.
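If the data sets all have identical variable layouts, the OPEN=DEFER suggestion above looks like this in the DATA step; it delays opening each data set until it is needed and lets SAS read the concatenated data through a single input buffer. A sketch only, assuming data sets D1 through D10 with the same variables:

```sas
/* OPEN=DEFER applies to concatenation with SET; the data sets
   must have the same variables for the shared buffer to apply. */
data final;
  set d1-d10 open=defer;
run;
```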