Dear All,
I have to pull data from several datasets into one. The following code works perfectly:
Data Final;
Set
D3 D4 D5 D6 D7 D8 D9 D10;
Run;
As the number of datasets changes each time, I wonder if there is a way to call them in one step (similar to Array D3-D10).
If it is not possible, is there a way to use a macro...?
Regards,
JAR
I think I read that this is possible in 9.3.
Of course, even without 9.3, you could always use something like:
data d1;
x=1;
output;
run;
data d2;
x=2;
output;
run;
data all;
set d:;
run;
And, although I had never tried it before, it works in 9.2 as well:
data all;
set d1-d2;
run;
What I must have read is the new ability to do the same thing in the data statement itself.
I am using the Learner's Edition of Enterprise Guide. The engine is still 9.1, and your code does not work in it:
data all;
set d1-d2;
run;
Regards,
JAR
You could always approximate it using a combination of PROC SQL and a DATA step, e.g.:
proc sql noprint;
select memname into : files
separated by " "
from dictionary.tables
where libname="WORK" and
memname like 'D%'
;
quit;
data want;
set &files.;
run;
This should work as well:
%macro combine;
data final;
set
%do i=3 %to 10;
d&i
%end;
;
run;
%mend;
%combine;
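A possible refinement (just a sketch; the parameter names out=, prefix=, first=, and last= are my own invention, not from the original post) is to parameterize the macro so the dataset prefix and range don't have to be edited each time:

%macro combine(out=final, prefix=d, first=3, last=10);
data &out;
set
%do i=&first %to &last;
&prefix&i
%end;
;
run;
%mend combine;

* same result as the hard-coded version above;
%combine(out=final, prefix=d, first=3, last=10)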
Hi ... as more and more data sets get added, would PROC APPEND be faster for concatenating data sets ...
%macro fakedata;
%do j=1 %to 10;
data d&j;
do j=1 to 1e6;
output;
end;
run;
%end;
%mend;
* make 10 data sets ... d1 through d10;
%fakedata;
data _null_;
do ds=1 to 10;
call execute(catt('proc append base=final data=d',ds,';run;'));
end;
run;
Interestingly, yes, proc append (and/or probably using append in proc datasets) is quite a bit more efficient. I wonder why the same operation uses a different algorithm in a datastep. There shouldn't be any need to re-read each file when appending additional files, but the processing time indicates otherwise.
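For reference, the "append in proc datasets" variant mentioned above would look something like this (a sketch, not benchmarked here; APPEND in PROC DATASETS takes the same BASE= and DATA= options as PROC APPEND, and several APPEND statements can run in one PROC step):

proc datasets lib=work nolist;
append base=final data=d1;
append base=final data=d2;
quit;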
APPEND is a specialized tool, and that allows a degree of optimization (block operations, etc.). The DATA step is a very flexible thing, but at a cost. It drags all of the data, an observation at a time, through the program data vector. That adds overhead.
I'm pretty sure there's no re-reading. That would have to be deliberately contrived.
Also: OPEN=DEFER may help in the DATA step, if the data sets meet the requirements.
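A minimal sketch of what that would look like (assuming all the members have identical variables and attributes, which OPEN=DEFER requires):

data all;
* OPEN=DEFER: members are opened at execution time and share the input buffer of the first member;
set d3 d4 d5 d6 d7 d8 d9 d10 open=defer;
run;

With OPEN=DEFER, SAS defers opening each member until execution and reuses the first member's input buffer, which trims some of the per-member setup overhead in a pure concatenation step.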
art297 wrote:
Interestingly, yes, proc append (and/or probably using append in proc datasets) is quite a bit more efficient. I wonder why the same operation uses a different algorithm in a datastep. There shouldn't be any need to re-read each file when appending additional files, but the processing time indicates otherwise.