Hi there, SAS newbie here. I have a SAS library of 40+ files the following dataset name pattern:
McDonalds
McDonaldsFile2
BurgerKing
Wendys
WendysFile2
WendysFile3
PizzaHut
All of these files share the same column names. So some companies have just one file, whereas others have more than one file. I am trying to write a macro to combine each company’s files into one dataset. So McDonalds = McDonalds + McDonaldsFile2, and Wendys = Wendys + WendysFile2 + WendysFile3. The other companies with just one file should stay intact.
Here is the code I’ve tried:
proc contents data=abc._all_ out=abccont(keep=memname) noprint;
run;
proc sort data=abccont nodupkey;
by memname;
run;
data _null_;
set abccont end=last;
by memname;
i+1;
call symputx('name'||trim(left(put(i,8.))),memname);
if last then call symputx('count',i);
run;
%macro combine;
%do n = 1 %to &count;
data &memname;
set
%do j = 2 %to 3;
cats(&memname,’File’,&i);
%end;
run;
%mend combine;
%combine;
I have spent hours searching the forums and google, but to no avail. Any help would be appreciated.
thank you for your solutions Tom and mkeintz. I was able to get my solution by tweaking some of Astounding's code:
proc sql;
create table dataset_list as select distinct memname from dictionary.tables where upcase(libname)='ABC';
** Because of the use of DISTINCT, the list should be in alphabetical order ... if not, add order by memname clause;
quit;
data dataset_list;
set dataset_list;
position = length(memname) - 4;
if substr(memname, position) =: 'File' then root = substr(memname, 1, position-1);
else root = memname;
run;
data _null_;
set dataset_list;
by root;
if first.root=0 then do;
call execute('proc append data=staging.' || memname || ' base = staging.' || root || ' force; run;' ) ;
end;
run;
If they all have the same variables this approach will work - no macros needed:
proc contents data=abc._all_ out=abccont(keep=memname) noprint;
run;
proc sort data=abccont nodupkey;
by memname;
run;
proc sql noprint;
select catx(".", 'abc', memname) into :dsn_list separated by " " from abccont;
quit;
data combined_data;
set &dsn_list;
run;
Replace ABC in the CATX with the libname you actually have and this should work
@sasRus wrote:
Hi there, SAS newbie here. I have a SAS library of 40+ files the following dataset name pattern:
McDonalds
McDonaldsFile2
BurgerKing
Wendys
WendysFile2
WendysFile3
PizzaHut
All of these files share the same column names. So some companies have just one file, whereas others have more than one file. I am trying to write a macro to combine each company’s files into one dataset. So McDonalds = McDonalds + McDonaldsFile2, and Wendys = Wendys + WendysFile2 + WendysFile3. The other companies with just one file should stay intact.
Here is the code I’ve tried:
proc contents data=abc._all_ out=abccont(keep=memname) noprint; run; proc sort data=abccont nodupkey; by memname; run; data _null_; set abccont end=last; by memname; i+1; call symputx('name'||trim(left(put(i,8.))),memname); if last then call symputx('count',i); run; %macro combine; %do n = 1 %to &count; data &memname; set %do j = 2 %to 3; cats(&memname,’File’,&i); %end; run; %mend combine; %combine;
I have spent hours searching the forums and google, but to no avail. Any help would be appreciated.
Ok...similar approach but there’s no real way to know if Wendy’s and McDonalds are different.
I guess you you could convert my previous code into a macro and add a LIKE condition to selectively group the names based on the first 5 or X characters.
I think what you are asking for is a set of SAS data sets, not a single SAS data set holding all the data. Under that assumption, here is an approach. It's untested code, so may need some tweaking.
proc sql;
create table dataset_list as select distinct memname from dictionary.tables where upcase(libname)='ABC';
** Because of the use of DISTINCT, the list should be in alphabetical order ... if not, add order by memname clause;
quit;
data dataset_list;
set dataset_list;
position = length(memname) - 4;
if substr(memname, position) =: 'File' then root = substr(memname, 1, position-1);
else root = memname;
run;
data _null_;
set dataset_list;
by root;
if first.root=0 then do;
call execute('proc append data=abc.' || memname || ' base = abc.' || root || '; run;') ) ;
end;
run;
Make sure your data is backed up (or replaceable) since this replaces permanent data sets, but uses untested code.
You don't seem to understand the difference between macro code and the SAS code that you are trying to use it to generate.
For example it looks like you tried to stick a data step function call into the middle of a SET statement.
So macro code like this:
data &memname;
set
%do j = 2 %to 3;
cats(&memname,’File’,&i);
%end;
run;
So if the macro variable MEMNAME had the value WENDYS it would try to generate SAS code that looked like this:
data WENDYS;
set
cats(WENDYS,’File’,2);
cats(WENDYS,’File’,3);
run;
So you are telling SAS to set a dataset named CATS and then are adding an invalid set of data set options. Then you are calling the CATS() function, but you have not used in a valid SAS statement since there is no variable that you are assigning the result into. Also the variable WENDYS that the CATS() function is referencing has not been defined.
Don't try to use macro code to generate code until you know what code you want to generate.
If you want to create a new dataset from every dataset that starts with a specific prefix then just use the new colon modifier on a set statement.
data work.wendys;
set abc.wendys: ;
run;
But you could get in trouble if some full names match the beginning of other names.
If your pattern is name the groups as X, XFILE2, XFILE3, .... then perhaps you have some hope of automating the process.
But really for 40 files (so what 13 to 30 actual groups?) it is probably going to be faster and easier just to copy the simple dataset above and edit it by hand.
You want a create a program that looks like this:
data newlib.mcdonalds;
set oldlib.mcdonalds: open=defer;
run;
data newlib.wendys;
set oldlib.wendys: open=defer;
run;
To construct such a program:
data _null_;
set memlist;
root=prxchange('s/(\w+)FILE\d+\s/$1/',1,upcase(memname));
if root ^= lag(root) then
call execute(catx(' ','DATA',root,';SET',cats(root,': open=defer;run;')));
run;
If the perl regular expression (prx) function is too opaque, you can generate the root value as below:
data _null_;
set memlist;
root=upcase(memname);
do while ('0'<=char(root,length(root))<='9');
root=substr(root,1,length(root)-1);
end;
root=transtrn(root,'FILE ',' ');
if root ^= lag(root) then
call execute(catx(' ','DATA',root,';SET',cats(root,': open=defer;run;')));
run;
thank you for your solutions Tom and mkeintz. I was able to get my solution by tweaking some of Astounding's code:
proc sql;
create table dataset_list as select distinct memname from dictionary.tables where upcase(libname)='ABC';
** Because of the use of DISTINCT, the list should be in alphabetical order ... if not, add order by memname clause;
quit;
data dataset_list;
set dataset_list;
position = length(memname) - 4;
if substr(memname, position) =: 'File' then root = substr(memname, 1, position-1);
else root = memname;
run;
data _null_;
set dataset_list;
by root;
if first.root=0 then do;
call execute('proc append data=staging.' || memname || ' base = staging.' || root || ' force; run;' ) ;
end;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.