DATA Step, Macro, Functions and more

Macro to combine multiple datasets with different names

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 6
Accepted Solution

Macro to combine multiple datasets with different names

Hi there, SAS newbie here. I have a SAS library of 40+ files the following dataset name pattern:

 

McDonalds

McDonaldsFile2

BurgerKing

Wendys

WendysFile2

WendysFile3

PizzaHut

 

All of these files share the same column names. So some companies have just one file, whereas others have more than one file. I am trying to write a macro to combine each company’s files into one dataset. So McDonalds = McDonalds + McDonaldsFile2, and Wendys = Wendys + WendysFile2 + WendysFile3. The other companies with just one file should stay intact.

 

Here is the code I’ve tried:

proc contents data=abc._all_ out=abccont(keep=memname) noprint;
run;

proc sort data=abccont nodupkey;
by memname;
run;

data _null_;
set abccont end=last;
by memname;
i+1;
call symputx('name'||trim(left(put(i,8.))),memname);
if last then call symputx('count',i);
run;

%macro combine;
         %do n = 1 %to &count;
               data &memname;
                              set
                              %do j = 2 %to 3;
                                  cats(&memname,’File’,&i);
                              %end;
               run;
%mend combine;

%combine;

I have spent hours searching the forums and google, but to no avail. Any help would be appreciated.

 


Accepted Solutions
Solution
‎02-06-2018 01:01 PM
Occasional Contributor
Posts: 6

Re: Macro to combine multiple datasets with different names

[ Edited ]

thank you for your solutions Tom and mkeintz. I was able to get my solution by tweaking some of Astounding's code:

 

 

proc sql;
  create table dataset_list as select distinct memname from dictionary.tables where upcase(libname)='ABC';
  ** Because of the use of DISTINCT, the list should be in alphabetical order ... if not, add order by memname clause;
quit;

data dataset_list;
   set dataset_list;
   position = length(memname) - 4;
   if substr(memname, position) =: 'File' then root = substr(memname, 1, position-1);
   else root = memname;
run;

 
data _null_;
   set dataset_list;
   by root;
   if first.root=0 then do;
      call execute('proc append data=staging.' || memname || ' base = staging.' || root || ' force; run;' ) ;
   end;
run;

 

View solution in original post


All Replies
Super User
Posts: 23,776

Re: Macro to combine multiple datasets with different names

[ Edited ]

If they all have the same variables this approach will work - no macros needed:

 

proc contents data=abc._all_ out=abccont(keep=memname) noprint;
run;

proc sort data=abccont nodupkey;
by memname;
run;

proc sql noprint;
select catx(".", 'abc', memname) into :dsn_list separated by " " from abccont;
quit;

data combined_data;
set &dsn_list;
run;

Replace ABC in the CATX with the libname you actually have and this should work

 


sasRus wrote:

Hi there, SAS newbie here. I have a SAS library of 40+ files the following dataset name pattern:

 

McDonalds

McDonaldsFile2

BurgerKing

Wendys

WendysFile2

WendysFile3

PizzaHut

 

All of these files share the same column names. So some companies have just one file, whereas others have more than one file. I am trying to write a macro to combine each company’s files into one dataset. So McDonalds = McDonalds + McDonaldsFile2, and Wendys = Wendys + WendysFile2 + WendysFile3. The other companies with just one file should stay intact.

 

Here is the code I’ve tried:

proc contents data=abc._all_ out=abccont(keep=memname) noprint;
run;

proc sort data=abccont nodupkey;
by memname;
run;

data _null_;
set abccont end=last;
by memname;
i+1;
call symputx('name'||trim(left(put(i,8.))),memname);
if last then call symputx('count',i);
run;

%macro combine;
         %do n = 1 %to &count;
               data &memname;
                              set
                              %do j = 2 %to 3;
                                  cats(&memname,’File’,&i);
                              %end;
               run;
%mend combine;

%combine;

I have spent hours searching the forums and google, but to no avail. Any help would be appreciated.

 


 

Occasional Contributor
Posts: 6

Re: Macro to combine multiple datasets with different names

hi Reeza, thank you for your response. I'm not trying to append datasets together, just the ones for each company.
Super User
Posts: 23,776

Re: Macro to combine multiple datasets with different names

Ok...similar approach but there’s no real way to know if Wendy’s and McDonalds are different. 

 

I guess you you could convert my previous code into a macro and add a LIKE condition to selectively group the names based on the first 5 or X characters. 

Super User
Posts: 6,785

Re: Macro to combine multiple datasets with different names

[ Edited ]

I think what you are asking for is a set of SAS data sets, not a single SAS data set holding all the data.  Under that assumption, here is an approach.  It's untested code, so may need some tweaking.

 

proc sql;

  create table dataset_list as select distinct memname from dictionary.tables where upcase(libname)='ABC';

  ** Because of the use of DISTINCT, the list should be in alphabetical order ... if not, add order by memname clause;

quit;

 

data dataset_list;

   set dataset_list;

   position = length(memname) - 4;

   if substr(memname, position) =: 'File' then root = substr(memname, 1, position-1);

   else root = memname;

run;

 

data _null_;

   set dataset_list;

   by root;

   if first.root=0 then do;

      call execute('proc append data=abc.' || memname || ' base = abc.' || root || '; run;') ) ;

   end;

run;

 

Make sure your data is backed up (or replaceable) since this replaces permanent data sets, but uses untested code.

 

 

Occasional Contributor
Posts: 6

Re: Macro to combine multiple datasets with different names

Posted in reply to Astounding
thanks Astounding. I'll try this code. will post an update later.
Super User
Super User
Posts: 8,126

Re: Macro to combine multiple datasets with different names

You don't seem to understand the difference between macro code and the SAS code that you are trying to use it to generate.

For example it looks like you tried to stick a data step function call into the middle of a SET statement.

So macro code like this:

data &memname;
  set
%do j = 2 %to 3;
  cats(&memname,’File’,&i);
%end;
run;

So if the macro variable MEMNAME had the value WENDYS it would try to generate SAS code that looked like this:

data WENDYS;
  set
  cats(WENDYS,’File’,2);
  cats(WENDYS,’File’,3);
run;

So you are telling SAS to set a dataset named CATS and then are adding an invalid set of data set options.  Then you are calling the CATS() function, but you have not used in a valid SAS statement since there is no variable that you are assigning the result into.  Also the variable WENDYS that the CATS() function is referencing has not been defined.

 

Don't try to use macro code to generate code until you know what code you want to generate.

If you want to create a new dataset from every dataset that starts with a specific prefix then just use the new colon modifier on a set statement.

 

data work.wendys;
  set abc.wendys: ;
run;

But you could get in trouble if some full names match the beginning of other names.

If your pattern is name the groups as X, XFILE2, XFILE3, .... then perhaps you have some hope of automating the process.

But really for 40 files (so what 13 to 30 actual groups?) it is probably going to be faster and easier just to copy the simple dataset above and edit it by hand.

 

 

 

 

 

Trusted Advisor
Posts: 1,345

Re: Macro to combine multiple datasets with different names

You want a create a program that looks like this:

 

data newlib.mcdonalds;
  set oldlib.mcdonalds:  open=defer;
run;
data newlib.wendys;
  set oldlib.wendys:  open=defer;
run;
  1. The trailing colon tells sas you are using a name root, you want all datasets whose names begin with the text preceding the colon.
  2. The "open=defer"  tells SAS to re-use one buffer for each incoming data set, instead of making a buffer for each.  But it assumes that all the data sets have the same variables.

To construct such a program:

data _null_;
  set memlist;
  root=prxchange('s/(\w+)FILE\d+\s/$1/',1,upcase(memname));
  if root ^= lag(root) then 
    call execute(catx(' ','DATA',root,';SET',cats(root,': open=defer;run;')));
run;

If the perl regular expression (prx) function is too opaque, you can generate the root value as below:

data _null_;
  set memlist;
  root=upcase(memname);
  do while ('0'<=char(root,length(root))<='9');
    root=substr(root,1,length(root)-1);
  end;
  root=transtrn(root,'FILE ','     ');
  if root ^= lag(root) then 
    call execute(catx(' ','DATA',root,';SET',cats(root,': open=defer;run;')));
run;
Solution
‎02-06-2018 01:01 PM
Occasional Contributor
Posts: 6

Re: Macro to combine multiple datasets with different names

[ Edited ]

thank you for your solutions Tom and mkeintz. I was able to get my solution by tweaking some of Astounding's code:

 

 

proc sql;
  create table dataset_list as select distinct memname from dictionary.tables where upcase(libname)='ABC';
  ** Because of the use of DISTINCT, the list should be in alphabetical order ... if not, add order by memname clause;
quit;

data dataset_list;
   set dataset_list;
   position = length(memname) - 4;
   if substr(memname, position) =: 'File' then root = substr(memname, 1, position-1);
   else root = memname;
run;

 
data _null_;
   set dataset_list;
   by root;
   if first.root=0 then do;
      call execute('proc append data=staging.' || memname || ' base = staging.' || root || ' force; run;' ) ;
   end;
run;

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 293 views
  • 0 likes
  • 5 in conversation