BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sasRus
Fluorite | Level 6

Hi there, SAS newbie here. I have a SAS library of 40+ files the following dataset name pattern:

 

McDonalds

McDonaldsFile2

BurgerKing

Wendys

WendysFile2

WendysFile3

PizzaHut

 

All of these files share the same column names. So some companies have just one file, whereas others have more than one file. I am trying to write a macro to combine each company’s files into one dataset. So McDonalds = McDonalds + McDonaldsFile2, and Wendys = Wendys + WendysFile2 + WendysFile3. The other companies with just one file should stay intact.

 

Here is the code I’ve tried:

proc contents data=abc._all_ out=abccont(keep=memname) noprint;
run;

proc sort data=abccont nodupkey;
by memname;
run;

data _null_;
set abccont end=last;
by memname;
i+1;
call symputx('name'||trim(left(put(i,8.))),memname);
if last then call symputx('count',i);
run;

%macro combine;
         %do n = 1 %to &count;
               data &memname;
                              set
                              %do j = 2 %to 3;
                                  cats(&memname,’File’,&i);
                              %end;
               run;
%mend combine;

%combine;

I have spent hours searching the forums and google, but to no avail. Any help would be appreciated.

 

1 ACCEPTED SOLUTION

Accepted Solutions
sasRus
Fluorite | Level 6

thank you for your solutions Tom and mkeintz. I was able to get my solution by tweaking some of Astounding's code:

 

 

proc sql;
  create table dataset_list as select distinct memname from dictionary.tables where upcase(libname)='ABC';
  ** Because of the use of DISTINCT, the list should be in alphabetical order ... if not, add order by memname clause;
quit;

data dataset_list;
   set dataset_list;
   position = length(memname) - 4;
   if substr(memname, position) =: 'File' then root = substr(memname, 1, position-1);
   else root = memname;
run;

 
data _null_;
   set dataset_list;
   by root;
   if first.root=0 then do;
      call execute('proc append data=staging.' || memname || ' base = staging.' || root || ' force; run;' ) ;
   end;
run;

 

View solution in original post

8 REPLIES 8
Reeza
Super User

If they all have the same variables this approach will work - no macros needed:

 

proc contents data=abc._all_ out=abccont(keep=memname) noprint;
run;

proc sort data=abccont nodupkey;
by memname;
run;

proc sql noprint;
select catx(".", 'abc', memname) into :dsn_list separated by " " from abccont;
quit;

data combined_data;
set &dsn_list;
run;

Replace ABC in the CATX with the libname you actually have and this should work

 


@sasRus wrote:

Hi there, SAS newbie here. I have a SAS library of 40+ files the following dataset name pattern:

 

McDonalds

McDonaldsFile2

BurgerKing

Wendys

WendysFile2

WendysFile3

PizzaHut

 

All of these files share the same column names. So some companies have just one file, whereas others have more than one file. I am trying to write a macro to combine each company’s files into one dataset. So McDonalds = McDonalds + McDonaldsFile2, and Wendys = Wendys + WendysFile2 + WendysFile3. The other companies with just one file should stay intact.

 

Here is the code I’ve tried:

proc contents data=abc._all_ out=abccont(keep=memname) noprint;
run;

proc sort data=abccont nodupkey;
by memname;
run;

data _null_;
set abccont end=last;
by memname;
i+1;
call symputx('name'||trim(left(put(i,8.))),memname);
if last then call symputx('count',i);
run;

%macro combine;
         %do n = 1 %to &count;
               data &memname;
                              set
                              %do j = 2 %to 3;
                                  cats(&memname,’File’,&i);
                              %end;
               run;
%mend combine;

%combine;

I have spent hours searching the forums and google, but to no avail. Any help would be appreciated.

 


 

sasRus
Fluorite | Level 6
hi Reeza, thank you for your response. I'm not trying to append datasets together, just the ones for each company.
Reeza
Super User

Ok...similar approach but there’s no real way to know if Wendy’s and McDonalds are different. 

 

I guess you you could convert my previous code into a macro and add a LIKE condition to selectively group the names based on the first 5 or X characters. 

Astounding
PROC Star

I think what you are asking for is a set of SAS data sets, not a single SAS data set holding all the data.  Under that assumption, here is an approach.  It's untested code, so may need some tweaking.

 

proc sql;

  create table dataset_list as select distinct memname from dictionary.tables where upcase(libname)='ABC';

  ** Because of the use of DISTINCT, the list should be in alphabetical order ... if not, add order by memname clause;

quit;

 

data dataset_list;

   set dataset_list;

   position = length(memname) - 4;

   if substr(memname, position) =: 'File' then root = substr(memname, 1, position-1);

   else root = memname;

run;

 

data _null_;

   set dataset_list;

   by root;

   if first.root=0 then do;

      call execute('proc append data=abc.' || memname || ' base = abc.' || root || '; run;') ) ;

   end;

run;

 

Make sure your data is backed up (or replaceable) since this replaces permanent data sets, but uses untested code.

 

 

sasRus
Fluorite | Level 6
thanks Astounding. I'll try this code. will post an update later.
Tom
Super User Tom
Super User

You don't seem to understand the difference between macro code and the SAS code that you are trying to use it to generate.

For example it looks like you tried to stick a data step function call into the middle of a SET statement.

So macro code like this:

data &memname;
  set
%do j = 2 %to 3;
  cats(&memname,’File’,&i);
%end;
run;

So if the macro variable MEMNAME had the value WENDYS it would try to generate SAS code that looked like this:

data WENDYS;
  set
  cats(WENDYS,’File’,2);
  cats(WENDYS,’File’,3);
run;

So you are telling SAS to set a dataset named CATS and then are adding an invalid set of data set options.  Then you are calling the CATS() function, but you have not used in a valid SAS statement since there is no variable that you are assigning the result into.  Also the variable WENDYS that the CATS() function is referencing has not been defined.

 

Don't try to use macro code to generate code until you know what code you want to generate.

If you want to create a new dataset from every dataset that starts with a specific prefix then just use the new colon modifier on a set statement.

 

data work.wendys;
  set abc.wendys: ;
run;

But you could get in trouble if some full names match the beginning of other names.

If your pattern is name the groups as X, XFILE2, XFILE3, .... then perhaps you have some hope of automating the process.

But really for 40 files (so what 13 to 30 actual groups?) it is probably going to be faster and easier just to copy the simple dataset above and edit it by hand.

 

 

 

 

 

mkeintz
PROC Star

You want a create a program that looks like this:

 

data newlib.mcdonalds;
  set oldlib.mcdonalds:  open=defer;
run;
data newlib.wendys;
  set oldlib.wendys:  open=defer;
run;
  1. The trailing colon tells sas you are using a name root, you want all datasets whose names begin with the text preceding the colon.
  2. The "open=defer"  tells SAS to re-use one buffer for each incoming data set, instead of making a buffer for each.  But it assumes that all the data sets have the same variables.

To construct such a program:

data _null_;
  set memlist;
  root=prxchange('s/(\w+)FILE\d+\s/$1/',1,upcase(memname));
  if root ^= lag(root) then 
    call execute(catx(' ','DATA',root,';SET',cats(root,': open=defer;run;')));
run;

If the perl regular expression (prx) function is too opaque, you can generate the root value as below:

data _null_;
  set memlist;
  root=upcase(memname);
  do while ('0'<=char(root,length(root))<='9');
    root=substr(root,1,length(root)-1);
  end;
  root=transtrn(root,'FILE ','     ');
  if root ^= lag(root) then 
    call execute(catx(' ','DATA',root,';SET',cats(root,': open=defer;run;')));
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
sasRus
Fluorite | Level 6

thank you for your solutions Tom and mkeintz. I was able to get my solution by tweaking some of Astounding's code:

 

 

proc sql;
  create table dataset_list as select distinct memname from dictionary.tables where upcase(libname)='ABC';
  ** Because of the use of DISTINCT, the list should be in alphabetical order ... if not, add order by memname clause;
quit;

data dataset_list;
   set dataset_list;
   position = length(memname) - 4;
   if substr(memname, position) =: 'File' then root = substr(memname, 1, position-1);
   else root = memname;
run;

 
data _null_;
   set dataset_list;
   by root;
   if first.root=0 then do;
      call execute('proc append data=staging.' || memname || ' base = staging.' || root || ' force; run;' ) ;
   end;
run;

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 8233 views
  • 1 like
  • 5 in conversation