Solved: Merge together any number of tables, and rename columns with same vari...

makemo · Posted 04-07-2020 10:47 AM

Hi everybody

What I have and I can do

Here's my issue :
I would like to merge several tables, which are called tab_X. X is a number.
In each table, there are two columns A and B with the same content, except B has X removed rows.

For instance, here is tab_3 :

tab_3

+----+---+---+
| id | A | B |
+----+---+---+
|  1 | a | a |
|  2 | b | b |
|  3 | c | c |
|  4 | d | d |
|  5 | e | e |
|  6 | f | f |
|  7 | g |   |
|  8 | h |   |
|  9 | i |   |
+----+---+---+

Now, I have a macro that can generate any number of such tables as I want.
This means if I execute :

%table_maker (removed_rows = 3 5 )

What I get are two distinct tables, which are "tab_3" and "tab_5".

What I want to do

Now, for further analysis of my Data, I would like that every column B I ever produce are compared to column A in a single table.

This means I would like to merge "tab_3" and "tab_5" so that I get :

tab_all 

+----+---+----+----+
| id | A | B3 | B5 |
+----+---+----+----+
|  1 | a | a  | a  |
|  2 | b | b  | b  |
|  3 | c | c  | c  |
|  4 | d | d  | d  |
|  5 | e | e  |    |
|  6 | f | f  |    |
|  7 | g |    |    |
|  8 | h |    |    |
|  9 | i |    |    |
+----+---+----+----+

Which would thus involve two tricks :

1) I am merging tab_3 with tab_5 to create a new table

(but more generaly, I would like to merge any number of tables I get)

2) I am renaming each B column so that columns with same variable names dont erase each other (preferably in a "B_X" fashion where X is the suffix of my table)

What I have attempted : first code I tried

So far, what I have try is this :

%macro  table_maker (removed_rows =) ; 

/*   precedent code in the macro succesfully creates tables   */

      /* following code is to merge everything with renaming of columns */

            /* we create local macro-variables for the loop */

            %LOCAL index number_of_tables ; 
            %LET number_of_tables = %sysfunc(countw(&removed_rows.))  ; 


             /* we perform the loop itself  */

            %DO index=1  %TO  &number_of_tables. ;
                  %LET  i = %scan(&removed_rows.,&i.) ;
                        DATA  lib.tab_all ; 
                        RENAME B=B_&i. ; 
                        MERGE tab_&i. ; 
                        RUN ; 
            %END ; 

%mend

This code does not work, and what I get in the end is :

tab_all (wrong result)

+----+---+----+
| id | A | B5 |
+----+---+----+
|  1 | a | a  |
|  2 | b | b  |
|  3 | c | c  |
|  4 | d | d  |
|  5 | e |    |
|  6 | f |    |
|  7 | g |    |
|  8 | h |    |
|  9 | i |    |
+----+---+----+

It seems that the last table I merge (here, tab_5) erases the former tables that have been merged before (here tab_3)

What I have attempted : a 2nd code testing

I have try also another way, with SET instead of MERGE :

             /* we perform the loop itself  */

            %DO index=1  %TO  &number_of_tables. ;
                  %LET  i = %scan(&removed_rows.,&i.) ;
                        DATA  lib.tab_all ; 
                        RENAME B=B_&i. ; 
                        SET tab_&i. ; 
                        RUN ; 
            %END ;

But again, what I get is not the union of every table. It keeps only the columns of the last table.

What should I correct in my code so that each table I created merge into a single table with renamed columns ?

I work under SAS Enterprise Guide, and my version of SAS is SAS 9.1.3

Thanks in advance ! Any help would be greatly appreciated.

Tom · Posted 04-07-2020 12:54 PM

This is way too long to read.

As to the point of your %DO loop only keeping the last dataset it seems you have not thought out what SAS code you want to generate before starting to write your macro.

Most likely you want to generate code like this, if iI got the overall goal.

data want;
  merge 
    tab1 (rename=(B=B1))
    tab2 (rename=(B=B2))
  ;
  by id;
run;

So if you want to do that with %DO loops then make sure the loops are generating the OLD=NEW pairs in the middle of the MERGE statement. Don't but the %DO loop around the whole data step.

View solution in original post

Reeza · Posted 04-07-2020 11:30 AM

SAS 9.1.3 is beyond ancient. There's no BY statement in your MERGE. Are you wanting to merge by position and not by ID or A?

Instead of the MERGE approach, I would do an APPEND and then transpose them.

So when you append the data sets add an indicator for the iteration as well. In the example below I use the data set name as the indicator and I assume all the tables you want to merge start with tab and there are no other tables that start with the prefix. This allows you to use the colon operator to append all datasets that start with tab.

No macros needed.

Tutorials covering the topics used in the answer below.

Here is a reference that illustrates how to refer to variables and datasets in a short cut list:
https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html

Transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/

https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/

data append_all;
set tab: indsname = source;
dsn = source;
run;

proc sort data=append_all;
by id dsn;
run;

proc transpose data=append_all out=wide_format prefix = B_;
by ID;
id dsn;
run;

@makemo wrote:

Hi everybody

What I have and I can do

Here's my issue :
I would like to merge several tables, which are called tab_X. X is a number.
In each table, there are two columns A and B with the same content, except B has X removed rows.

For instance, here is a neutral table :
+----+---+---+
| id | A | B |
+----+---+---+
|  1 | a | a |
|  2 | b | b |
|  3 | c | c |
|  4 | d | d |
|  5 | e | e |
|  6 | f | f |
|  7 | g | g |
|  8 | h | h |
|  9 | i | i |
+----+---+---+
And here are tab_3 and tab_5 :
tab_3

+----+---+---+
| id | A | B |
+----+---+---+
|  1 | a | a |
|  2 | b | b |
|  3 | c | c |
|  4 | d | d |
|  5 | e | e |
|  6 | f | f |
|  7 | g |   |
|  8 | h |   |
|  9 | i |   |
+----+---+---+


another example : tab_5

+----+---+---+
| id | A | B |
+----+---+---+
|  1 | a | a |
|  2 | b | b |
|  3 | c | c |
|  4 | d | d |
|  5 | e |   |
|  6 | f |   |
|  7 | g |   |
|  8 | h |   |
|  9 | i |   |
+----+---+---+
Now, I have a macro that can generate any number of such tables as I want, each time with a different number of deleted rows in column "B".

This means if I execute :
%table_maker (removed_rows = 3 5 )
What I get are two distinct tables, which are "tab_3" and "tab_5" shown above.

There are no constraints on removed_rows : it can be two tables, or ten, or only a single table, etc...

What I want to do

Now, for further analysis of my Data, I would like that every column B I ever produce are compared to column A in a single table.
This means I would like to merge "tab_3" and "tab_5" so that I get :
tab_all = merging of tab_3 and tab_5

+----+---+----+----+
| id | A | B3 | B5 |
+----+---+----+----+
|  1 | a | a  | a  |
|  2 | b | b  | b  |
|  3 | c | c  | c  |
|  4 | d | d  | d  |
|  5 | e | e  |    |
|  6 | f | f  |    |
|  7 | g |    |    |
|  8 | h |    |    |
|  9 | i |    |    |
+----+---+----+----+
Which would thus involve two tricks :

1) I am merging tab_3 with tab_5 to create a new table

(but more generaly, I would like to merge any number of tables I get. If I happen to deal with tab_3,tab_5,tab_7 and tab_8, I would like to merge all four of these tables)

2) I am renaming each B column so that columns with same variable names dont erase each other (preferably in a "B_X" fashion where X is the suffix of my table)

What I have attempted : first code I tried

So far, what I have try is this :
%macro  table_maker (removed_rows =) ; 

/* the parameter "removed_rows" is a list of numbers like : 3 5 or : 3 5 7 8 9 ) */
              
/*   precedent code in the macro succesfully creates tables. 
For instance if removed_rows = 3 5  then it creates 2 tables with names tab_3 and tab_5.
And columns B have 3 and 5 missing rows   */

      /* following code is to merge everything with renaming of columns */
      /* does not work yet ... */

      /* the following loop is a kind of  "for each" loop  */

            /* we create local macro-variables for the loop */
            /*  these macro-variables tell us how many tables I created, and thus how many I want to merge */

            %LOCAL index number_of_tables ; 
            %LET number_of_tables =%sysfunc(countw(&removed_rows.))  ; 


             /* we perform the loop itself  */

            %DO index=1  %TO  &number_of_tables. ;
                  %LET  i = %scan(&removed_rows.,&i.) ;
                        DATA  lib.tab_all ; 
                        RENAME B=B_&i. ; 
                        MERGE tab_&i. ; 
                        RUN ; 
            %END ; 

%mend 
This code does not work, and what I get in the end is :
tab_all = wrong result

+----+---+----+
| id | A | B5 |
+----+---+----+
|  1 | a | a  |
|  2 | b | b  |
|  3 | c | c  |
|  4 | d | d  |
|  5 | e |    |
|  6 | f |    |
|  7 | g |    |
|  8 | h |    |
|  9 | i |    |
+----+---+----+
That is, my code does not merge all tables together, it only keeps the columns from the last table.
It seems that the last table I merge (here, tab_5) erases the former tables that have been merged before (here tab_3)

What I have attempted : a 2nd code testing

I have try also another way, with SET instead of MERGE, in the hope that SET would have the cumulative effect I am looking for (and not an "Erasing former data" effect) :
             /* we perform the loop itself  */

            %DO index=1  %TO  &number_of_tables. ;
                  %LET  i = %scan(&removed_rows.,&i.) ;
                        DATA  lib.tab_all ; 
                        RENAME B=B_&i. ; 
                        SET tab_&i. ; 
                        RUN ; 
            %END ; 
But again, what I get is not the union of every table. It merely is a table which takes the columns of the last value "i" taken in the DO Loop.

How could I resolve the problem ?
What should I correct in my code so that each table I created merge into a single table with renamed columns, whatever the number of tables I have to merge ?

I work under SAS Enterprise Guide, and my version of SAS is SAS 9.1.3

Thanks in advance ! Any help would be greatly appreciated.

makemo · Posted 04-07-2020 11:49 AM

Hello Reeza,

Thanks for the answer !

Actually I want to merge by ID, but so far my code it has not produced any inconsistency in this regard. That being said, yes, I will add a BY.

I will try your solution ASAP and tell you the result I get.

But I am a little worried about your use of the colon operator. I have been trying to use XXX_ (the colon operator) and XXX: (SAS prefix wildcard) in precedent codes of mine and it seemed it did not work. You pointed out that SAS 9.1.3 is very old. Is it possible that it is too old to make those two things work ?

Best regards,
Makemo

Reeza · Posted 04-07-2020 01:18 PM

Oh, yeah, the colon operator won't work in 9.1.3 but that's fair, since its 16 year old software - teenagers are horrible to deal with.

I don't think the INDSNAME option may exist either in that version (it's in 9.2+) - in that case I would very much suggesting modifying your program that creates the tab dataset to have a unique identifier per data set to avoid this issue. Otherwise you could modify the PROC SQL to generate a whole bunch of IN but it's annoying for sure. Then you also have to be careful for your macro variable lengths. Funny how two small changes can make your code infinitely easier.

proc sql noprint;
select memname into :dset_list separated by " "
where library='WORK' and memname like 'TAB%';
quit;


data append_all;
set dset_list  indsname = source;
dsn = source;
run;

proc sort data=append_all;
by id dsn;
run;

proc transpose data=append_all out=wide_format prefix = B_;
by ID;
id dsn;
run;

makemo · Posted 04-08-2020 04:59 AM

Well, as suggested by Tom, it seems that the only thing I had to do really was to change my code's structure...

That is, moving the place of my DO loop so that it performs on the "MERGE" statement, instead of performing a loop on the whole DATA step.

That being said, what you suggested in both of your answers seems intersting as well, and it will surely help me to improve as a SAS programmer. I don't have time in the next hours, but I want to try it anyway and report on this thread whether it does well or not.

Thanks for your input !

All the best,
Makemo

Tom · Posted 04-07-2020 12:54 PM

This is way too long to read.

As to the point of your %DO loop only keeping the last dataset it seems you have not thought out what SAS code you want to generate before starting to write your macro.

Most likely you want to generate code like this, if iI got the overall goal.

data want;
  merge 
    tab1 (rename=(B=B1))
    tab2 (rename=(B=B2))
  ;
  by id;
run;

So if you want to do that with %DO loops then make sure the loops are generating the OLD=NEW pairs in the middle of the MERGE statement. Don't but the %DO loop around the whole data step.

makemo · Posted 04-08-2020 04:55 AM

I am sorry the question seems so long. If that is still of interest to you, I have shorten my post. The new version is now nearly half as long as what is quoted in @Reeza first post.

All propositions posted here seem intersting, but I have tried your idea (changing the place of the DO loop) and it does indeed solve the issue.

Simply efficient.

Thanks a lot, very useful.

All the best,

Makemo

mkeintz · Posted 04-07-2020 01:50 PM

I admit this post was way too long for me to review, so I'm standing on the shoulders of @Reeza and @Tom to undertand your objective.

I think, if each TAB dataset is sorted by ID/A, you can use a variant of @Reeza 's suggestion using the dictionaries object in proc sql, but I'm go to attempt a shortcut, avoiding the proc sort/PROC TRANSPOSE steps:

proc sql noprint;
  select cats(memname,'(rename=(b=b_',memname,'))') into :renamlist separated by ' '
  from dictionary.tables where libname='WORK' and memname like 'TAB%';
quit;
%put &=renamlist;

data want;
  merge &renamelist;
  by ID a;
run;

This will use the metadata (the dictionary.tables resource) about your TAB datasets to generate a data step with a MERGE statement as per @Tom 's suggestion, but without the hardcoding of the rename parameters.

And if you want variables names B_1, B_2 instead of B_TAB1, B_TAB2, insert a substring function in the 3rd argument of the CATS function:

  select cats(memname,'(rename=(b=b_',substr(memname,4),'))') into :renamlist separated by ' '

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

makemo · Posted 04-08-2020 05:06 AM

Hello Mkeintz

Well first of all sorry for the long post. If that's still of interest to you, I have edited my question to halve its length.

Now it seems also that Tom found a simple trick for my problem (moving the place of my DO loop).

That being said I'm puzzled by how elaborate your proposition look like.
I have not time for it immediatly, but I will try to test this also, might help me getting better at SAS programming.

Thanks a lot, best regards

Tom · Posted 04-07-2020 03:36 PM

I usually find it is better to generate that type of METADATA driven code with a DATA step and not macro code. Then you do not have to worry as much about quoting and are not limited be the maximum length of a macro variable.

So let's make a simplified generic data step merge generator, concentrating on the issue of duplicate names. For now I will make it simpler by assuming all datasets are in the WORK library and always generate the same output dataset.

So first get a list of the variables and how many different dataset the name appears in. Then when that number is larger than 1 you can try to rename it.

%macro merge_all(keylist,dslist);
proc sql noprint;
  create table _vars as 
  select catx('.',libname,memname) as ds length=41 
       , name
       , count(*) as nds
  from dictionary.columns
  where libname='WORK'
    and memname in %qlist(%upcase(&dslist))
    and upcase(name) not in %qlist(%upcase(&keylist))
  group by upcase(name)
  order by ds, name
  ;
quit;
filename code temp;
data _null_;
  set _vars end=eof;
  by ds ;
  file code;
  if _n_=1 then put 'merge';
  dsnum+first.ds;
  if first.ds then put '  ' ds '(rename=(' @;
  if nds > 1 then put name +(-1) '=' name +(-1) '_' dsnum @ ;
  if last.ds then put '))';
  if eof then put ';' ;
run;

data want;
  %include code / nosource2;
  by &keylist;
run;
%mend merge_all;

Note macro %QLIST() is just a simple tool for converting space delimited list into quoted list.

So let's make some sample data:

data x;
 input id A $ C $ @@ ;
cards;
1 a a 2 b b 3 c c 4 d d 5 e e 6 f f 7 g g 8 h h 9 i i 
;
data y ;
 input id a $ B $ @@;
cards;
1 a a 2 b b 3 c c 4 d d 5 e e 6 f f 7 g . 8 h . 9 i . 
;
data z;
 input id A $ B $ @@;
cards;
1 a a 2 b b 3 c c 4 d d 5 e . 6 f . 7 g . 8 h . 9 i . 
;

And try it.

options mprint;
%merge_all(id,x y z);
proc print;
run;

So it generates this log:

MPRINT(MERGE_ALL):   data want;
MPRINT(MERGE_ALL):   merge WORK.X (rename=(A=A_1 )) WORK.Y (rename=(a=a_2 B=B_2 )) WORK.Z (rename=(A=A_3
B=B_3 )) ;
MPRINT(MERGE_ALL):   by id;
MPRINT(MERGE_ALL):   run;

NOTE: There were 9 observations read from the data set WORK.X.
NOTE: There were 9 observations read from the data set WORK.Y.
NOTE: There were 9 observations read from the data set WORK.Z.
NOTE: The data set WORK.WANT has 9 observations and 7 variables.

And this dataset:

Obs    id    A_1    C    a_2    B_2    A_3    B_3

 1      1     a     a     a      a      a      a
 2      2     b     b     b      b      b      b
 3      3     c     c     c      c      c      c
 4      4     d     d     d      d      d      d
 5      5     e     e     e      e      e
 6      6     f     f     f      f      f
 7      7     g     g     g             g
 8      8     h     h     h             h
 9      9     i     i     i             i

makemo · Posted 04-08-2020 01:31 PM

Hello again Tom,

As I said, a previous answer of yours was already sufficient for what I need.

However I'm puzzled about the other propositions, obviously you posted it for a reason.

When I find time I will try the way you suggest here as well, might help me getting better at SAS programmation.

Thanks a lot,
Makemo

Tom · Posted 04-08-2020 01:51 PM

The macro language is a simple text replacement tool. To really work with data (in your case the names of datasets and variables) it is much easier to use the SAS language instead.

makemo · Posted 04-09-2020 04:11 PM

That's what I am realizing lately... but that's frustrating, because that means that macro-programmes, although similar in structures, can not really work similarly as functions in R or Python. Seems to make SAS a language less prone to generalizing/parametre-able/abstract codes, and that's too bad. Or maybe I'm just not good enough yet, and my possibilities are too restricted for now.

Reeza · Posted 04-09-2020 05:20 PM

@makemo wrote:
Seems to make SAS a language less prone to generalizing/parametre-able/abstract codes, and that's too bad.

No, it just requires a different approach. R/Python are very similar in to how they process data but SAS is an entirely different language and you need a different approach.

makemo · Posted 04-10-2020 02:34 AM

Yes, I perfectly understand your point. Certainly I will be able to master SAS's philosopy soon and to make it as flexible as R or Python. I have no choice anyway, since SAS 's approach to other fields (notably econometrics and statistical modelling) is considered deeper/more reliable where I work. Still, it's annoying that statistical languages can have so distinct syntaxes and philosophy for data management.

Merge together any number of tables, and rename columns with same variable names

Re: Merge together any number of tables, and rename columns with same variable names

Re: Merge together any number of tables, and rename columns with same variable names

Re: Merge together any number of tables, and rename columns with same variable names

Re: Merge together any number of tables, and rename columns with same variable names

Re: Merge together any number of tables, and rename columns with same variable names

Re: Merge together any number of tables, and rename columns with same variable names

Re: Merge together any number of tables, and rename columns with same variable names

Re: Merge together any number of tables, and rename columns with same variable names

Re: Merge together any number of tables, and rename columns with same variable names

Re: Merge together any number of tables, and rename columns with same variable names

Re: Merge together any number of tables, and rename columns with same variable names

Re: Merge together any number of tables, and rename columns with same variable names

Re: Merge together any number of tables, and rename columns with same variable names

Re: Merge together any number of tables, and rename columns with same variable names

Re: Merge together any number of tables, and rename columns with same variable names

SAS Innovate 2025: Call for Content

Classroom Training Available!