DATA Step, Macro, Functions and more

Merge all the tables of Library

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 18
Accepted Solution

Merge all the tables of Library

Hello All,

I have some kind of weird requirement, but have to fulfill.

I want to merge all the tables of the library.

1. I don't know how many tables will be available.

2. Table structure for all the tables in library is same.

3. I want to use coalesce function to find out which is the non-missing variable and that value I have to pick up for a variable of output dataset.

4. Yes, key is same.

Can anyone please help at earliest?

Thanks,

Hatshit


Accepted Solutions
Solution
‎06-13-2014 04:53 PM
Super User
Super User
Posts: 7,060

Re: Merge all the tables of Library

Posted in reply to skygold16

How would you know the order to combine the tables?

If your PR_KEY uniquely defines the output rows then it looks like you do want the UPDATE statement.  The UPDATE statement works with only two datasets at a time.  So try code like this:

data temp1;

input PR_KEY A B C ;

cards;

1 1 . .

2 . 1 .

3 . . .

run;

data temp2;

input PR_KEY A B C ;

cards;

1 . 2 .

2 . . .

3 . . 2

run;

data temp3;

input PR_KEY A B C ;

cards;

1 . . 3

2 . . .

3 . . .

run;

data want ;

  update temp1 temp2 ;

  by pr_key ;

run;

data _null_;

  set want ;

  put (pr_key a b c) (3.);

run;

data want ;

  update want temp3 ;

  by pr_key ;

run;

data _null_;

  set want ;

  put (pr_key a b c) (3.);

run;


Now if your tables are nicely named like in your example then perhaps you can fool UPDATE by creating one large file made up of all of the individual files concatenated.  Like this:

data want ;

  set temp: ;

  by pr_key ;

run;

data want ;

  update want (obs=0) want ;

  by pr_key ;

run;

data _null_;

  set want ;

  put (pr_key a b c) (3.);

run;

View solution in original post


All Replies
Super User
Posts: 19,822

Re: Merge all the tables of Library

Posted in reply to skygold16

Table structure for all the tables is the same? Does this mean that every table has the exact same variables? If so you may be looking for an update process instead of a merge.


Can you provide more info on what you're trying to do.

Occasional Contributor
Posts: 18

Re: Merge all the tables of Library

First of all thanks guys for showing your interest. Smiley Happy

Yes, All tables has the same structure having same name for all variable.

Only difference is that if variable "A" in dataser temp1 has value then in all other datasets, it will be blank.

So when I use coalesce function, I get only value form the dataser where value is not missing for variable A in the final dataset.

For exx:

Table Temp_1

PR_KEY     A     B     C

1               1       .       .

2               .        1      .

3               .        .       .

Table Temp_2

PR_KEY     A     B     C

1                 .     1     .  

2                 1     .     1  

3                  1     1     1

So above is the structure  of the table and how it looks like.

I have a key "PR_KEY" by which I want to merge all the tables of one library.

Use coalesce function to get the value for each variable in final single output dataset i.e. find non missing value for variable.

Here order will not matter because if value for variable "A" is non-missing in temp_1 dataset then, value for variable "A" will be blank in rest all datasets.

Super User
Posts: 19,822

Re: Merge all the tables of Library

Posted in reply to skygold16

Take a look at the sashelp.vtable or dictionary.table. This will list all the tables in your library to help you get started.

I'd probably be lazy and append them all, especially if they had a naming convention and then take the max for each column.

Say my tables were named table1 table2 table3.

data want;

set table: ;*note the colon;

run;

proc sql;

create table want2 as

select pr_key, max(a) as a, max(b) as b, max(c) as c

from want

group by pr_key;

quit;

Occasional Contributor
Posts: 18

Re: Merge all the tables of Library

Your idea sounds very interesting, but i see one challenge here.

I can append all the table and then use max to find out the non-missing values for each columns, but how about the primary key column value.

The value in primary key can be there in all tables. so if say one observation with account number 123 is available in table A and it is possible that account number 123 is also available in table B.

So now when I use max for finding out the max value for each varible, I may get some unexpected data for each account number.


Am I understanding it correct?

Super User
Posts: 19,822

Re: Merge all the tables of Library

Posted in reply to skygold16

No, because you'll find the max by primary key.

Try it.

Occasional Contributor
Posts: 18

Re: Merge all the tables of Library

I am not sure how it will work. :smileyconfused:

When we append all the tables in single table, the concept of primary is over.

Once you append all the records,you have multiple records for say same accounts.

it will look something like below:

1     xyz      1     .     .    

2     abc     .     1     .    

3     jkl        .     .       1

1     xyz     .     1     .

2     abc     1     .     .    

3     jkl       1     .     .

Super Contributor
Posts: 297

Re: Merge all the tables of Library

Posted in reply to skygold16

The group by clause will ensure that the key remains unique and the MAX function will assign the highest value in the variable for each key within the grouping.

So in your case

1     xyz      1     .     .   

1     xyz     .      1     .


would result in


1     xyz     1      1     .


Try it first and you will see that Reeza is correct.

Solution
‎06-13-2014 04:53 PM
Super User
Super User
Posts: 7,060

Re: Merge all the tables of Library

Posted in reply to skygold16

How would you know the order to combine the tables?

If your PR_KEY uniquely defines the output rows then it looks like you do want the UPDATE statement.  The UPDATE statement works with only two datasets at a time.  So try code like this:

data temp1;

input PR_KEY A B C ;

cards;

1 1 . .

2 . 1 .

3 . . .

run;

data temp2;

input PR_KEY A B C ;

cards;

1 . 2 .

2 . . .

3 . . 2

run;

data temp3;

input PR_KEY A B C ;

cards;

1 . . 3

2 . . .

3 . . .

run;

data want ;

  update temp1 temp2 ;

  by pr_key ;

run;

data _null_;

  set want ;

  put (pr_key a b c) (3.);

run;

data want ;

  update want temp3 ;

  by pr_key ;

run;

data _null_;

  set want ;

  put (pr_key a b c) (3.);

run;


Now if your tables are nicely named like in your example then perhaps you can fool UPDATE by creating one large file made up of all of the individual files concatenated.  Like this:

data want ;

  set temp: ;

  by pr_key ;

run;

data want ;

  update want (obs=0) want ;

  by pr_key ;

run;

data _null_;

  set want ;

  put (pr_key a b c) (3.);

run;

Frequent Contributor
Posts: 101

Re: Merge all the tables of Library

Posted in reply to skygold16

Please provide more clarity on #3. Does the order of the variables matter when determining the first non-missing value?

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 512 views
  • 7 likes
  • 5 in conversation