Solved: Re: Merge all the tables of Library

skygold16 · Posted 06-09-2014 06:28 AM

Hello All,

I have some kind of weird requirement, but have to fulfill.

I want to merge all the tables of the library.

1. I don't know how many tables will be available.

2. Table structure for all the tables in library is same.

3. I want to use coalesce function to find out which is the non-missing variable and that value I have to pick up for a variable of output dataset.

4. Yes, key is same.

Can anyone please help at earliest?

Thanks,

Hatshit

Tom · Posted 06-13-2014 04:53 PM

How would you know the order to combine the tables?

If your PR_KEY uniquely defines the output rows then it looks like you do want the UPDATE statement. The UPDATE statement works with only two datasets at a time. So try code like this:

data temp1;

input PR_KEY A B C ;

cards;

1 1 . .

2 . 1 .

3 . . .

run;

data temp2;

input PR_KEY A B C ;

cards;

1 . 2 .

2 . . .

3 . . 2

run;

data temp3;

input PR_KEY A B C ;

cards;

1 . . 3

2 . . .

3 . . .

run;

data want ;

update temp1 temp2 ;

by pr_key ;

run;

data _null_;

set want ;

put (pr_key a b c) (3.);

run;

data want ;

update want temp3 ;

by pr_key ;

run;

data _null_;

set want ;

put (pr_key a b c) (3.);

run;

Now if your tables are nicely named like in your example then perhaps you can fool UPDATE by creating one large file made up of all of the individual files concatenated. Like this:

data want ;

set temp: ;

by pr_key ;

run;

data want ;

update want (obs=0) want ;

by pr_key ;

run;

data _null_;

set want ;

put (pr_key a b c) (3.);

run;

View solution in original post

Reeza · Posted 06-09-2014 10:41 AM

Table structure for all the tables is the same? Does this mean that every table has the exact same variables? If so you may be looking for an update process instead of a merge.

Can you provide more info on what you're trying to do.

skygold16 · Posted 06-10-2014 02:10 AM

First of all thanks guys for showing your interest.

Yes, All tables has the same structure having same name for all variable.

Only difference is that if variable "A" in dataser temp1 has value then in all other datasets, it will be blank.

So when I use coalesce function, I get only value form the dataser where value is not missing for variable A in the final dataset.

For exx:

Table Temp_1

PR_KEY A B C

1 1 . .

2 . 1 .

3 . . .

Table Temp_2

PR_KEY A B C

1 . 1 .

2 1 . 1

3 1 1 1

So above is the structure of the table and how it looks like.

I have a key "PR_KEY" by which I want to merge all the tables of one library.

Use coalesce function to get the value for each variable in final single output dataset i.e. find non missing value for variable.

Here order will not matter because if value for variable "A" is non-missing in temp_1 dataset then, value for variable "A" will be blank in rest all datasets.

Reeza · Posted 06-10-2014 10:35 AM

Take a look at the sashelp.vtable or dictionary.table. This will list all the tables in your library to help you get started.

I'd probably be lazy and append them all, especially if they had a naming convention and then take the max for each column.

Say my tables were named table1 table2 table3.

data want;

set table: ;*note the colon;

run;

proc sql;

create table want2 as

select pr_key, max(a) as a, max(b) as b, max(c) as c

from want

group by pr_key;

quit;

skygold16 · Posted 06-11-2014 08:37 AM

Your idea sounds very interesting, but i see one challenge here.

I can append all the table and then use max to find out the non-missing values for each columns, but how about the primary key column value.

The value in primary key can be there in all tables. so if say one observation with account number 123 is available in table A and it is possible that account number 123 is also available in table B.

So now when I use max for finding out the max value for each varible, I may get some unexpected data for each account number.

Am I understanding it correct?

Reeza · Posted 06-11-2014 10:31 AM

No, because you'll find the max by primary key.

Try it.

skygold16 · Posted 06-13-2014 02:06 AM

I am not sure how it will work. :smileyconfused:

When we append all the tables in single table, the concept of primary is over.

Once you append all the records,you have multiple records for say same accounts.

it will look something like below:

1 xyz 1 . .

2 abc . 1 .

3 jkl . . 1

1 xyz . 1 .

2 abc 1 . .

3 jkl 1 . .

Scott_Mitchell · Posted 06-13-2014 05:59 AM

The group by clause will ensure that the key remains unique and the MAX function will assign the highest value in the variable for each key within the grouping.

So in your case

1 xyz 1 . .

1 xyz . 1 .

would result in

1 xyz 1 1 .

Try it first and you will see that Reeza is correct.

Tom · Posted 06-13-2014 04:53 PM

How would you know the order to combine the tables?

If your PR_KEY uniquely defines the output rows then it looks like you do want the UPDATE statement. The UPDATE statement works with only two datasets at a time. So try code like this:

data temp1;

input PR_KEY A B C ;

cards;

1 1 . .

2 . 1 .

3 . . .

run;

data temp2;

input PR_KEY A B C ;

cards;

1 . 2 .

2 . . .

3 . . 2

run;

data temp3;

input PR_KEY A B C ;

cards;

1 . . 3

2 . . .

3 . . .

run;

data want ;

update temp1 temp2 ;

by pr_key ;

run;

data _null_;

set want ;

put (pr_key a b c) (3.);

run;

data want ;

update want temp3 ;

by pr_key ;

run;

data _null_;

set want ;

put (pr_key a b c) (3.);

run;

Now if your tables are nicely named like in your example then perhaps you can fool UPDATE by creating one large file made up of all of the individual files concatenated. Like this:

data want ;

set temp: ;

by pr_key ;

run;

data want ;

update want (obs=0) want ;

by pr_key ;

run;

data _null_;

set want ;

put (pr_key a b c) (3.);

run;

FloydNevseta · Posted 06-10-2014 01:57 AM

Please provide more clarity on #3. Does the order of the variables matter when determining the first non-missing value?

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away