Hello All,
I have some kind of weird requirement, but have to fulfill.
I want to merge all the tables of the library.
1. I don't know how many tables will be available.
2. Table structure for all the tables in library is same.
3. I want to use coalesce function to find out which is the non-missing variable and that value I have to pick up for a variable of output dataset.
4. Yes, key is same.
Can anyone please help at earliest?
Thanks,
Hatshit
How would you know the order to combine the tables?
If your PR_KEY uniquely defines the output rows then it looks like you do want the UPDATE statement. The UPDATE statement works with only two datasets at a time. So try code like this:
data temp1;
input PR_KEY A B C ;
cards;
1 1 . .
2 . 1 .
3 . . .
run;
data temp2;
input PR_KEY A B C ;
cards;
1 . 2 .
2 . . .
3 . . 2
run;
data temp3;
input PR_KEY A B C ;
cards;
1 . . 3
2 . . .
3 . . .
run;
data want ;
update temp1 temp2 ;
by pr_key ;
run;
data _null_;
set want ;
put (pr_key a b c) (3.);
run;
data want ;
update want temp3 ;
by pr_key ;
run;
data _null_;
set want ;
put (pr_key a b c) (3.);
run;
Now if your tables are nicely named like in your example then perhaps you can fool UPDATE by creating one large file made up of all of the individual files concatenated. Like this:
data want ;
set temp: ;
by pr_key ;
run;
data want ;
update want (obs=0) want ;
by pr_key ;
run;
data _null_;
set want ;
put (pr_key a b c) (3.);
run;
Table structure for all the tables is the same? Does this mean that every table has the exact same variables? If so you may be looking for an update process instead of a merge.
Can you provide more info on what you're trying to do.
First of all thanks guys for showing your interest. 
Yes, All tables has the same structure having same name for all variable.
Only difference is that if variable "A" in dataser temp1 has value then in all other datasets, it will be blank.
So when I use coalesce function, I get only value form the dataser where value is not missing for variable A in the final dataset.
For exx:
Table Temp_1
PR_KEY A B C
1 1 . .
2 . 1 .
3 . . .
Table Temp_2
PR_KEY A B C
1 . 1 .
2 1 . 1
3 1 1 1
So above is the structure of the table and how it looks like.
I have a key "PR_KEY" by which I want to merge all the tables of one library.
Use coalesce function to get the value for each variable in final single output dataset i.e. find non missing value for variable.
Here order will not matter because if value for variable "A" is non-missing in temp_1 dataset then, value for variable "A" will be blank in rest all datasets.
Take a look at the sashelp.vtable or dictionary.table. This will list all the tables in your library to help you get started.
I'd probably be lazy and append them all, especially if they had a naming convention and then take the max for each column.
Say my tables were named table1 table2 table3.
data want;
set table: ;*note the colon;
run;
proc sql;
create table want2 as
select pr_key, max(a) as a, max(b) as b, max(c) as c
from want
group by pr_key;
quit;
Your idea sounds very interesting, but i see one challenge here.
I can append all the table and then use max to find out the non-missing values for each columns, but how about the primary key column value.
The value in primary key can be there in all tables. so if say one observation with account number 123 is available in table A and it is possible that account number 123 is also available in table B.
So now when I use max for finding out the max value for each varible, I may get some unexpected data for each account number.
Am I understanding it correct?
No, because you'll find the max by primary key.
Try it.
I am not sure how it will work. :smileyconfused:
When we append all the tables in single table, the concept of primary is over.
Once you append all the records,you have multiple records for say same accounts.
it will look something like below:
1 xyz 1 . .
2 abc . 1 .
3 jkl . . 1
1 xyz . 1 .
2 abc 1 . .
3 jkl 1 . .
The group by clause will ensure that the key remains unique and the MAX function will assign the highest value in the variable for each key within the grouping.
So in your case
1 xyz 1 . .
1 xyz . 1 .
would result in
1 xyz 1 1 .
Try it first and you will see that Reeza is correct.
How would you know the order to combine the tables?
If your PR_KEY uniquely defines the output rows then it looks like you do want the UPDATE statement. The UPDATE statement works with only two datasets at a time. So try code like this:
data temp1;
input PR_KEY A B C ;
cards;
1 1 . .
2 . 1 .
3 . . .
run;
data temp2;
input PR_KEY A B C ;
cards;
1 . 2 .
2 . . .
3 . . 2
run;
data temp3;
input PR_KEY A B C ;
cards;
1 . . 3
2 . . .
3 . . .
run;
data want ;
update temp1 temp2 ;
by pr_key ;
run;
data _null_;
set want ;
put (pr_key a b c) (3.);
run;
data want ;
update want temp3 ;
by pr_key ;
run;
data _null_;
set want ;
put (pr_key a b c) (3.);
run;
Now if your tables are nicely named like in your example then perhaps you can fool UPDATE by creating one large file made up of all of the individual files concatenated. Like this:
data want ;
set temp: ;
by pr_key ;
run;
data want ;
update want (obs=0) want ;
by pr_key ;
run;
data _null_;
set want ;
put (pr_key a b c) (3.);
run;
Please provide more clarity on #3. Does the order of the variables matter when determining the first non-missing value?
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
