BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
skygold16
Fluorite | Level 6

Hello All,

I have some kind of weird requirement, but have to fulfill.

I want to merge all the tables of the library.

1. I don't know how many tables will be available.

2. Table structure for all the tables in library is same.

3. I want to use coalesce function to find out which is the non-missing variable and that value I have to pick up for a variable of output dataset.

4. Yes, key is same.

Can anyone please help at earliest?

Thanks,

Hatshit

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

How would you know the order to combine the tables?

If your PR_KEY uniquely defines the output rows then it looks like you do want the UPDATE statement.  The UPDATE statement works with only two datasets at a time.  So try code like this:

data temp1;

input PR_KEY A B C ;

cards;

1 1 . .

2 . 1 .

3 . . .

run;

data temp2;

input PR_KEY A B C ;

cards;

1 . 2 .

2 . . .

3 . . 2

run;

data temp3;

input PR_KEY A B C ;

cards;

1 . . 3

2 . . .

3 . . .

run;

data want ;

  update temp1 temp2 ;

  by pr_key ;

run;

data _null_;

  set want ;

  put (pr_key a b c) (3.);

run;

data want ;

  update want temp3 ;

  by pr_key ;

run;

data _null_;

  set want ;

  put (pr_key a b c) (3.);

run;


Now if your tables are nicely named like in your example then perhaps you can fool UPDATE by creating one large file made up of all of the individual files concatenated.  Like this:

data want ;

  set temp: ;

  by pr_key ;

run;

data want ;

  update want (obs=0) want ;

  by pr_key ;

run;

data _null_;

  set want ;

  put (pr_key a b c) (3.);

run;

View solution in original post

9 REPLIES 9
Reeza
Super User

Table structure for all the tables is the same? Does this mean that every table has the exact same variables? If so you may be looking for an update process instead of a merge.


Can you provide more info on what you're trying to do.

skygold16
Fluorite | Level 6

First of all thanks guys for showing your interest. Smiley Happy

Yes, All tables has the same structure having same name for all variable.

Only difference is that if variable "A" in dataser temp1 has value then in all other datasets, it will be blank.

So when I use coalesce function, I get only value form the dataser where value is not missing for variable A in the final dataset.

For exx:

Table Temp_1

PR_KEY     A     B     C

1               1       .       .

2               .        1      .

3               .        .       .

Table Temp_2

PR_KEY     A     B     C

1                 .     1     .  

2                 1     .     1  

3                  1     1     1

So above is the structure  of the table and how it looks like.

I have a key "PR_KEY" by which I want to merge all the tables of one library.

Use coalesce function to get the value for each variable in final single output dataset i.e. find non missing value for variable.

Here order will not matter because if value for variable "A" is non-missing in temp_1 dataset then, value for variable "A" will be blank in rest all datasets.

Reeza
Super User

Take a look at the sashelp.vtable or dictionary.table. This will list all the tables in your library to help you get started.

I'd probably be lazy and append them all, especially if they had a naming convention and then take the max for each column.

Say my tables were named table1 table2 table3.

data want;

set table: ;*note the colon;

run;

proc sql;

create table want2 as

select pr_key, max(a) as a, max(b) as b, max(c) as c

from want

group by pr_key;

quit;

skygold16
Fluorite | Level 6

Your idea sounds very interesting, but i see one challenge here.

I can append all the table and then use max to find out the non-missing values for each columns, but how about the primary key column value.

The value in primary key can be there in all tables. so if say one observation with account number 123 is available in table A and it is possible that account number 123 is also available in table B.

So now when I use max for finding out the max value for each varible, I may get some unexpected data for each account number.


Am I understanding it correct?

Reeza
Super User

No, because you'll find the max by primary key.

Try it.

skygold16
Fluorite | Level 6

I am not sure how it will work. :smileyconfused:

When we append all the tables in single table, the concept of primary is over.

Once you append all the records,you have multiple records for say same accounts.

it will look something like below:

1     xyz      1     .     .    

2     abc     .     1     .    

3     jkl        .     .       1

1     xyz     .     1     .

2     abc     1     .     .    

3     jkl       1     .     .

Scott_Mitchell
Quartz | Level 8

The group by clause will ensure that the key remains unique and the MAX function will assign the highest value in the variable for each key within the grouping.

So in your case

1     xyz      1     .     .   

1     xyz     .      1     .


would result in


1     xyz     1      1     .


Try it first and you will see that Reeza is correct.

Tom
Super User Tom
Super User

How would you know the order to combine the tables?

If your PR_KEY uniquely defines the output rows then it looks like you do want the UPDATE statement.  The UPDATE statement works with only two datasets at a time.  So try code like this:

data temp1;

input PR_KEY A B C ;

cards;

1 1 . .

2 . 1 .

3 . . .

run;

data temp2;

input PR_KEY A B C ;

cards;

1 . 2 .

2 . . .

3 . . 2

run;

data temp3;

input PR_KEY A B C ;

cards;

1 . . 3

2 . . .

3 . . .

run;

data want ;

  update temp1 temp2 ;

  by pr_key ;

run;

data _null_;

  set want ;

  put (pr_key a b c) (3.);

run;

data want ;

  update want temp3 ;

  by pr_key ;

run;

data _null_;

  set want ;

  put (pr_key a b c) (3.);

run;


Now if your tables are nicely named like in your example then perhaps you can fool UPDATE by creating one large file made up of all of the individual files concatenated.  Like this:

data want ;

  set temp: ;

  by pr_key ;

run;

data want ;

  update want (obs=0) want ;

  by pr_key ;

run;

data _null_;

  set want ;

  put (pr_key a b c) (3.);

run;

FloydNevseta
Pyrite | Level 9

Please provide more clarity on #3. Does the order of the variables matter when determining the first non-missing value?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 2036 views
  • 7 likes
  • 5 in conversation