Solved: Re: Conditionally combine two tables

larsc · Posted 05-16-2021 07:43 AM

Dear SAS community,

I have two source tables which I'd like to combine into one dataset. I'd like to perform a check based on some conditions which'll decide whether I select the records from Table A or the records Table B. The conditions will be based on the maximum depth for a group across the two tables and the number of records within the group in events where the maximum depth is equal across the two tables. With everything being equal, I prefer to use the records from Table A.

For demonstration purposes, let's say I have these tables:

Table A:

Name Depth Coord_X Coord_Y

A 0 5 6

A 100 5 6

A 200 5 6

A 300 5 6

B 0 4 3

B 2000 4 3

B 3000 4 3

B 4000 4 3

C 0 2 1

C 3000 2 1

C 4000 2 1

C 5000 2 1

Table B

Name Depth Coord_X Coord_Y

A 0 5 6

A 250 5 6

A 300 5 6

B 0 4 3

B 2000 4 3

B 3000 4 3

B 4000 4 3

C 0 2 1

C 3000 2 1

C 4000 2 1

C 5000 2 1

C 5500 2 1

D 0 11 12

D 10 11 12

D 20 11 12

As can be seen, the tables contains the same attributes. However, Table A may contain some groups not in Table B, and vice versa. This can be seen in the example above where Table B contains a group D not in Table A.

First, I'd like to test on the maximum depth - I want to select the records from the input table which contains the greatest depth. Using the tables above as an example, for group C, I'd like my target table to contain the records from Table B as this has a greater maximum depth then Table A. The maximum depth is the key condition, and as long as a table contains a greater maximum depth then the other table, the number of records within each group is irrelevant, and I always select the records from the source containing the maximum depth. In other words, even if Table A contained 1000 records and Table B contained 4 records, as long as Table B contains a greater maximum depth I select the 4 records from Table B over the 1000 records from Table A.

Secondly, if the two tables contains an equal maximum depth for a group, I use the number of records within each group as a tie breaker. Using the above tables as an example, for group A, I want to select the records from Table A as this contains 4 records compared to the 3 records in Table B with the maximum depth being equal between the two tables.

In the event that both the maximum depth and the number of records are equal, I want to use Table A as my source.

Overall, based on these conditions, my target table should look like this. Notice that I want to add a column which tells which source has been selected for my Target table:

Target Table

Name Depth Coord_X Coord_Y Source

A 0 5 6 Table A

A 100 5 6 Table A

A 200 5 6 Table A

A 300 5 6 Table A

B 0 4 3 Table A

B 2000 4 3 Table A

B 3000 4 3 Table A

B 4000 4 3 Table A

C 0 2 1 Table B

C 3000 2 1 Table B

C 4000 2 1 Table B

C 5000 2 1 Table B

C 5500 2 1 Table B

D 0 11 12 Table B

D 10 11 12 Table B

D 20 11 12 Table B

I am fairly new to SAS and programming in general, and I am therefore unsure what the best strategy to solving this issue efficiently would be.

sbxkoenk · Posted 05-16-2021 09:53 AM

Hello,

I think this is what you are after.

Please check it with great diligence as I haven't done any quality control.

data Table_A;                                
input Name $ Depth Coord_X Coord_Y;
cards;
A          0         5               6
A          100       5               6
A          200       5               6
A          300       5               6
B          0         4               3
B          2000      4               3
B          3000      4               3
B          4000      4               3
C          0         2               1
C          3000      2               1
C          4000      2               1
C          5000      2               1
;
run;

data Table_B;
input Name $ Depth Coord_X Coord_Y;
cards;
A          0         5               6
A          250       5               6
A          300       5               6
B          0         4               3
B          2000      4               3
B          3000      4               3
B          4000      4               3
C          0         2               1
C          3000      2               1
C          4000      2               1
C          5000      2               1
C          5500      2               1
D          0         11             12
D          10        11             12
D          20        11             12
;
run;

PROC SQL noprint;
 create table Table_A_MAX_Depth as
 select Name , MAX(Depth) as MAX_Depth , count(*) as CountName
 from Table_A
 group by Name;
 create table Table_B_MAX_Depth as
 select Name , MAX(Depth) as MAX_Depth , count(*) as CountName
 from Table_B
 group by Name;
QUIT;

data Table_AB_MAX_Depth;
 set Table_A_MAX_Depth 
     Table_B_MAX_Depth indsname=dsn;
 sourceds = substr(scan(dsn,2,"."),1,7);
run;

proc sort data=Table_AB_MAX_Depth;
 by Name descending MAX_Depth descending CountName;
run;

data Table_AB_MAX_Depth_Retain;
 set Table_AB_MAX_Depth;
 by Name descending MAX_Depth descending CountName;
 if first.name then output;
run;

PROC SQL noprint;
 create table FROM_Table_A as
 select *
 from Table_A
 where name IN (select name
                from Table_AB_MAX_Depth_Retain
			    where sourceds='TABLE_A');
 create table FROM_Table_B as
 select *
 from Table_B
 where name IN (select name
                from Table_AB_MAX_Depth_Retain
			    where sourceds='TABLE_B');
QUIT;

data Target_Table;
 set FROM_Table_A FROM_Table_B indsname=dsn;
 sourceds = substr(scan(dsn,2,"."),6,12);
run;
/* end of program */

Cheers,

Koen

View solution in original post

sbxkoenk · Posted 05-16-2021 09:53 AM

Hello,

I think this is what you are after.

Please check it with great diligence as I haven't done any quality control.

data Table_A;                                
input Name $ Depth Coord_X Coord_Y;
cards;
A          0         5               6
A          100       5               6
A          200       5               6
A          300       5               6
B          0         4               3
B          2000      4               3
B          3000      4               3
B          4000      4               3
C          0         2               1
C          3000      2               1
C          4000      2               1
C          5000      2               1
;
run;

data Table_B;
input Name $ Depth Coord_X Coord_Y;
cards;
A          0         5               6
A          250       5               6
A          300       5               6
B          0         4               3
B          2000      4               3
B          3000      4               3
B          4000      4               3
C          0         2               1
C          3000      2               1
C          4000      2               1
C          5000      2               1
C          5500      2               1
D          0         11             12
D          10        11             12
D          20        11             12
;
run;

PROC SQL noprint;
 create table Table_A_MAX_Depth as
 select Name , MAX(Depth) as MAX_Depth , count(*) as CountName
 from Table_A
 group by Name;
 create table Table_B_MAX_Depth as
 select Name , MAX(Depth) as MAX_Depth , count(*) as CountName
 from Table_B
 group by Name;
QUIT;

data Table_AB_MAX_Depth;
 set Table_A_MAX_Depth 
     Table_B_MAX_Depth indsname=dsn;
 sourceds = substr(scan(dsn,2,"."),1,7);
run;

proc sort data=Table_AB_MAX_Depth;
 by Name descending MAX_Depth descending CountName;
run;

data Table_AB_MAX_Depth_Retain;
 set Table_AB_MAX_Depth;
 by Name descending MAX_Depth descending CountName;
 if first.name then output;
run;

PROC SQL noprint;
 create table FROM_Table_A as
 select *
 from Table_A
 where name IN (select name
                from Table_AB_MAX_Depth_Retain
			    where sourceds='TABLE_A');
 create table FROM_Table_B as
 select *
 from Table_B
 where name IN (select name
                from Table_AB_MAX_Depth_Retain
			    where sourceds='TABLE_B');
QUIT;

data Target_Table;
 set FROM_Table_A FROM_Table_B indsname=dsn;
 sourceds = substr(scan(dsn,2,"."),6,12);
run;
/* end of program */

Cheers,

Koen

sbxkoenk · Posted 05-16-2021 09:58 AM

You can add a PROC SORT to sort on Name.

larsc · Posted 05-17-2021 07:45 AM

Hey @sbxkoenk , from what I can tell your solution seems to work perfectly! Thank you for your help!

sbxkoenk · Posted 05-17-2021 11:30 AM

With pleasure!

Thank you for accepting my answer as a solution.

I have tried to produce code that is very 'readable' (since more compact code would also be possible) but if there's anything (any step) you do not fully understand, please ask! If everything is clear, all the better.

Cheers,

Koen

Conditionally combine two tables

Re: Conditionally combine two tables

Re: Conditionally combine two tables

Re: Conditionally combine two tables

Re: Conditionally combine two tables

Re: Conditionally combine two tables

Ready to join fellow brilliant minds for the SAS Hackathon?

Click image to register for webinar

Classroom Training Available!