Dear SAS community,
I have two source tables which I'd like to combine into one dataset. I'd like to perform a check based on some conditions which'll decide whether I select the records from Table A or the records Table B. The conditions will be based on the maximum depth for a group across the two tables and the number of records within the group in events where the maximum depth is equal across the two tables. With everything being equal, I prefer to use the records from Table A.
For demonstration purposes, let's say I have these tables:
Table A:
Name Depth Coord_X Coord_Y
A 0 5 6
A 100 5 6
A 200 5 6
A 300 5 6
B 0 4 3
B 2000 4 3
B 3000 4 3
B 4000 4 3
C 0 2 1
C 3000 2 1
C 4000 2 1
C 5000 2 1
Table B
Name Depth Coord_X Coord_Y
A 0 5 6
A 250 5 6
A 300 5 6
B 0 4 3
B 2000 4 3
B 3000 4 3
B 4000 4 3
C 0 2 1
C 3000 2 1
C 4000 2 1
C 5000 2 1
C 5500 2 1
D 0 11 12
D 10 11 12
D 20 11 12
As can be seen, the tables contains the same attributes. However, Table A may contain some groups not in Table B, and vice versa. This can be seen in the example above where Table B contains a group D not in Table A.
First, I'd like to test on the maximum depth - I want to select the records from the input table which contains the greatest depth. Using the tables above as an example, for group C, I'd like my target table to contain the records from Table B as this has a greater maximum depth then Table A. The maximum depth is the key condition, and as long as a table contains a greater maximum depth then the other table, the number of records within each group is irrelevant, and I always select the records from the source containing the maximum depth. In other words, even if Table A contained 1000 records and Table B contained 4 records, as long as Table B contains a greater maximum depth I select the 4 records from Table B over the 1000 records from Table A.
Secondly, if the two tables contains an equal maximum depth for a group, I use the number of records within each group as a tie breaker. Using the above tables as an example, for group A, I want to select the records from Table A as this contains 4 records compared to the 3 records in Table B with the maximum depth being equal between the two tables.
In the event that both the maximum depth and the number of records are equal, I want to use Table A as my source.
Overall, based on these conditions, my target table should look like this. Notice that I want to add a column which tells which source has been selected for my Target table:
Target Table
Name Depth Coord_X Coord_Y Source
A 0 5 6 Table A
A 100 5 6 Table A
A 200 5 6 Table A
A 300 5 6 Table A
B 0 4 3 Table A
B 2000 4 3 Table A
B 3000 4 3 Table A
B 4000 4 3 Table A
C 0 2 1 Table B
C 3000 2 1 Table B
C 4000 2 1 Table B
C 5000 2 1 Table B
C 5500 2 1 Table B
D 0 11 12 Table B
D 10 11 12 Table B
D 20 11 12 Table B
I am fairly new to SAS and programming in general, and I am therefore unsure what the best strategy to solving this issue efficiently would be.
Hello,
I think this is what you are after.
Please check it with great diligence as I haven't done any quality control.
data Table_A;
input Name $ Depth Coord_X Coord_Y;
cards;
A 0 5 6
A 100 5 6
A 200 5 6
A 300 5 6
B 0 4 3
B 2000 4 3
B 3000 4 3
B 4000 4 3
C 0 2 1
C 3000 2 1
C 4000 2 1
C 5000 2 1
;
run;
data Table_B;
input Name $ Depth Coord_X Coord_Y;
cards;
A 0 5 6
A 250 5 6
A 300 5 6
B 0 4 3
B 2000 4 3
B 3000 4 3
B 4000 4 3
C 0 2 1
C 3000 2 1
C 4000 2 1
C 5000 2 1
C 5500 2 1
D 0 11 12
D 10 11 12
D 20 11 12
;
run;
PROC SQL noprint;
create table Table_A_MAX_Depth as
select Name , MAX(Depth) as MAX_Depth , count(*) as CountName
from Table_A
group by Name;
create table Table_B_MAX_Depth as
select Name , MAX(Depth) as MAX_Depth , count(*) as CountName
from Table_B
group by Name;
QUIT;
data Table_AB_MAX_Depth;
set Table_A_MAX_Depth
Table_B_MAX_Depth indsname=dsn;
sourceds = substr(scan(dsn,2,"."),1,7);
run;
proc sort data=Table_AB_MAX_Depth;
by Name descending MAX_Depth descending CountName;
run;
data Table_AB_MAX_Depth_Retain;
set Table_AB_MAX_Depth;
by Name descending MAX_Depth descending CountName;
if first.name then output;
run;
PROC SQL noprint;
create table FROM_Table_A as
select *
from Table_A
where name IN (select name
from Table_AB_MAX_Depth_Retain
where sourceds='TABLE_A');
create table FROM_Table_B as
select *
from Table_B
where name IN (select name
from Table_AB_MAX_Depth_Retain
where sourceds='TABLE_B');
QUIT;
data Target_Table;
set FROM_Table_A FROM_Table_B indsname=dsn;
sourceds = substr(scan(dsn,2,"."),6,12);
run;
/* end of program */
Cheers,
Koen
Hello,
I think this is what you are after.
Please check it with great diligence as I haven't done any quality control.
data Table_A;
input Name $ Depth Coord_X Coord_Y;
cards;
A 0 5 6
A 100 5 6
A 200 5 6
A 300 5 6
B 0 4 3
B 2000 4 3
B 3000 4 3
B 4000 4 3
C 0 2 1
C 3000 2 1
C 4000 2 1
C 5000 2 1
;
run;
data Table_B;
input Name $ Depth Coord_X Coord_Y;
cards;
A 0 5 6
A 250 5 6
A 300 5 6
B 0 4 3
B 2000 4 3
B 3000 4 3
B 4000 4 3
C 0 2 1
C 3000 2 1
C 4000 2 1
C 5000 2 1
C 5500 2 1
D 0 11 12
D 10 11 12
D 20 11 12
;
run;
PROC SQL noprint;
create table Table_A_MAX_Depth as
select Name , MAX(Depth) as MAX_Depth , count(*) as CountName
from Table_A
group by Name;
create table Table_B_MAX_Depth as
select Name , MAX(Depth) as MAX_Depth , count(*) as CountName
from Table_B
group by Name;
QUIT;
data Table_AB_MAX_Depth;
set Table_A_MAX_Depth
Table_B_MAX_Depth indsname=dsn;
sourceds = substr(scan(dsn,2,"."),1,7);
run;
proc sort data=Table_AB_MAX_Depth;
by Name descending MAX_Depth descending CountName;
run;
data Table_AB_MAX_Depth_Retain;
set Table_AB_MAX_Depth;
by Name descending MAX_Depth descending CountName;
if first.name then output;
run;
PROC SQL noprint;
create table FROM_Table_A as
select *
from Table_A
where name IN (select name
from Table_AB_MAX_Depth_Retain
where sourceds='TABLE_A');
create table FROM_Table_B as
select *
from Table_B
where name IN (select name
from Table_AB_MAX_Depth_Retain
where sourceds='TABLE_B');
QUIT;
data Target_Table;
set FROM_Table_A FROM_Table_B indsname=dsn;
sourceds = substr(scan(dsn,2,"."),6,12);
run;
/* end of program */
Cheers,
Koen
You can add a PROC SORT to sort on Name.
Hey @sbxkoenk , from what I can tell your solution seems to work perfectly! Thank you for your help!
With pleasure!
Thank you for accepting my answer as a solution.
I have tried to produce code that is very 'readable' (since more compact code would also be possible) but if there's anything (any step) you do not fully understand, please ask! If everything is clear, all the better.
Cheers,
Koen
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.