Hello all,
I have three tables, all with the same 11 variables (see below), that I need to compare. I need to make a list of all observations that are identical across all three tables (Row 1 in all three tables for example), observations that have a unique combination of Person_ID and Incident_ID that are not found in the other two tables(Row 5 in table B for example), and observations that unique combination of all variables for each table (the fourth and fifth observation of table C for example). Hopefully this makes sense.
Thanks,
Marc
Table A
Person_ID | Incident_ID | Family_ID | Incident_Date | Invest_Comp_date | Incident_Type | Incident_Type_Collapsed | Invest_Finding | Finding_Collapsed | Internal_Incident | Internal_Investigation |
1 | 1 | 1 | 17-Jan-2024 | 29-Jan-2024 | 1 | 3 | 5 | 2 | 0 | 0 |
2 | 1 | 1 | 17-Jan-2024 | 29-Jan-2024 | 1 | 3 | 5 | 2 | 0 | 0 |
3 | 1 | 1 | 17-Jan-2024 | 29-Jan-2024 | 1 | 3 | 5 | 2 | 0 | 0 |
4 | 2 | 2 | 24-Jan-2024 | 3-Mar-2024 | 12 | 3 | 4 | 0 | 0 | 0 |
1 | 3 | 1 | 4-Feb-2024 | 16-Feb-2024 | 1 | 3 | 10 | 0 | 1 | 1 |
3 | 3 | 1 | 4-Feb-2024 | 16-Feb-2024 | 1 | 3 | 10 | 0 | 1 | 1 |
5 | 4 | 3 | 15-Feb-2024 | 8-Apr-2024 | 13 | 3 | 7 | 1 | 1 | 1 |
6 | 4 | 3 | 15-Feb-2024 | 8-Apr-2024 | 13 | 3 | 7 | 1 | 1 | 1 |
7 | 4 | 3 | 15-Feb-2024 | 8-Apr-2024 | 13 | 3 | 7 | 1 | 1 | 1 |
8 | 4 | 3 | 15-Feb-2024 | 8-Apr-2024 | 13 | 3 | 7 | 1 | 1 | 1 |
1 | 5 | 1 | 6-Mar-2024 | 12-Apr-2024 | 3 | 3 | 5 | 2 | 1 | 1 |
2 | 5 | 1 | 6-Mar-2024 | 12-Apr-2024 | 3 | 3 | 5 | 2 | 1 | 1 |
3 | 5 | 1 | 6-Mar-2024 | 12-Apr-2024 | 3 | 3 | 5 | 2 | 1 | 1 |
Table B
Person_ID | Incident_ID | Family_ID | Incident_Date | Invest_Comp_date | Incident_Type | Incident_Type_Collapsed | Invest_Finding | Finding_Collapsed | Internal_Incident | Internal_Investigation |
1 | 1 | 1 | 17-Jan-2024 | 29-Jan-2024 | 1 | 3 | 5 | 2 | 0 | 0 |
2 | 1 | 1 | 17-Jan-2024 | 29-Jan-2024 | 1 | 3 | 5 | 2 | 0 | 0 |
3 | 1 | 1 | 17-Jan-2024 | 29-Jan-2024 | 13 | 3 | 4 | 0 | 0 | 0 |
1 | 3 | 1 | 4-Feb-2024 | 16-Feb-2024 | 1 | 3 | 10 | 0 | 1 | 1 |
2 | 3 | 1 | 4-Feb-2024 | 16-Feb-2024 | 1 | 3 | 10 | 0 | 1 | 1 |
3 | 3 | 1 | 4-Feb-2024 | 16-Feb-2024 | 1 | 3 | 10 | 0 | 1 | 1 |
5 | 4 | 3 | 15-Feb-2024 | 8-Apr-2024 | 13 | 3 | 7 | 1 | 1 | 1 |
6 | 4 | 3 | 15-Feb-2024 | 8-Apr-2024 | 13 | 3 | 7 | 1 | 1 | 1 |
8 | 4 | 3 | 15-Feb-2024 | 8-Apr-2024 | 13 | 3 | 7 | 1 | 1 | 1 |
1 | 5 | 1 | 6-Mar-2024 | 12-Apr-2024 | 3 | 3 | 5 | 2 | 0 | 0 |
2 | 5 | 1 | 6-Mar-2024 | 12-Apr-2024 | 3 | 3 | 5 | 2 | 1 | 1 |
3 | 5 | 1 | 6-Mar-2024 | 12-Apr-2024 | 3 | 3 | 5 | 2 | 1 | 1 |
9 | 5 | 1 | 6-Mar-2024 | 12-Apr-2024 | 3 | 3 | 5 | 2 | 1 | 1 |
10 | 5 | 1 | 6-Mar-2024 | 12-Apr-2024 | 3 | 3 | 5 | 2 | 1 | 1 |
Table C
Person_ID | Incident_ID | Family_ID | Incident_Date | Invest_Comp_date | Incident_Type | Incident_Type_Collapsed | Invest_Finding | Finding_Collapsed | Internal_Incident | Internal_Investigation |
1 | 1 | 1 | 17-Jan-2024 | 29-Jan-2024 | 1 | 3 | 5 | 2 | 0 | 0 |
2 | 1 | 1 | 17-Jan-2024 | 29-Jan-2024 | 1 | 3 | 5 | 2 | 0 | 0 |
3 | 1 | 1 | 17-Jan-2024 | 29-Jan-2024 | 1 | 3 | 5 | 2 | 0 | 0 |
4 | 2 | 2 | 24-Jan-2024 | 3-Jun-2024 | 12 | 3 | 4 | 0 | 0 | 0 |
11 | 2 | 2 | 24-Jan-2024 | 3-Jun-2024 | 12 | 3 | 4 | 0 | 0 | 0 |
1 | 3 | 1 | 4-Feb-2024 | 16-Feb-2024 | 1 | 3 | 10 | 0 | 1 | 1 |
3 | 3 | 1 | 4-Feb-2024 | 16-Feb-2024 | 1 | 3 | 10 | 0 | 1 | 1 |
5 | 4 | 3 | 15-Feb-2024 | 8-Apr-2024 | 13 | 3 | 7 | 1 | 1 | 1 |
6 | 4 | 3 | 15-Feb-2024 | 8-Apr-2024 | 13 | 3 | 7 | 1 | 1 | 1 |
7 | 4 | 3 | 15-Feb-2024 | 8-Apr-2024 | 13 | 3 | 7 | 1 | 1 | 1 |
8 | 4 | 3 | 15-Feb-2024 | 8-Apr-2024 | 13 | 3 | 7 | 1 | 1 | 1 |
1 | 5 | 1 | 6-Mar-2024 | 12-Apr-2024 | 3 | 3 | 5 | 2 | 1 | 1 |
2 | 5 | 1 | 6-Mar-2024 | 12-Apr-2024 | 3 | 3 | 5 | 2 | 1 | 1 |
3 | 5 | 1 | 6-Mar-2024 | 12-Apr-2024 | 3 | 3 | 5 | 2 | 1 | 1 |
12 | 6 | 4 | 7-Mar-2024 | 13-Apr-2024 | 1 | 3 | 4 | 0 | 0 | 0 |
If sorting your datasets is not too expensive, then after sorting each dataset, a single data step can make a dataset for each combination of intersections.
Untested, in the absence of a working sample data set.
proc sort data=a out=a_sorted;
by _all_;
run;
proc sort data=b out=b_sorted;
by _all_;
run;
proc sort data=c out=c_sorted;
by _all_;
run;
data all_three a_only b_only c_only a_and_b a_and_c b_and_c ;
merge a_sorted (in=ina) b_sorted (in=inb) c_sorted (in=inc);
by _all_;
select (4*ina + 2*inb + 1*inc);
when (7) output all_three;
when (6) output a_and_b;
when (5) output a_and_c;
when (4) output a_only;
when (3) output b_and_c;
when (2) output b_only;
when (1) output c_only;
end;
run;
But really, why do this at all? Just make a single dataset with a permanent new variable abc=4*ina+2*inb+1*inc. Then you can use that variable as a filter for all subsequent processing.
If sorting your datasets is not too expensive, then after sorting each dataset, a single data step can make a dataset for each combination of intersections.
Untested, in the absence of a working sample data set.
proc sort data=a out=a_sorted;
by _all_;
run;
proc sort data=b out=b_sorted;
by _all_;
run;
proc sort data=c out=c_sorted;
by _all_;
run;
data all_three a_only b_only c_only a_and_b a_and_c b_and_c ;
merge a_sorted (in=ina) b_sorted (in=inb) c_sorted (in=inc);
by _all_;
select (4*ina + 2*inb + 1*inc);
when (7) output all_three;
when (6) output a_and_b;
when (5) output a_and_c;
when (4) output a_only;
when (3) output b_and_c;
when (2) output b_only;
when (1) output c_only;
end;
run;
But really, why do this at all? Just make a single dataset with a permanent new variable abc=4*ina+2*inb+1*inc. Then you can use that variable as a filter for all subsequent processing.
@ModeratelyWise wrote:
Thanks.
Forgive my ignorance, but how do create that variable in the combined dataset?
You could just use
from_abc=4*ina + 2*inb + 1*inc;
But it might be better just to keep three dummy variables as below. That would be easier to remember.
data all_groupings;
merge a_sorted (in=ina) b_sorted (in=inb) c_sorted (in=inc);
by _all_;
from_a=ina;
from_b=inb;
from_c=inc;
run;
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.