Hello all,
I have three tables, all with the same 11 variables (see below), that I need to compare. I need to make a list of all observations that are identical across all three tables (Row 1 in all three tables for example), observations that have a unique combination of Person_ID and Incident_ID that are not found in the other two tables(Row 5 in table B for example), and observations that unique combination of all variables for each table (the fourth and fifth observation of table C for example). Hopefully this makes sense.
Thanks,
Marc
Table A
Person_ID | Incident_ID | Family_ID | Incident_Date | Invest_Comp_date | Incident_Type | Incident_Type_Collapsed | Invest_Finding | Finding_Collapsed | Internal_Incident | Internal_Investigation |
1 | 1 | 1 | 17-Jan-2024 | 29-Jan-2024 | 1 | 3 | 5 | 2 | 0 | 0 |
2 | 1 | 1 | 17-Jan-2024 | 29-Jan-2024 | 1 | 3 | 5 | 2 | 0 | 0 |
3 | 1 | 1 | 17-Jan-2024 | 29-Jan-2024 | 1 | 3 | 5 | 2 | 0 | 0 |
4 | 2 | 2 | 24-Jan-2024 | 3-Mar-2024 | 12 | 3 | 4 | 0 | 0 | 0 |
1 | 3 | 1 | 4-Feb-2024 | 16-Feb-2024 | 1 | 3 | 10 | 0 | 1 | 1 |
3 | 3 | 1 | 4-Feb-2024 | 16-Feb-2024 | 1 | 3 | 10 | 0 | 1 | 1 |
5 | 4 | 3 | 15-Feb-2024 | 8-Apr-2024 | 13 | 3 | 7 | 1 | 1 | 1 |
6 | 4 | 3 | 15-Feb-2024 | 8-Apr-2024 | 13 | 3 | 7 | 1 | 1 | 1 |
7 | 4 | 3 | 15-Feb-2024 | 8-Apr-2024 | 13 | 3 | 7 | 1 | 1 | 1 |
8 | 4 | 3 | 15-Feb-2024 | 8-Apr-2024 | 13 | 3 | 7 | 1 | 1 | 1 |
1 | 5 | 1 | 6-Mar-2024 | 12-Apr-2024 | 3 | 3 | 5 | 2 | 1 | 1 |
2 | 5 | 1 | 6-Mar-2024 | 12-Apr-2024 | 3 | 3 | 5 | 2 | 1 | 1 |
3 | 5 | 1 | 6-Mar-2024 | 12-Apr-2024 | 3 | 3 | 5 | 2 | 1 | 1 |
Table B
Person_ID | Incident_ID | Family_ID | Incident_Date | Invest_Comp_date | Incident_Type | Incident_Type_Collapsed | Invest_Finding | Finding_Collapsed | Internal_Incident | Internal_Investigation |
1 | 1 | 1 | 17-Jan-2024 | 29-Jan-2024 | 1 | 3 | 5 | 2 | 0 | 0 |
2 | 1 | 1 | 17-Jan-2024 | 29-Jan-2024 | 1 | 3 | 5 | 2 | 0 | 0 |
3 | 1 | 1 | 17-Jan-2024 | 29-Jan-2024 | 13 | 3 | 4 | 0 | 0 | 0 |
1 | 3 | 1 | 4-Feb-2024 | 16-Feb-2024 | 1 | 3 | 10 | 0 | 1 | 1 |
2 | 3 | 1 | 4-Feb-2024 | 16-Feb-2024 | 1 | 3 | 10 | 0 | 1 | 1 |
3 | 3 | 1 | 4-Feb-2024 | 16-Feb-2024 | 1 | 3 | 10 | 0 | 1 | 1 |
5 | 4 | 3 | 15-Feb-2024 | 8-Apr-2024 | 13 | 3 | 7 | 1 | 1 | 1 |
6 | 4 | 3 | 15-Feb-2024 | 8-Apr-2024 | 13 | 3 | 7 | 1 | 1 | 1 |
8 | 4 | 3 | 15-Feb-2024 | 8-Apr-2024 | 13 | 3 | 7 | 1 | 1 | 1 |
1 | 5 | 1 | 6-Mar-2024 | 12-Apr-2024 | 3 | 3 | 5 | 2 | 0 | 0 |
2 | 5 | 1 | 6-Mar-2024 | 12-Apr-2024 | 3 | 3 | 5 | 2 | 1 | 1 |
3 | 5 | 1 | 6-Mar-2024 | 12-Apr-2024 | 3 | 3 | 5 | 2 | 1 | 1 |
9 | 5 | 1 | 6-Mar-2024 | 12-Apr-2024 | 3 | 3 | 5 | 2 | 1 | 1 |
10 | 5 | 1 | 6-Mar-2024 | 12-Apr-2024 | 3 | 3 | 5 | 2 | 1 | 1 |
Table C
Person_ID | Incident_ID | Family_ID | Incident_Date | Invest_Comp_date | Incident_Type | Incident_Type_Collapsed | Invest_Finding | Finding_Collapsed | Internal_Incident | Internal_Investigation |
1 | 1 | 1 | 17-Jan-2024 | 29-Jan-2024 | 1 | 3 | 5 | 2 | 0 | 0 |
2 | 1 | 1 | 17-Jan-2024 | 29-Jan-2024 | 1 | 3 | 5 | 2 | 0 | 0 |
3 | 1 | 1 | 17-Jan-2024 | 29-Jan-2024 | 1 | 3 | 5 | 2 | 0 | 0 |
4 | 2 | 2 | 24-Jan-2024 | 3-Jun-2024 | 12 | 3 | 4 | 0 | 0 | 0 |
11 | 2 | 2 | 24-Jan-2024 | 3-Jun-2024 | 12 | 3 | 4 | 0 | 0 | 0 |
1 | 3 | 1 | 4-Feb-2024 | 16-Feb-2024 | 1 | 3 | 10 | 0 | 1 | 1 |
3 | 3 | 1 | 4-Feb-2024 | 16-Feb-2024 | 1 | 3 | 10 | 0 | 1 | 1 |
5 | 4 | 3 | 15-Feb-2024 | 8-Apr-2024 | 13 | 3 | 7 | 1 | 1 | 1 |
6 | 4 | 3 | 15-Feb-2024 | 8-Apr-2024 | 13 | 3 | 7 | 1 | 1 | 1 |
7 | 4 | 3 | 15-Feb-2024 | 8-Apr-2024 | 13 | 3 | 7 | 1 | 1 | 1 |
8 | 4 | 3 | 15-Feb-2024 | 8-Apr-2024 | 13 | 3 | 7 | 1 | 1 | 1 |
1 | 5 | 1 | 6-Mar-2024 | 12-Apr-2024 | 3 | 3 | 5 | 2 | 1 | 1 |
2 | 5 | 1 | 6-Mar-2024 | 12-Apr-2024 | 3 | 3 | 5 | 2 | 1 | 1 |
3 | 5 | 1 | 6-Mar-2024 | 12-Apr-2024 | 3 | 3 | 5 | 2 | 1 | 1 |
12 | 6 | 4 | 7-Mar-2024 | 13-Apr-2024 | 1 | 3 | 4 | 0 | 0 | 0 |
If sorting your datasets is not too expensive, then after sorting each dataset, a single data step can make a dataset for each combination of intersections.
Untested, in the absence of a working sample data set.
proc sort data=a out=a_sorted;
by _all_;
run;
proc sort data=b out=b_sorted;
by _all_;
run;
proc sort data=c out=c_sorted;
by _all_;
run;
data all_three a_only b_only c_only a_and_b a_and_c b_and_c ;
merge a_sorted (in=ina) b_sorted (in=inb) c_sorted (in=inc);
by _all_;
select (4*ina + 2*inb + 1*inc);
when (7) output all_three;
when (6) output a_and_b;
when (5) output a_and_c;
when (4) output a_only;
when (3) output b_and_c;
when (2) output b_only;
when (1) output c_only;
end;
run;
But really, why do this at all? Just make a single dataset with a permanent new variable abc=4*ina+2*inb+1*inc. Then you can use that variable as a filter for all subsequent processing.
If sorting your datasets is not too expensive, then after sorting each dataset, a single data step can make a dataset for each combination of intersections.
Untested, in the absence of a working sample data set.
proc sort data=a out=a_sorted;
by _all_;
run;
proc sort data=b out=b_sorted;
by _all_;
run;
proc sort data=c out=c_sorted;
by _all_;
run;
data all_three a_only b_only c_only a_and_b a_and_c b_and_c ;
merge a_sorted (in=ina) b_sorted (in=inb) c_sorted (in=inc);
by _all_;
select (4*ina + 2*inb + 1*inc);
when (7) output all_three;
when (6) output a_and_b;
when (5) output a_and_c;
when (4) output a_only;
when (3) output b_and_c;
when (2) output b_only;
when (1) output c_only;
end;
run;
But really, why do this at all? Just make a single dataset with a permanent new variable abc=4*ina+2*inb+1*inc. Then you can use that variable as a filter for all subsequent processing.
@ModeratelyWise wrote:
Thanks.
Forgive my ignorance, but how do create that variable in the combined dataset?
You could just use
from_abc=4*ina + 2*inb + 1*inc;
But it might be better just to keep three dummy variables as below. That would be easier to remember.
data all_groupings;
merge a_sorted (in=ina) b_sorted (in=inb) c_sorted (in=inc);
by _all_;
from_a=ina;
from_b=inb;
from_c=inc;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.