☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Obsidian | Level 7

## Compare three tables based on two variables or all variables

Hello all,

I have three tables, all with the same 11 variables (see below), that I need to compare.  I need to make a list of all observations that are identical across all three tables (Row 1 in all three tables for example), observations that have a unique combination of Person_ID and Incident_ID that are not found in the other two tables(Row 5 in table B for example), and observations that unique combination of all variables for each table (the fourth and fifth observation of table C for example).  Hopefully this makes sense.

Thanks,

Marc

Table A

 Person_ID Incident_ID Family_ID Incident_Date Invest_Comp_date Incident_Type Incident_Type_Collapsed Invest_Finding Finding_Collapsed Internal_Incident Internal_Investigation 1 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0 2 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0 3 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0 4 2 2 24-Jan-2024 3-Mar-2024 12 3 4 0 0 0 1 3 1 4-Feb-2024 16-Feb-2024 1 3 10 0 1 1 3 3 1 4-Feb-2024 16-Feb-2024 1 3 10 0 1 1 5 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1 6 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1 7 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1 8 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1 1 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1 2 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1 3 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1

Table B

 Person_ID Incident_ID Family_ID Incident_Date Invest_Comp_date Incident_Type Incident_Type_Collapsed Invest_Finding Finding_Collapsed Internal_Incident Internal_Investigation 1 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0 2 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0 3 1 1 17-Jan-2024 29-Jan-2024 13 3 4 0 0 0 1 3 1 4-Feb-2024 16-Feb-2024 1 3 10 0 1 1 2 3 1 4-Feb-2024 16-Feb-2024 1 3 10 0 1 1 3 3 1 4-Feb-2024 16-Feb-2024 1 3 10 0 1 1 5 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1 6 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1 8 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1 1 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 0 0 2 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1 3 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1 9 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1 10 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1

Table C

 Person_ID Incident_ID Family_ID Incident_Date Invest_Comp_date Incident_Type Incident_Type_Collapsed Invest_Finding Finding_Collapsed Internal_Incident Internal_Investigation 1 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0 2 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0 3 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0 4 2 2 24-Jan-2024 3-Jun-2024 12 3 4 0 0 0 11 2 2 24-Jan-2024 3-Jun-2024 12 3 4 0 0 0 1 3 1 4-Feb-2024 16-Feb-2024 1 3 10 0 1 1 3 3 1 4-Feb-2024 16-Feb-2024 1 3 10 0 1 1 5 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1 6 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1 7 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1 8 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1 1 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1 2 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1 3 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1 12 6 4 7-Mar-2024 13-Apr-2024 1 3 4 0 0 0

1 ACCEPTED SOLUTION

Accepted Solutions
PROC Star

## Re: Compare three tables based on two variables or all variables

If sorting your datasets is not too expensive, then after sorting each dataset, a single data step can make a dataset for each combination of intersections.

Untested, in the absence of a working sample data set.

``````proc sort data=a  out=a_sorted;
by _all_;
run;
proc sort data=b  out=b_sorted;
by _all_;
run;
proc sort data=c  out=c_sorted;
by _all_;
run;

data all_three a_only b_only c_only a_and_b  a_and_c  b_and_c ;
merge a_sorted (in=ina)  b_sorted (in=inb)  c_sorted (in=inc);
by _all_;
select (4*ina + 2*inb + 1*inc);
when (7) output all_three;
when (6) output a_and_b;
when (5) output a_and_c;
when (4) output a_only;
when (3) output b_and_c;
when (2) output b_only;
when (1) output c_only;
end;
run;``````

But really, why do this at all?  Just make a single dataset with a permanent new variable  abc=4*ina+2*inb+1*inc.    Then you can use that variable as a filter for all subsequent processing.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
4 REPLIES 4
Super User

## Re: Compare three tables based on two variables or all variables

If you want find the same obs from these three tables, try this :
proc sql;
create table want as
select * from A
intersect
select * from B
intersect
select * from C
;
quit;

P.S. Other questions you could get by my similar code.
PROC Star

## Re: Compare three tables based on two variables or all variables

If sorting your datasets is not too expensive, then after sorting each dataset, a single data step can make a dataset for each combination of intersections.

Untested, in the absence of a working sample data set.

``````proc sort data=a  out=a_sorted;
by _all_;
run;
proc sort data=b  out=b_sorted;
by _all_;
run;
proc sort data=c  out=c_sorted;
by _all_;
run;

data all_three a_only b_only c_only a_and_b  a_and_c  b_and_c ;
merge a_sorted (in=ina)  b_sorted (in=inb)  c_sorted (in=inc);
by _all_;
select (4*ina + 2*inb + 1*inc);
when (7) output all_three;
when (6) output a_and_b;
when (5) output a_and_c;
when (4) output a_only;
when (3) output b_and_c;
when (2) output b_only;
when (1) output c_only;
end;
run;``````

But really, why do this at all?  Just make a single dataset with a permanent new variable  abc=4*ina+2*inb+1*inc.    Then you can use that variable as a filter for all subsequent processing.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Obsidian | Level 7

## Re: Compare three tables based on two variables or all variables

Thanks.
Forgive my ignorance, but how do create that variable in the combined dataset?
PROC Star

## Re: Compare three tables based on two variables or all variables

@ModeratelyWise wrote:
Thanks.
Forgive my ignorance, but how do create that variable in the combined dataset?

You could just use

``from_abc=4*ina + 2*inb + 1*inc;``

But it might be better just to keep three dummy variables as below.  That would be easier to remember.

``````data all_groupings;
merge a_sorted (in=ina)  b_sorted (in=inb)  c_sorted (in=inc);
by _all_;
from_a=ina;
from_b=inb;
from_c=inc;
run;``````

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Discussion stats
• 4 replies
• 418 views
• 0 likes
• 3 in conversation