BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ModeratelyWise
Obsidian | Level 7

Hello all,

I have three tables, all with the same 11 variables (see below), that I need to compare.  I need to make a list of all observations that are identical across all three tables (Row 1 in all three tables for example), observations that have a unique combination of Person_ID and Incident_ID that are not found in the other two tables(Row 5 in table B for example), and observations that unique combination of all variables for each table (the fourth and fifth observation of table C for example).  Hopefully this makes sense.

Thanks,

Marc

 

 

Table A

Person_ID Incident_ID Family_ID Incident_Date Invest_Comp_date Incident_Type Incident_Type_Collapsed Invest_Finding Finding_Collapsed Internal_Incident Internal_Investigation
1 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0
2 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0
3 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0
4 2 2 24-Jan-2024 3-Mar-2024 12 3 4 0 0 0
1 3 1 4-Feb-2024 16-Feb-2024 1 3 10 0 1 1
3 3 1 4-Feb-2024 16-Feb-2024 1 3 10 0 1 1
5 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1
6 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1
7 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1
8 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1
1 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1
2 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1
3 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1

 

Table B

Person_ID Incident_ID Family_ID Incident_Date Invest_Comp_date Incident_Type Incident_Type_Collapsed Invest_Finding Finding_Collapsed Internal_Incident Internal_Investigation
1 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0
2 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0
3 1 1 17-Jan-2024 29-Jan-2024 13 3 4 0 0 0
1 3 1 4-Feb-2024 16-Feb-2024 1 3 10 0 1 1
2 3 1 4-Feb-2024 16-Feb-2024 1 3 10 0 1 1
3 3 1 4-Feb-2024 16-Feb-2024 1 3 10 0 1 1
5 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1
6 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1
8 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1
1 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 0 0
2 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1
3 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1
9 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1
10 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1

 

Table C

Person_ID Incident_ID Family_ID Incident_Date Invest_Comp_date Incident_Type Incident_Type_Collapsed Invest_Finding Finding_Collapsed Internal_Incident Internal_Investigation
1 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0
2 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0
3 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0
4 2 2 24-Jan-2024 3-Jun-2024 12 3 4 0 0 0
11 2 2 24-Jan-2024 3-Jun-2024 12 3 4 0 0 0
1 3 1 4-Feb-2024 16-Feb-2024 1 3 10 0 1 1
3 3 1 4-Feb-2024 16-Feb-2024 1 3 10 0 1 1
5 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1
6 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1
7 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1
8 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1
1 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1
2 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1
3 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1
12 6 4 7-Mar-2024 13-Apr-2024 1 3 4 0 0 0

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

If sorting your datasets is not too expensive, then after sorting each dataset, a single data step can make a dataset for each combination of intersections.

 

Untested, in the absence of a working sample data set.

 

proc sort data=a  out=a_sorted;
  by _all_;
run;
proc sort data=b  out=b_sorted;
  by _all_;
run;
proc sort data=c  out=c_sorted;
  by _all_;
run;

data all_three a_only b_only c_only a_and_b  a_and_c  b_and_c ;
  merge a_sorted (in=ina)  b_sorted (in=inb)  c_sorted (in=inc);
  by _all_;
  select (4*ina + 2*inb + 1*inc);
    when (7) output all_three;
    when (6) output a_and_b;
    when (5) output a_and_c;
    when (4) output a_only;
    when (3) output b_and_c;
    when (2) output b_only;
    when (1) output c_only;
  end;
run;

But really, why do this at all?  Just make a single dataset with a permanent new variable  abc=4*ina+2*inb+1*inc.    Then you can use that variable as a filter for all subsequent processing.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
Ksharp
Super User
If you want find the same obs from these three tables, try this :
proc sql;
create table want as
select * from A
intersect
select * from B
intersect
select * from C
;
quit;


P.S. Other questions you could get by my similar code.
mkeintz
PROC Star

If sorting your datasets is not too expensive, then after sorting each dataset, a single data step can make a dataset for each combination of intersections.

 

Untested, in the absence of a working sample data set.

 

proc sort data=a  out=a_sorted;
  by _all_;
run;
proc sort data=b  out=b_sorted;
  by _all_;
run;
proc sort data=c  out=c_sorted;
  by _all_;
run;

data all_three a_only b_only c_only a_and_b  a_and_c  b_and_c ;
  merge a_sorted (in=ina)  b_sorted (in=inb)  c_sorted (in=inc);
  by _all_;
  select (4*ina + 2*inb + 1*inc);
    when (7) output all_three;
    when (6) output a_and_b;
    when (5) output a_and_c;
    when (4) output a_only;
    when (3) output b_and_c;
    when (2) output b_only;
    when (1) output c_only;
  end;
run;

But really, why do this at all?  Just make a single dataset with a permanent new variable  abc=4*ina+2*inb+1*inc.    Then you can use that variable as a filter for all subsequent processing.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ModeratelyWise
Obsidian | Level 7
Thanks.
Forgive my ignorance, but how do create that variable in the combined dataset?
mkeintz
PROC Star

@ModeratelyWise wrote:
Thanks.
Forgive my ignorance, but how do create that variable in the combined dataset?

 

You could just use

from_abc=4*ina + 2*inb + 1*inc;

But it might be better just to keep three dummy variables as below.  That would be easier to remember.


 

data all_groupings;
  merge a_sorted (in=ina)  b_sorted (in=inb)  c_sorted (in=inc);
  by _all_;
  from_a=ina;
  from_b=inb;
  from_c=inc;
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1475 views
  • 0 likes
  • 3 in conversation