BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ModeratelyWise
Obsidian | Level 7

Hello all,

I have three tables, all with the same 11 variables (see below), that I need to compare.  I need to make a list of all observations that are identical across all three tables (Row 1 in all three tables for example), observations that have a unique combination of Person_ID and Incident_ID that are not found in the other two tables(Row 5 in table B for example), and observations that unique combination of all variables for each table (the fourth and fifth observation of table C for example).  Hopefully this makes sense.

Thanks,

Marc

 

 

Table A

Person_ID Incident_ID Family_ID Incident_Date Invest_Comp_date Incident_Type Incident_Type_Collapsed Invest_Finding Finding_Collapsed Internal_Incident Internal_Investigation
1 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0
2 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0
3 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0
4 2 2 24-Jan-2024 3-Mar-2024 12 3 4 0 0 0
1 3 1 4-Feb-2024 16-Feb-2024 1 3 10 0 1 1
3 3 1 4-Feb-2024 16-Feb-2024 1 3 10 0 1 1
5 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1
6 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1
7 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1
8 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1
1 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1
2 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1
3 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1

 

Table B

Person_ID Incident_ID Family_ID Incident_Date Invest_Comp_date Incident_Type Incident_Type_Collapsed Invest_Finding Finding_Collapsed Internal_Incident Internal_Investigation
1 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0
2 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0
3 1 1 17-Jan-2024 29-Jan-2024 13 3 4 0 0 0
1 3 1 4-Feb-2024 16-Feb-2024 1 3 10 0 1 1
2 3 1 4-Feb-2024 16-Feb-2024 1 3 10 0 1 1
3 3 1 4-Feb-2024 16-Feb-2024 1 3 10 0 1 1
5 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1
6 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1
8 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1
1 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 0 0
2 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1
3 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1
9 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1
10 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1

 

Table C

Person_ID Incident_ID Family_ID Incident_Date Invest_Comp_date Incident_Type Incident_Type_Collapsed Invest_Finding Finding_Collapsed Internal_Incident Internal_Investigation
1 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0
2 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0
3 1 1 17-Jan-2024 29-Jan-2024 1 3 5 2 0 0
4 2 2 24-Jan-2024 3-Jun-2024 12 3 4 0 0 0
11 2 2 24-Jan-2024 3-Jun-2024 12 3 4 0 0 0
1 3 1 4-Feb-2024 16-Feb-2024 1 3 10 0 1 1
3 3 1 4-Feb-2024 16-Feb-2024 1 3 10 0 1 1
5 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1
6 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1
7 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1
8 4 3 15-Feb-2024 8-Apr-2024 13 3 7 1 1 1
1 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1
2 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1
3 5 1 6-Mar-2024 12-Apr-2024 3 3 5 2 1 1
12 6 4 7-Mar-2024 13-Apr-2024 1 3 4 0 0 0

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

If sorting your datasets is not too expensive, then after sorting each dataset, a single data step can make a dataset for each combination of intersections.

 

Untested, in the absence of a working sample data set.

 

proc sort data=a  out=a_sorted;
  by _all_;
run;
proc sort data=b  out=b_sorted;
  by _all_;
run;
proc sort data=c  out=c_sorted;
  by _all_;
run;

data all_three a_only b_only c_only a_and_b  a_and_c  b_and_c ;
  merge a_sorted (in=ina)  b_sorted (in=inb)  c_sorted (in=inc);
  by _all_;
  select (4*ina + 2*inb + 1*inc);
    when (7) output all_three;
    when (6) output a_and_b;
    when (5) output a_and_c;
    when (4) output a_only;
    when (3) output b_and_c;
    when (2) output b_only;
    when (1) output c_only;
  end;
run;

But really, why do this at all?  Just make a single dataset with a permanent new variable  abc=4*ina+2*inb+1*inc.    Then you can use that variable as a filter for all subsequent processing.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
Ksharp
Super User
If you want find the same obs from these three tables, try this :
proc sql;
create table want as
select * from A
intersect
select * from B
intersect
select * from C
;
quit;


P.S. Other questions you could get by my similar code.
mkeintz
PROC Star

If sorting your datasets is not too expensive, then after sorting each dataset, a single data step can make a dataset for each combination of intersections.

 

Untested, in the absence of a working sample data set.

 

proc sort data=a  out=a_sorted;
  by _all_;
run;
proc sort data=b  out=b_sorted;
  by _all_;
run;
proc sort data=c  out=c_sorted;
  by _all_;
run;

data all_three a_only b_only c_only a_and_b  a_and_c  b_and_c ;
  merge a_sorted (in=ina)  b_sorted (in=inb)  c_sorted (in=inc);
  by _all_;
  select (4*ina + 2*inb + 1*inc);
    when (7) output all_three;
    when (6) output a_and_b;
    when (5) output a_and_c;
    when (4) output a_only;
    when (3) output b_and_c;
    when (2) output b_only;
    when (1) output c_only;
  end;
run;

But really, why do this at all?  Just make a single dataset with a permanent new variable  abc=4*ina+2*inb+1*inc.    Then you can use that variable as a filter for all subsequent processing.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ModeratelyWise
Obsidian | Level 7
Thanks.
Forgive my ignorance, but how do create that variable in the combined dataset?
mkeintz
PROC Star

@ModeratelyWise wrote:
Thanks.
Forgive my ignorance, but how do create that variable in the combined dataset?

 

You could just use

from_abc=4*ina + 2*inb + 1*inc;

But it might be better just to keep three dummy variables as below.  That would be easier to remember.


 

data all_groupings;
  merge a_sorted (in=ina)  b_sorted (in=inb)  c_sorted (in=inc);
  by _all_;
  from_a=ina;
  from_b=inb;
  from_c=inc;
run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 419 views
  • 0 likes
  • 3 in conversation