BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
buckeyefisher
Obsidian | Level 7

I have 2 files and want to compare them to create a third file. The third file should have a summary record of matches. 

Case number can repeat but subclasses in each file are unique per case number. 

 

File A 

Case_number                 subclass 

5-10-cv-00123                 434

5-10-cv-00123                 451 

5-10-cv-00123                 478

2-10-cv-00599                 510

2-10-cv-00599                 512 

 

File B

case_number                  subclass 

5-10-cv-00123                 221

5-10-cv-00123                 434

2-10-cv-00599                 510

2-10-cv-00599                 512

 

Output needed   

Case_number              total subclass from A             matched in B             Match Flag 

5-10-cv-00123                 3                                            1                                0  

2-10-cv-00599                 2                                             2                               1

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data work.seta;
 input Case_number  : $14.               subclass ;
datalines;
5-10-cv-00123                 434
5-10-cv-00123                 451 
5-10-cv-00123                 478
2-10-cv-00599                 510
2-10-cv-00599                 512 
;
 
data work.setb;
 input Case_number  : $14.               subclass ;
datalines;
5-10-cv-00123                 221
5-10-cv-00123                 434
2-10-cv-00599                 510
2-10-cv-00599                 512
;
proc sql;
create table temp_a as
 select  case_number,count(*) as n_from_a
  from seta 
   group by case_number;

create table temp as
select 	case_number,count(*) as match_in_b
from (
select *
 from seta
intersect
select *
 from setb
 )
 group by case_number;

create table want as
select a.*,match_in_b ,(n_from_a=match_in_b) as flag
 from temp_a as a left join temp as b
  on a.case_number=b.case_number;
quit;

View solution in original post

3 REPLIES 3
ballardw
Super User

@buckeyefisher wrote:

I have 2 files and want to compare them to create a third file. The third file should have a summary record of matches. 

Case number can repeat but subclasses in each file are unique per case number. 

 

File A 

Case_number                 subclass 

5-10-cv-00123                 434

5-10-cv-00123                 451 

5-10-cv-00123                 478

2-10-cv-00599                 510

2-10-cv-00599                 512 

 

File B

case_number                  subclass 

5-10-cv-00123                 221

5-10-cv-00123                 434

2-10-cv-00599                 510

2-10-cv-00599                 512

 

Output needed   

Case_number              total subclass from A             matched in B             Match Flag 

5-10-cv-00123                 3                                            1                                0  

2-10-cv-00599                 2                                             2                               1


One way:

data work.seta;
 input Case_number  :$14.               subclass ;
datalines;
5-10-cv-00123                 434
5-10-cv-00123                 451 
5-10-cv-00123                 478
2-10-cv-00599                 510
2-10-cv-00599                 512 
;
 
data work.setb;
 input Case_number  :$14.               subclass ;
datalines;
5-10-cv-00123                 221
5-10-cv-00123                 434
2-10-cv-00599                 510
2-10-cv-00599                 512
;

proc sort data=work.seta;
   by case_number subclass;
run;
proc sort data=work.setb;
   by case_number subclass;
run;

data work.need;
   merge work.seta (in=ina)
         work.setb (in=inb)
   ;
   matched = (ina and inb);
run;

proc summary data=work.need nway;
   class case_number;
   var matched;
   output out=work.summary (drop=_type_) sum=;
run;

data want;
   set work.summary;
   label 
      _freq_='Total subclass from A'
      matched='Matched in B'
      flag ='Match Flag'
   ;
   Flag = (_freq_= matched);
run;

Note the use of a DATASTEP to provide example data that can be used for testing. Also posting code in a code box opened with the {I} to maintain basic formatting and set aside as actual code.

 

If the ORDER of the records is important then additional stuff needs to be added.

This will likely not yield the desired result if there are subclass in set B not in A OR if subclasses repeat in either set. You would need to provide specific logic on handling that situation. The likely place would be in the data step for work.need with additional variable(s) for INB not INA.

A small key bit in this code is that SAS returns a numeric 1 for True and 0 for false. So you get 1 or 0 results in the

matched = (ina and inb);
and
Flag = (_freq_ = matched);

statements.

 

buckeyefisher
Obsidian | Level 7

Thanks, following code is giving incorrect results. Do need to say BY after merge?
data work.need;
merge work.seta (in=ina)
work.setb (in=inb)
;
matched = (ina and inb);
run;
Ksharp
Super User
data work.seta;
 input Case_number  : $14.               subclass ;
datalines;
5-10-cv-00123                 434
5-10-cv-00123                 451 
5-10-cv-00123                 478
2-10-cv-00599                 510
2-10-cv-00599                 512 
;
 
data work.setb;
 input Case_number  : $14.               subclass ;
datalines;
5-10-cv-00123                 221
5-10-cv-00123                 434
2-10-cv-00599                 510
2-10-cv-00599                 512
;
proc sql;
create table temp_a as
 select  case_number,count(*) as n_from_a
  from seta 
   group by case_number;

create table temp as
select 	case_number,count(*) as match_in_b
from (
select *
 from seta
intersect
select *
 from setb
 )
 group by case_number;

create table want as
select a.*,match_in_b ,(n_from_a=match_in_b) as flag
 from temp_a as a left join temp as b
  on a.case_number=b.case_number;
quit;

SAS Innovate 2025: Register Today!

 

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 670 views
  • 1 like
  • 3 in conversation