I have 2 files and want to compare them to create a third file. The third file should have a summary record of matches.
Case number can repeat but subclasses in each file are unique per case number.
File A
Case_number subclass
5-10-cv-00123 434
5-10-cv-00123 451
5-10-cv-00123 478
2-10-cv-00599 510
2-10-cv-00599 512
File B
case_number subclass
5-10-cv-00123 221
5-10-cv-00123 434
2-10-cv-00599 510
2-10-cv-00599 512
Output needed
Case_number total subclass from A matched in B Match Flag
5-10-cv-00123 3 1 0
2-10-cv-00599 2 2 1
data work.seta;
input Case_number : $14. subclass ;
datalines;
5-10-cv-00123 434
5-10-cv-00123 451
5-10-cv-00123 478
2-10-cv-00599 510
2-10-cv-00599 512
;
data work.setb;
input Case_number : $14. subclass ;
datalines;
5-10-cv-00123 221
5-10-cv-00123 434
2-10-cv-00599 510
2-10-cv-00599 512
;
proc sql;
create table temp_a as
select case_number,count(*) as n_from_a
from seta
group by case_number;
create table temp as
select case_number,count(*) as match_in_b
from (
select *
from seta
intersect
select *
from setb
)
group by case_number;
create table want as
select a.*,match_in_b ,(n_from_a=match_in_b) as flag
from temp_a as a left join temp as b
on a.case_number=b.case_number;
quit;
@buckeyefisher wrote:
I have 2 files and want to compare them to create a third file. The third file should have a summary record of matches.
Case number can repeat but subclasses in each file are unique per case number.
File A
Case_number subclass
5-10-cv-00123 434
5-10-cv-00123 451
5-10-cv-00123 478
2-10-cv-00599 510
2-10-cv-00599 512
File B
case_number subclass
5-10-cv-00123 221
5-10-cv-00123 434
2-10-cv-00599 510
2-10-cv-00599 512
Output needed
Case_number total subclass from A matched in B Match Flag
5-10-cv-00123 3 1 0
2-10-cv-00599 2 2 1
One way:
data work.seta; input Case_number :$14. subclass ; datalines; 5-10-cv-00123 434 5-10-cv-00123 451 5-10-cv-00123 478 2-10-cv-00599 510 2-10-cv-00599 512 ; data work.setb; input Case_number :$14. subclass ; datalines; 5-10-cv-00123 221 5-10-cv-00123 434 2-10-cv-00599 510 2-10-cv-00599 512 ; proc sort data=work.seta; by case_number subclass; run; proc sort data=work.setb; by case_number subclass; run; data work.need; merge work.seta (in=ina) work.setb (in=inb) ; matched = (ina and inb); run; proc summary data=work.need nway; class case_number; var matched; output out=work.summary (drop=_type_) sum=; run; data want; set work.summary; label _freq_='Total subclass from A' matched='Matched in B' flag ='Match Flag' ; Flag = (_freq_= matched); run;
Note the use of a DATASTEP to provide example data that can be used for testing. Also posting code in a code box opened with the {I} to maintain basic formatting and set aside as actual code.
If the ORDER of the records is important then additional stuff needs to be added.
This will likely not yield the desired result if there are subclass in set B not in A OR if subclasses repeat in either set. You would need to provide specific logic on handling that situation. The likely place would be in the data step for work.need with additional variable(s) for INB not INA.
A small key bit in this code is that SAS returns a numeric 1 for True and 0 for false. So you get 1 or 0 results in the
matched = (ina and inb);
and
Flag = (_freq_ = matched);
statements.
data work.seta;
input Case_number : $14. subclass ;
datalines;
5-10-cv-00123 434
5-10-cv-00123 451
5-10-cv-00123 478
2-10-cv-00599 510
2-10-cv-00599 512
;
data work.setb;
input Case_number : $14. subclass ;
datalines;
5-10-cv-00123 221
5-10-cv-00123 434
2-10-cv-00599 510
2-10-cv-00599 512
;
proc sql;
create table temp_a as
select case_number,count(*) as n_from_a
from seta
group by case_number;
create table temp as
select case_number,count(*) as match_in_b
from (
select *
from seta
intersect
select *
from setb
)
group by case_number;
create table want as
select a.*,match_in_b ,(n_from_a=match_in_b) as flag
from temp_a as a left join temp as b
on a.case_number=b.case_number;
quit;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.