Hi,
I have two datasets in long format that have variables “ID” and “incident_number” in common. Each person has a unique ID number but two or more people can have the same incident_number if they violate school policy together (for example, both of them engaged in underage drinking at the same party; so the underage drinking at the party would count as one incident).
If DatasetB doesn’t have a matching ID and incident number with DatasetA, I want to delete unmatched cases in DatasetB. So, I want to select cases based on a combination of ID and incident number.
Is there a way to modify DatasetB in this way?
Have DatasetA:
ID  | Incident_number  | VarA  | VarB  | 
1  | 56789  | Abcd  | 1232  | 
1  | 56789  | Erfsdjdgr  | 94847  | 
2  | 64908  | Fakjsdf;asie  | 5342  | 
2  | 56789  | Sajfldkjsa;ljf  | 54698734  | 
3  | 78621  | Kjdafjdh  | 0589576  | 
4  | 78621  | afsdfasdkljfa  | 1154  | 
4  | 78721  | Fsafd  | 987888  | 
Have DatasetB:
ID  | Incident_number  | VarX  | VarY  | 
1  | 56789  | 56789  | galdjfoia  | 
1  | 56789  | 56789  | sadjfakhd  | 
1  | 56818  | 56818  | weoiatlk  | 
2  | 56789  | 124363  | Abjdkafh  | 
2  | 64826  | 5463  | Giagaojdfl  | 
3  | 78629  | 1545744654  | Ajfu  | 
3  | 78799  | 1265841  | Fkajhdfajh  | 
4  | 78621  | 9378464  | Nadfaei  | 
4  | 79725  | 5486513541  | Lirueoi  | 
9  | 79102  | 23756  | Nann  | 
12  | 80215  | 987543  | Ueyrf  | 
Want new_DatasetB
ID  | Incident  | VarX  | VarY  | 
1  | 56789  | 56789  | galdjfoia  | 
1  | 56789  | 56789  | sadjfakhd  | 
2  | 56789  | 124363  | Abjdkafh  | 
4  | 78621  | 9378464  | Nadfaei  | 
Thank you for your help.
Hash Table ?
data DatasetA;
input 
ID
Incident_number
VarA $
VarB ;
cards4;
1
56789
Abcd
1232
1
56789
Erfsdjdgr
94847
2
64908
Fakjsdf;asie
5342
2
56789
Sajfldkjsa;ljf
54698734
3
78621
Kjdafjdh
0589576
4
78621
afsdfasdkljfa
1154
4
78721
Fsafd
987888
;;;;
run;
data DatasetB;
input 
ID
Incident_number
VarX 
VarY $;
cards4;
1
56789
56789
galdjfoia
1
56789
56789
sadjfakhd
1
56818
56818
weoiatlk
2
56789
124363
Abjdkafh
2
64826
5463
Giagaojdfl
3
78629
1545744654
Ajfu
3
78799
1265841
Fkajhdfajh
4
78621
9378464
Nadfaei
4
79725
5486513541
Lirueoi
9
79102
23756
Nann
12
80215
987543
Ueyrf
;;;;
run;
data want;
 if _n_ eq 1 then do;
  declare hash h(dataset:'DatasetA');
  h.definekey('ID','Incident_number');
  h.definedone();
 end;
set DatasetB;
if h.check()=0;
run;
Xia Keshan
Try this,
proc sort data=a; by ID Incident_number; run;
proc sort data=b; by ID Incident_number; run;
data want; merge a(in=a) b(in=b); by ID Incident_number; if a=b; drop VarA VarB; run;
proc print noobs; run;
Thank you K C Wong. Is there any way of achieving the same result without merging the datasets?
Hash Table ?
data DatasetA;
input 
ID
Incident_number
VarA $
VarB ;
cards4;
1
56789
Abcd
1232
1
56789
Erfsdjdgr
94847
2
64908
Fakjsdf;asie
5342
2
56789
Sajfldkjsa;ljf
54698734
3
78621
Kjdafjdh
0589576
4
78621
afsdfasdkljfa
1154
4
78721
Fsafd
987888
;;;;
run;
data DatasetB;
input 
ID
Incident_number
VarX 
VarY $;
cards4;
1
56789
56789
galdjfoia
1
56789
56789
sadjfakhd
1
56818
56818
weoiatlk
2
56789
124363
Abjdkafh
2
64826
5463
Giagaojdfl
3
78629
1545744654
Ajfu
3
78799
1265841
Fkajhdfajh
4
78621
9378464
Nadfaei
4
79725
5486513541
Lirueoi
9
79102
23756
Nann
12
80215
987543
Ueyrf
;;;;
run;
data want;
 if _n_ eq 1 then do;
  declare hash h(dataset:'DatasetA');
  h.definekey('ID','Incident_number');
  h.definedone();
 end;
set DatasetB;
if h.check()=0;
run;
Xia Keshan
Thank you very much xia keshan!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.