Hi,
I have two datasets in long format that have variables “ID” and “incident_number” in common. Each person has a unique ID number but two or more people can have the same incident_number if they violate school policy together (for example, both of them engaged in underage drinking at the same party; so the underage drinking at the party would count as one incident).
If DatasetB doesn’t have a matching ID and incident number with DatasetA, I want to delete unmatched cases in DatasetB. So, I want to select cases based on a combination of ID and incident number.
Is there a way to modify DatasetB in this way?
Have DatasetA:
ID | Incident_number | VarA | VarB |
1 | 56789 | Abcd | 1232 |
1 | 56789 | Erfsdjdgr | 94847 |
2 | 64908 | Fakjsdf;asie | 5342 |
2 | 56789 | Sajfldkjsa;ljf | 54698734 |
3 | 78621 | Kjdafjdh | 0589576 |
4 | 78621 | afsdfasdkljfa | 1154 |
4 | 78721 | Fsafd | 987888 |
Have DatasetB:
ID | Incident_number | VarX | VarY |
1 | 56789 | 56789 | galdjfoia |
1 | 56789 | 56789 | sadjfakhd |
1 | 56818 | 56818 | weoiatlk |
2 | 56789 | 124363 | Abjdkafh |
2 | 64826 | 5463 | Giagaojdfl |
3 | 78629 | 1545744654 | Ajfu |
3 | 78799 | 1265841 | Fkajhdfajh |
4 | 78621 | 9378464 | Nadfaei |
4 | 79725 | 5486513541 | Lirueoi |
9 | 79102 | 23756 | Nann |
12 | 80215 | 987543 | Ueyrf |
Want new_DatasetB
ID | Incident | VarX | VarY |
1 | 56789 | 56789 | galdjfoia |
1 | 56789 | 56789 | sadjfakhd |
2 | 56789 | 124363 | Abjdkafh |
4 | 78621 | 9378464 | Nadfaei |
Thank you for your help.
Hash Table ?
data DatasetA;
input
ID
Incident_number
VarA $
VarB ;
cards4;
1
56789
Abcd
1232
1
56789
Erfsdjdgr
94847
2
64908
Fakjsdf;asie
5342
2
56789
Sajfldkjsa;ljf
54698734
3
78621
Kjdafjdh
0589576
4
78621
afsdfasdkljfa
1154
4
78721
Fsafd
987888
;;;;
run;
data DatasetB;
input
ID
Incident_number
VarX
VarY $;
cards4;
1
56789
56789
galdjfoia
1
56789
56789
sadjfakhd
1
56818
56818
weoiatlk
2
56789
124363
Abjdkafh
2
64826
5463
Giagaojdfl
3
78629
1545744654
Ajfu
3
78799
1265841
Fkajhdfajh
4
78621
9378464
Nadfaei
4
79725
5486513541
Lirueoi
9
79102
23756
Nann
12
80215
987543
Ueyrf
;;;;
run;
data want;
if _n_ eq 1 then do;
declare hash h(dataset:'DatasetA');
h.definekey('ID','Incident_number');
h.definedone();
end;
set DatasetB;
if h.check()=0;
run;
Xia Keshan
Try this,
proc sort data=a; by ID Incident_number; run;
proc sort data=b; by ID Incident_number; run;
data want; merge a(in=a) b(in=b); by ID Incident_number; if a=b; drop VarA VarB; run;
proc print noobs; run;
Thank you K C Wong. Is there any way of achieving the same result without merging the datasets?
Hash Table ?
data DatasetA;
input
ID
Incident_number
VarA $
VarB ;
cards4;
1
56789
Abcd
1232
1
56789
Erfsdjdgr
94847
2
64908
Fakjsdf;asie
5342
2
56789
Sajfldkjsa;ljf
54698734
3
78621
Kjdafjdh
0589576
4
78621
afsdfasdkljfa
1154
4
78721
Fsafd
987888
;;;;
run;
data DatasetB;
input
ID
Incident_number
VarX
VarY $;
cards4;
1
56789
56789
galdjfoia
1
56789
56789
sadjfakhd
1
56818
56818
weoiatlk
2
56789
124363
Abjdkafh
2
64826
5463
Giagaojdfl
3
78629
1545744654
Ajfu
3
78799
1265841
Fkajhdfajh
4
78621
9378464
Nadfaei
4
79725
5486513541
Lirueoi
9
79102
23756
Nann
12
80215
987543
Ueyrf
;;;;
run;
data want;
if _n_ eq 1 then do;
declare hash h(dataset:'DatasetA');
h.definekey('ID','Incident_number');
h.definedone();
end;
set DatasetB;
if h.check()=0;
run;
Xia Keshan
Thank you very much xia keshan!
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.