Hi,
I have two datasets in long format that have variables “ID” and “incident_number” in common. Each person has a unique ID number but two or more people can have the same incident_number if they violate school policy together (for example, both of them engaged in underage drinking at the same party; so the underage drinking at the party would count as one incident).
If DatasetB doesn’t have a matching ID and incident number with DatasetA, I want to delete unmatched cases in DatasetB. So, I want to select cases based on a combination of ID and incident number.
Is there a way to modify DatasetB in this way?
Have DatasetA:
ID | Incident_number | VarA | VarB |
1 | 56789 | Abcd | 1232 |
1 | 56789 | Erfsdjdgr | 94847 |
2 | 64908 | Fakjsdf;asie | 5342 |
2 | 56789 | Sajfldkjsa;ljf | 54698734 |
3 | 78621 | Kjdafjdh | 0589576 |
4 | 78621 | afsdfasdkljfa | 1154 |
4 | 78721 | Fsafd | 987888 |
Have DatasetB:
ID | Incident_number | VarX | VarY |
1 | 56789 | 56789 | galdjfoia |
1 | 56789 | 56789 | sadjfakhd |
1 | 56818 | 56818 | weoiatlk |
2 | 56789 | 124363 | Abjdkafh |
2 | 64826 | 5463 | Giagaojdfl |
3 | 78629 | 1545744654 | Ajfu |
3 | 78799 | 1265841 | Fkajhdfajh |
4 | 78621 | 9378464 | Nadfaei |
4 | 79725 | 5486513541 | Lirueoi |
9 | 79102 | 23756 | Nann |
12 | 80215 | 987543 | Ueyrf |
Want new_DatasetB
ID | Incident | VarX | VarY |
1 | 56789 | 56789 | galdjfoia |
1 | 56789 | 56789 | sadjfakhd |
2 | 56789 | 124363 | Abjdkafh |
4 | 78621 | 9378464 | Nadfaei |
Thank you for your help.
Hash Table ?
data DatasetA; input ID Incident_number VarA $ VarB ; cards4; 1 56789 Abcd 1232 1 56789 Erfsdjdgr 94847 2 64908 Fakjsdf;asie 5342 2 56789 Sajfldkjsa;ljf 54698734 3 78621 Kjdafjdh 0589576 4 78621 afsdfasdkljfa 1154 4 78721 Fsafd 987888 ;;;; run; data DatasetB; input ID Incident_number VarX VarY $; cards4; 1 56789 56789 galdjfoia 1 56789 56789 sadjfakhd 1 56818 56818 weoiatlk 2 56789 124363 Abjdkafh 2 64826 5463 Giagaojdfl 3 78629 1545744654 Ajfu 3 78799 1265841 Fkajhdfajh 4 78621 9378464 Nadfaei 4 79725 5486513541 Lirueoi 9 79102 23756 Nann 12 80215 987543 Ueyrf ;;;; run; data want; if _n_ eq 1 then do; declare hash h(dataset:'DatasetA'); h.definekey('ID','Incident_number'); h.definedone(); end; set DatasetB; if h.check()=0; run;
Xia Keshan
Try this,
proc sort data=a; by ID Incident_number; run;
proc sort data=b; by ID Incident_number; run;
data want; merge a(in=a) b(in=b); by ID Incident_number; if a=b; drop VarA VarB; run;
proc print noobs; run;
Thank you K C Wong. Is there any way of achieving the same result without merging the datasets?
Hash Table ?
data DatasetA; input ID Incident_number VarA $ VarB ; cards4; 1 56789 Abcd 1232 1 56789 Erfsdjdgr 94847 2 64908 Fakjsdf;asie 5342 2 56789 Sajfldkjsa;ljf 54698734 3 78621 Kjdafjdh 0589576 4 78621 afsdfasdkljfa 1154 4 78721 Fsafd 987888 ;;;; run; data DatasetB; input ID Incident_number VarX VarY $; cards4; 1 56789 56789 galdjfoia 1 56789 56789 sadjfakhd 1 56818 56818 weoiatlk 2 56789 124363 Abjdkafh 2 64826 5463 Giagaojdfl 3 78629 1545744654 Ajfu 3 78799 1265841 Fkajhdfajh 4 78621 9378464 Nadfaei 4 79725 5486513541 Lirueoi 9 79102 23756 Nann 12 80215 987543 Ueyrf ;;;; run; data want; if _n_ eq 1 then do; declare hash h(dataset:'DatasetA'); h.definekey('ID','Incident_number'); h.definedone(); end; set DatasetB; if h.check()=0; run;
Xia Keshan
Thank you very much xia keshan!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.