Hello,
I need to merge the DATASET 1 with DATASET 2 to get to a final dataset "DATA WANT" and I need some guidance.
please note that dataset 1 is a matched pair data of students and dataset 2 contains additional variables on the matched pairs data.
DATASET 1
groupA_id | groupB_id | groupA_grade | groupB_grade | groupA_sex | groupB_sex |
00000-000001 | 00000-000200 | 2 | 2 | 1 | 1 |
00000-000001 | 00000-000201 | 2 | 2 | 1 | 1 |
00000-000001 | 00000-000202 | 2 | 2 | 1 | 1 |
00000-000002 | 00000-000203 | 2 | 2 | 1 | 1 |
00000-000002 | 00000-000204 | 2 | 2 | 1 | 1 |
00000-000002 | 00000-000205 | 2 | 2 | 1 | 1 |
00000-000003 | 00000-000206 | 2 | 2 | 1 | 1 |
00000-000003 | 00000-000207 | 2 | 2 | 1 | 1 |
00000-000003 | 00000-000208 | 2 | 2 | 1 | 1 |
id | Siblings | economic status |
00000-000001 | Yes | low |
00000-000001 | yes | low |
00000-000001 | yes | low |
00000-000002 | yes | low |
00000-000002 | yes | low |
00000-000002 | yes | low |
00000-000003 | no | high |
00000-000003 | no | high |
00000-000003 | no | high |
00000-000200 | yes | low |
00000-000201 | no | low |
00000-000202 | no | high |
00000-000203 | no | medium |
00000-000204 | no | high |
00000-000205 | yes | low |
00000-000206 | no | medium |
00000-000207 | yes | high |
00000-000208 | yes | high |
DATA WANT
id | grade | sex | Siblings | economic status |
00000-000001 | 2 | 1 | Yes | low |
00000-000002 | 2 | 1 | yes | low |
00000-000003 | 2 | 1 | no | high |
00000-000002 | 2 | 1 | yes | low |
00000-000200 | 2 | 1 | yes | low |
00000-000201 | 2 | 1 | no | low |
00000-000202 | 2 | 1 | no | high |
00000-000203 | 2 | 1 | no | medium |
00000-000204 | 2 | 1 | no | high |
00000-000205 | 2 | 1 | yes | low |
00000-000206 | 2 | 1 | no | medium |
00000-000207 | 2 | 1 | yes | high |
00000-000208 | 2 | 1 | yes | high |
Any help would be appreciated. Thanks
Your sample output is inconsistent in terms of keeping ID duplicates because ID=00000-000002 is duplicated while the rest are not. Assuming that it's a typo and you don't want any dupes but only the first ID encountered in the data set 2, you can proceed as follows:
data want (drop = group: rc) ; if _n_ = 1 then do ; dcl hash h (ordered:"a") ; h.defineKey ("id") ; h.defineData ("id", "grade", "sex") ; h.defineDone () ; do until (z) ; set ds1 end = z ; id = groupa_id ; grade = groupa_grade ; sex = groupa_sex ; rc = h.add() ; id = groupb_id ; grade = groupb_grade ; sex = groupb_sex ; rc = h.add() ; end ; h.output (dataset:"hash") ; dcl hash u () ; u.defineKey ("id") ; u.defineDone () ; end ; set ds2 ; if u.check() ne 0 ; u.add() ; if h.find() ne 0 then call missing (grade, sex) ; run ;
If you want to keep the dupes from DS2, get rid of the hash table U and the related statements with the method calls referencing U.
Many thanks hashman
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.