Hello,
I need to merge the DATASET 1 with DATASET 2 to get to a final dataset "DATA WANT" and I need some guidance.
please note that dataset 1 is a matched pair data of students and dataset 2 contains additional variables on the matched pairs data.
DATASET 1
groupA_id | groupB_id | groupA_grade | groupB_grade | groupA_sex | groupB_sex |
00000-000001 | 00000-000200 | 2 | 2 | 1 | 1 |
00000-000001 | 00000-000201 | 2 | 2 | 1 | 1 |
00000-000001 | 00000-000202 | 2 | 2 | 1 | 1 |
00000-000002 | 00000-000203 | 2 | 2 | 1 | 1 |
00000-000002 | 00000-000204 | 2 | 2 | 1 | 1 |
00000-000002 | 00000-000205 | 2 | 2 | 1 | 1 |
00000-000003 | 00000-000206 | 2 | 2 | 1 | 1 |
00000-000003 | 00000-000207 | 2 | 2 | 1 | 1 |
00000-000003 | 00000-000208 | 2 | 2 | 1 | 1 |
id | Siblings | economic status |
00000-000001 | Yes | low |
00000-000001 | yes | low |
00000-000001 | yes | low |
00000-000002 | yes | low |
00000-000002 | yes | low |
00000-000002 | yes | low |
00000-000003 | no | high |
00000-000003 | no | high |
00000-000003 | no | high |
00000-000200 | yes | low |
00000-000201 | no | low |
00000-000202 | no | high |
00000-000203 | no | medium |
00000-000204 | no | high |
00000-000205 | yes | low |
00000-000206 | no | medium |
00000-000207 | yes | high |
00000-000208 | yes | high |
DATA WANT
id | grade | sex | Siblings | economic status |
00000-000001 | 2 | 1 | Yes | low |
00000-000002 | 2 | 1 | yes | low |
00000-000003 | 2 | 1 | no | high |
00000-000002 | 2 | 1 | yes | low |
00000-000200 | 2 | 1 | yes | low |
00000-000201 | 2 | 1 | no | low |
00000-000202 | 2 | 1 | no | high |
00000-000203 | 2 | 1 | no | medium |
00000-000204 | 2 | 1 | no | high |
00000-000205 | 2 | 1 | yes | low |
00000-000206 | 2 | 1 | no | medium |
00000-000207 | 2 | 1 | yes | high |
00000-000208 | 2 | 1 | yes | high |
Any help would be appreciated. Thanks
Your sample output is inconsistent in terms of keeping ID duplicates because ID=00000-000002 is duplicated while the rest are not. Assuming that it's a typo and you don't want any dupes but only the first ID encountered in the data set 2, you can proceed as follows:
data want (drop = group: rc) ; if _n_ = 1 then do ; dcl hash h (ordered:"a") ; h.defineKey ("id") ; h.defineData ("id", "grade", "sex") ; h.defineDone () ; do until (z) ; set ds1 end = z ; id = groupa_id ; grade = groupa_grade ; sex = groupa_sex ; rc = h.add() ; id = groupb_id ; grade = groupb_grade ; sex = groupb_sex ; rc = h.add() ; end ; h.output (dataset:"hash") ; dcl hash u () ; u.defineKey ("id") ; u.defineDone () ; end ; set ds2 ; if u.check() ne 0 ; u.add() ; if h.find() ne 0 then call missing (grade, sex) ; run ;
If you want to keep the dupes from DS2, get rid of the hash table U and the related statements with the method calls referencing U.
Many thanks hashman
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.