Hello,
I need to merge the DATASET 1 with DATASET 2 to get to a final dataset "DATA WANT" and I need some guidance.
please note that dataset 1 is a matched pair data of students and dataset 2 contains additional variables on the matched pairs data.
DATASET 1
| groupA_id | groupB_id | groupA_grade | groupB_grade | groupA_sex | groupB_sex |
| 00000-000001 | 00000-000200 | 2 | 2 | 1 | 1 |
| 00000-000001 | 00000-000201 | 2 | 2 | 1 | 1 |
| 00000-000001 | 00000-000202 | 2 | 2 | 1 | 1 |
| 00000-000002 | 00000-000203 | 2 | 2 | 1 | 1 |
| 00000-000002 | 00000-000204 | 2 | 2 | 1 | 1 |
| 00000-000002 | 00000-000205 | 2 | 2 | 1 | 1 |
| 00000-000003 | 00000-000206 | 2 | 2 | 1 | 1 |
| 00000-000003 | 00000-000207 | 2 | 2 | 1 | 1 |
| 00000-000003 | 00000-000208 | 2 | 2 | 1 | 1 |
| id | Siblings | economic status |
| 00000-000001 | Yes | low |
| 00000-000001 | yes | low |
| 00000-000001 | yes | low |
| 00000-000002 | yes | low |
| 00000-000002 | yes | low |
| 00000-000002 | yes | low |
| 00000-000003 | no | high |
| 00000-000003 | no | high |
| 00000-000003 | no | high |
| 00000-000200 | yes | low |
| 00000-000201 | no | low |
| 00000-000202 | no | high |
| 00000-000203 | no | medium |
| 00000-000204 | no | high |
| 00000-000205 | yes | low |
| 00000-000206 | no | medium |
| 00000-000207 | yes | high |
| 00000-000208 | yes | high |
DATA WANT
| id | grade | sex | Siblings | economic status |
| 00000-000001 | 2 | 1 | Yes | low |
| 00000-000002 | 2 | 1 | yes | low |
| 00000-000003 | 2 | 1 | no | high |
| 00000-000002 | 2 | 1 | yes | low |
| 00000-000200 | 2 | 1 | yes | low |
| 00000-000201 | 2 | 1 | no | low |
| 00000-000202 | 2 | 1 | no | high |
| 00000-000203 | 2 | 1 | no | medium |
| 00000-000204 | 2 | 1 | no | high |
| 00000-000205 | 2 | 1 | yes | low |
| 00000-000206 | 2 | 1 | no | medium |
| 00000-000207 | 2 | 1 | yes | high |
| 00000-000208 | 2 | 1 | yes | high |
Any help would be appreciated. Thanks
Your sample output is inconsistent in terms of keeping ID duplicates because ID=00000-000002 is duplicated while the rest are not. Assuming that it's a typo and you don't want any dupes but only the first ID encountered in the data set 2, you can proceed as follows:
data want (drop = group: rc) ;
if _n_ = 1 then do ;
dcl hash h (ordered:"a") ;
h.defineKey ("id") ;
h.defineData ("id", "grade", "sex") ;
h.defineDone () ;
do until (z) ;
set ds1 end = z ;
id = groupa_id ; grade = groupa_grade ; sex = groupa_sex ; rc = h.add() ;
id = groupb_id ; grade = groupb_grade ; sex = groupb_sex ; rc = h.add() ;
end ;
h.output (dataset:"hash") ;
dcl hash u () ;
u.defineKey ("id") ;
u.defineDone () ;
end ;
set ds2 ;
if u.check() ne 0 ;
u.add() ;
if h.find() ne 0 then call missing (grade, sex) ;
run ; If you want to keep the dupes from DS2, get rid of the hash table U and the related statements with the method calls referencing U.
Many thanks hashman
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.