Hello,
I need to merge the DATASET 1 with DATASET 2 to get to a final dataset "DATA WANT" and I need some guidance.
please note that dataset 1 is a matched pair data of students and dataset 2 contains additional variables on the matched pairs data.
DATASET 1
| groupA_id | groupB_id | groupA_grade | groupB_grade | groupA_sex | groupB_sex |
| 00000-000001 | 00000-000200 | 2 | 2 | 1 | 1 |
| 00000-000001 | 00000-000201 | 2 | 2 | 1 | 1 |
| 00000-000001 | 00000-000202 | 2 | 2 | 1 | 1 |
| 00000-000002 | 00000-000203 | 2 | 2 | 1 | 1 |
| 00000-000002 | 00000-000204 | 2 | 2 | 1 | 1 |
| 00000-000002 | 00000-000205 | 2 | 2 | 1 | 1 |
| 00000-000003 | 00000-000206 | 2 | 2 | 1 | 1 |
| 00000-000003 | 00000-000207 | 2 | 2 | 1 | 1 |
| 00000-000003 | 00000-000208 | 2 | 2 | 1 | 1 |
| id | Siblings | economic status |
| 00000-000001 | Yes | low |
| 00000-000001 | yes | low |
| 00000-000001 | yes | low |
| 00000-000002 | yes | low |
| 00000-000002 | yes | low |
| 00000-000002 | yes | low |
| 00000-000003 | no | high |
| 00000-000003 | no | high |
| 00000-000003 | no | high |
| 00000-000200 | yes | low |
| 00000-000201 | no | low |
| 00000-000202 | no | high |
| 00000-000203 | no | medium |
| 00000-000204 | no | high |
| 00000-000205 | yes | low |
| 00000-000206 | no | medium |
| 00000-000207 | yes | high |
| 00000-000208 | yes | high |
DATA WANT
| id | grade | sex | Siblings | economic status |
| 00000-000001 | 2 | 1 | Yes | low |
| 00000-000002 | 2 | 1 | yes | low |
| 00000-000003 | 2 | 1 | no | high |
| 00000-000002 | 2 | 1 | yes | low |
| 00000-000200 | 2 | 1 | yes | low |
| 00000-000201 | 2 | 1 | no | low |
| 00000-000202 | 2 | 1 | no | high |
| 00000-000203 | 2 | 1 | no | medium |
| 00000-000204 | 2 | 1 | no | high |
| 00000-000205 | 2 | 1 | yes | low |
| 00000-000206 | 2 | 1 | no | medium |
| 00000-000207 | 2 | 1 | yes | high |
| 00000-000208 | 2 | 1 | yes | high |
Any help would be appreciated. Thanks
If I understand the problem correctly, this should work:
data dataset1_revised;
set dataset1;
id = groupA_id;
grade = groupA_grade;
sex = groupA_sex;
output;
id = groupB_id;
grade = groupB_grade;
sex = groupB_sex;
output;
keep id grade sex;
run;
The rest becomes a more commonplace sort + merge:
proc sort data=dataset1_revised nodupkey;
by id;
run;
proc sort data=dataset2 out=dataset2_revised nodupkey;
by id;
run;
data want;
merge dataset1_revised dataset2_revised;
by id;
run;
If I understand the problem correctly, this should work:
data dataset1_revised;
set dataset1;
id = groupA_id;
grade = groupA_grade;
sex = groupA_sex;
output;
id = groupB_id;
grade = groupB_grade;
sex = groupB_sex;
output;
keep id grade sex;
run;
The rest becomes a more commonplace sort + merge:
proc sort data=dataset1_revised nodupkey;
by id;
run;
proc sort data=dataset2 out=dataset2_revised nodupkey;
by id;
run;
data want;
merge dataset1_revised dataset2_revised;
by id;
run;
Many thanks Astounding!
This is what I wanted to accomplish. Note- I added a keep statement for dataset1_revised since the last statement did work only for dataset2_revised.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and save with the early bird rate—just $795!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.