Hello,
I need to merge the DATASET 1 with DATASET 2 to get to a final dataset "DATA WANT" and I need some guidance.
please note that dataset 1 is a matched pair data of students and dataset 2 contains additional variables on the matched pairs data.
DATASET 1
groupA_id | groupB_id | groupA_grade | groupB_grade | groupA_sex | groupB_sex |
00000-000001 | 00000-000200 | 2 | 2 | 1 | 1 |
00000-000001 | 00000-000201 | 2 | 2 | 1 | 1 |
00000-000001 | 00000-000202 | 2 | 2 | 1 | 1 |
00000-000002 | 00000-000203 | 2 | 2 | 1 | 1 |
00000-000002 | 00000-000204 | 2 | 2 | 1 | 1 |
00000-000002 | 00000-000205 | 2 | 2 | 1 | 1 |
00000-000003 | 00000-000206 | 2 | 2 | 1 | 1 |
00000-000003 | 00000-000207 | 2 | 2 | 1 | 1 |
00000-000003 | 00000-000208 | 2 | 2 | 1 | 1 |
id | Siblings | economic status |
00000-000001 | Yes | low |
00000-000001 | yes | low |
00000-000001 | yes | low |
00000-000002 | yes | low |
00000-000002 | yes | low |
00000-000002 | yes | low |
00000-000003 | no | high |
00000-000003 | no | high |
00000-000003 | no | high |
00000-000200 | yes | low |
00000-000201 | no | low |
00000-000202 | no | high |
00000-000203 | no | medium |
00000-000204 | no | high |
00000-000205 | yes | low |
00000-000206 | no | medium |
00000-000207 | yes | high |
00000-000208 | yes | high |
DATA WANT
id | grade | sex | Siblings | economic status |
00000-000001 | 2 | 1 | Yes | low |
00000-000002 | 2 | 1 | yes | low |
00000-000003 | 2 | 1 | no | high |
00000-000002 | 2 | 1 | yes | low |
00000-000200 | 2 | 1 | yes | low |
00000-000201 | 2 | 1 | no | low |
00000-000202 | 2 | 1 | no | high |
00000-000203 | 2 | 1 | no | medium |
00000-000204 | 2 | 1 | no | high |
00000-000205 | 2 | 1 | yes | low |
00000-000206 | 2 | 1 | no | medium |
00000-000207 | 2 | 1 | yes | high |
00000-000208 | 2 | 1 | yes | high |
Any help would be appreciated. Thanks
If I understand the problem correctly, this should work:
data dataset1_revised;
set dataset1;
id = groupA_id;
grade = groupA_grade;
sex = groupA_sex;
output;
id = groupB_id;
grade = groupB_grade;
sex = groupB_sex;
output;
keep id grade sex;
run;
The rest becomes a more commonplace sort + merge:
proc sort data=dataset1_revised nodupkey;
by id;
run;
proc sort data=dataset2 out=dataset2_revised nodupkey;
by id;
run;
data want;
merge dataset1_revised dataset2_revised;
by id;
run;
If I understand the problem correctly, this should work:
data dataset1_revised;
set dataset1;
id = groupA_id;
grade = groupA_grade;
sex = groupA_sex;
output;
id = groupB_id;
grade = groupB_grade;
sex = groupB_sex;
output;
keep id grade sex;
run;
The rest becomes a more commonplace sort + merge:
proc sort data=dataset1_revised nodupkey;
by id;
run;
proc sort data=dataset2 out=dataset2_revised nodupkey;
by id;
run;
data want;
merge dataset1_revised dataset2_revised;
by id;
run;
Many thanks Astounding!
This is what I wanted to accomplish. Note- I added a keep statement for dataset1_revised since the last statement did work only for dataset2_revised.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.