- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi All,
I'm trying to merge 2 data sets
data one;
input id $ year ;
datalines;
a 1
a 2
;
run;
data two;
input id $ sc pt;
datalines;
a 1 50
a 2 60
a 3 40
;
run;
This is the code I used
proc sort data= one;by id;run;
proc sort data= two;by id;run;
data one_two;
merge one(in=a) two(in=b);
if a and b;
by id;
run;
This is the output I got
id | year | sc | pt |
---|---|---|---|
a | 1 | 1 | 50 |
a | 2 | 2 | 60 |
a | 2 | 3 | 40 |
Instead, How do I get this output
id | year | sc | pt |
---|---|---|---|
a | 1 | 1 | 50 |
a | 2 | 2 | 60 |
a | . | 3 | 40 |
THANK YOU!!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
When doing a MANY to MANY merge with a data step SAS will match the records one by one until one of the datasets does not have enough records in the group. The values from that dataset are then retained onto the rest of the records in the group.
To get what you want you just need to add an OUTPUT and CALL MISSING to the end of the data step.
data one_two;
merge one(in=a) two(in=b);
by id;
if a and b;
output;
call missing(of _all_) ;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You can't get that output in a secure way unless you can't define an extended primary key. Is year and sc related?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
When doing a MANY to MANY merge with a data step SAS will match the records one by one until one of the datasets does not have enough records in the group. The values from that dataset are then retained onto the rest of the records in the group.
To get what you want you just need to add an OUTPUT and CALL MISSING to the end of the data step.
data one_two;
merge one(in=a) two(in=b);
by id;
if a and b;
output;
call missing(of _all_) ;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi, another way to get the desired output for those small data sets is to get rid of the BY statement .(by the way, you don't need a RUN statement after a DATALINES file) ...
data one;
input id $ year @@;
datalines;
a 1 a 2
;
data two;
input id $ sc pt @@;
datalines;
a 1 50 a 2 60 a 3 40
;
data one_two;
merge one two;
run;
DATA SET ONE-TWO ...
Obs id year sc pt
1 a 1 1 50
2 a 2 2 60
3 a . 3 40
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Code: Program
data one;
input id $ year @@;
datalines;
a 1 a 2
;
data two;
input id $ sc pt @@;
datalines;
a 1 50 a 2 60 a 3 40
;
data one;
set one;
sc=year;
run;
data want;
merge one two;
by id sc;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Code: Program
data one;
input id $ year @@;
datalines;
a 1 a 2
;
data two;
input id $ sc pt @@;
datalines;
a 1 50 a 2 60 a 3 40
;
run;
/****Alternative Way*****/
proc sql;
create table _want as
select b.*,a.year
from one as a right join two as b
on a.id=b.id and a.year=b.sc;
quit;