- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data tst1;
input a $ yr b;
datalines ;
a 2000 1
b 2000 2
b 2003 3
c 1996 4
d 2013 3
;
run;
data tst2;
input a $ yr c;
datalines ;
a 2000 10
b 2003 13
d 2000 13
;
run;
proc sort data = tst1 ; by a;run;
proc sort data = tst2 ; by a;run;
data tst3;
merge tst1 (in=x) tst2 (in=y);
by a;
if x then output;
run;
When I run this the result is
a yr b c
a 2000 1 10
b 2003 2 13
b 2003 3 13
c 1996 4 .
d 2000 3 13
Why am I getting the same year value for b? year corresponding to b =2 in tst1 is 2000
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@riyaaora275 wrote:
data tst1; input a $ yr b; datalines ; a 2000 1 b 2000 2 b 2003 3 c 1996 4 d 2013 3 ; run; data tst2; input a $ yr c; datalines ; a 2000 10 b 2003 13 d 2000 13 ; run; proc sort data = tst1 ; by a;run; proc sort data = tst2 ; by a;run; data tst3; merge tst1 (in=x) tst2 (in=y); by a; if x then output; run;
When I run this the result is
a yr b c
a 2000 1 10
b 2003 2 13
b 2003 3 13
c 1996 4 .
d 2000 3 13
Why am I getting the same year value for b? year corresponding to b =2 in tst1 is 2000
You do NOT merge datasets that have variables in common other than those in the by statement.
When encountering the value "b" for variable a, this happens:
- The PDV is filled with data from tst1, and looks like this:
b 2000 2 - the matching observation from dataset tst2 is read, and yr is overwritten:
b 2003 2 13 - the PDV is written to tst3
- the next observation from tst1 is read, and the PDV looks like this:
b 2003 3 13
Note that 2003 is now the value from tst1, and variable c is retained (all variables from input datasets are automatically retained) - since no further matching observations are present in tst2, no read is done there
- the PDV is written to tst3
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Because you only have yr=2003 for b=2 in tst2. I think this gives you your desired result?
data tst1;
input a $ yr b;
datalines ;
a 2000 1
b 2000 2
b 2003 3
c 1996 4
d 2013 3
;
run;
data tst2;
input a $ yr c;
datalines ;
a 2000 10
b 2003 13
d 2000 13
;
run;
proc sort data = tst1 ; by a yr; run;
proc sort data = tst2 ; by a yr; run;
data tst3;
merge tst1 (in=x) tst2 (in=y);
by a yr;
if x then output;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@riyaaora275 wrote:
data tst1; input a $ yr b; datalines ; a 2000 1 b 2000 2 b 2003 3 c 1996 4 d 2013 3 ; run; data tst2; input a $ yr c; datalines ; a 2000 10 b 2003 13 d 2000 13 ; run; proc sort data = tst1 ; by a;run; proc sort data = tst2 ; by a;run; data tst3; merge tst1 (in=x) tst2 (in=y); by a; if x then output; run;
When I run this the result is
a yr b c
a 2000 1 10
b 2003 2 13
b 2003 3 13
c 1996 4 .
d 2000 3 13
Why am I getting the same year value for b? year corresponding to b =2 in tst1 is 2000
You do NOT merge datasets that have variables in common other than those in the by statement.
When encountering the value "b" for variable a, this happens:
- The PDV is filled with data from tst1, and looks like this:
b 2000 2 - the matching observation from dataset tst2 is read, and yr is overwritten:
b 2003 2 13 - the PDV is written to tst3
- the next observation from tst1 is read, and the PDV looks like this:
b 2003 3 13
Note that 2003 is now the value from tst1, and variable c is retained (all variables from input datasets are automatically retained) - since no further matching observations are present in tst2, no read is done there
- the PDV is written to tst3
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@riyaaora275: I recommend setting the MSGLEVEL= system option to I (e.g. in your autoexec file):
options msglevel=I;
Then you'll get messages in the log such as
INFO: The variable yr on data set WORK.TST1 will be overwritten by data set WORK.TST2.
which will help you to understand the results in situations like this.