BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
riyaaora275
Obsidian | Level 7
data tst1;
input a $ yr  b;
datalines ;
a 2000 1
b 2000 2
b 2003 3
c 1996 4
d 2013 3
;
run;

data tst2;
input a $ yr  c;
datalines ;
a 2000 10
b 2003 13
d 2000 13
;
run;


proc sort data = tst1 ; by   a;run;
proc sort data = tst2 ; by  a;run;
data tst3;
merge tst1 (in=x) tst2 (in=y);
by a;
if x  then output;
run;

 

 

When I run this the result is 

a   yr     b   c

a 2000 1 10

b 2003 2 13

b 2003 3 13

c 1996 4  .

d 2000 3 13

 

Why am I  getting the same year value for b? year corresponding to b =2 in tst1 is 2000

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

@riyaaora275 wrote:
data tst1;
input a $ yr  b;
datalines ;
a 2000 1
b 2000 2
b 2003 3
c 1996 4
d 2013 3
;
run;

data tst2;
input a $ yr  c;
datalines ;
a 2000 10
b 2003 13
d 2000 13
;
run;


proc sort data = tst1 ; by   a;run;
proc sort data = tst2 ; by  a;run;
data tst3;
merge tst1 (in=x) tst2 (in=y);
by a;
if x  then output;
run;

 

 

When I run this the result is 

a   yr     b   c

a 2000 1 10

b 2003 2 13

b 2003 3 13

c 1996 4  .

d 2000 3 13

 

Why am I  getting the same year value for b? year corresponding to b =2 in tst1 is 2000


You do NOT merge datasets that have variables in common other than those in the by statement.

When encountering the value "b" for variable a, this happens:

  • The PDV is filled with data from tst1, and looks like this:
    b 2000 2
  • the matching observation from dataset tst2 is read, and yr is overwritten:
    b 2003 2 13
  • the PDV is written to tst3
  • the next observation from tst1 is read, and the PDV looks like this:
    b 2003 3 13
    Note that 2003 is now the value from tst1, and variable c is retained (all variables from input datasets are automatically retained)
  • since no further matching observations are present in tst2, no read is done there
  • the PDV is written to tst3

View solution in original post

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

Because you only have yr=2003 for b=2 in tst2. I think this gives you your desired result?

 

data tst1;
input a $ yr  b;
datalines ;
a 2000 1
b 2000 2
b 2003 3
c 1996 4
d 2013 3
;
run;

data tst2;
input a $ yr  c;
datalines ;
a 2000 10
b 2003 13
d 2000 13
;
run;


proc sort data = tst1 ; by a yr; run;
proc sort data = tst2 ; by a yr; run;
data tst3;
merge tst1 (in=x) tst2 (in=y);
by a yr;
if x then output;
run;
Astounding
PROC Star
Google and read an old paper: How Merge Really Works
riyaaora275
Obsidian | Level 7
Thanks a lot.
Kurt_Bremser
Super User

@riyaaora275 wrote:
data tst1;
input a $ yr  b;
datalines ;
a 2000 1
b 2000 2
b 2003 3
c 1996 4
d 2013 3
;
run;

data tst2;
input a $ yr  c;
datalines ;
a 2000 10
b 2003 13
d 2000 13
;
run;


proc sort data = tst1 ; by   a;run;
proc sort data = tst2 ; by  a;run;
data tst3;
merge tst1 (in=x) tst2 (in=y);
by a;
if x  then output;
run;

 

 

When I run this the result is 

a   yr     b   c

a 2000 1 10

b 2003 2 13

b 2003 3 13

c 1996 4  .

d 2000 3 13

 

Why am I  getting the same year value for b? year corresponding to b =2 in tst1 is 2000


You do NOT merge datasets that have variables in common other than those in the by statement.

When encountering the value "b" for variable a, this happens:

  • The PDV is filled with data from tst1, and looks like this:
    b 2000 2
  • the matching observation from dataset tst2 is read, and yr is overwritten:
    b 2003 2 13
  • the PDV is written to tst3
  • the next observation from tst1 is read, and the PDV looks like this:
    b 2003 3 13
    Note that 2003 is now the value from tst1, and variable c is retained (all variables from input datasets are automatically retained)
  • since no further matching observations are present in tst2, no read is done there
  • the PDV is written to tst3
FreelanceReinh
Jade | Level 19

@riyaaora275: I recommend setting the MSGLEVEL= system option to I (e.g. in your autoexec file):

options msglevel=I;

Then you'll get messages in the log such as

INFO: The variable yr on data set WORK.TST1 will be overwritten by data set WORK.TST2.

which will help you to understand the results in situations like this.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 997 views
  • 0 likes
  • 5 in conversation