SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
riyaaora275
Obsidian | Level 7
data tst1;
input a $ yr  b;
datalines ;
a 2000 1
b 2000 2
b 2003 3
c 1996 4
d 2013 3
;
run;

data tst2;
input a $ yr  c;
datalines ;
a 2000 10
b 2003 13
d 2000 13
;
run;


proc sort data = tst1 ; by   a;run;
proc sort data = tst2 ; by  a;run;
data tst3;
merge tst1 (in=x) tst2 (in=y);
by a;
if x  then output;
run;

 

 

When I run this the result is 

a   yr     b   c

a 2000 1 10

b 2003 2 13

b 2003 3 13

c 1996 4  .

d 2000 3 13

 

Why am I  getting the same year value for b? year corresponding to b =2 in tst1 is 2000

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

@riyaaora275 wrote:
data tst1;
input a $ yr  b;
datalines ;
a 2000 1
b 2000 2
b 2003 3
c 1996 4
d 2013 3
;
run;

data tst2;
input a $ yr  c;
datalines ;
a 2000 10
b 2003 13
d 2000 13
;
run;


proc sort data = tst1 ; by   a;run;
proc sort data = tst2 ; by  a;run;
data tst3;
merge tst1 (in=x) tst2 (in=y);
by a;
if x  then output;
run;

 

 

When I run this the result is 

a   yr     b   c

a 2000 1 10

b 2003 2 13

b 2003 3 13

c 1996 4  .

d 2000 3 13

 

Why am I  getting the same year value for b? year corresponding to b =2 in tst1 is 2000


You do NOT merge datasets that have variables in common other than those in the by statement.

When encountering the value "b" for variable a, this happens:

  • The PDV is filled with data from tst1, and looks like this:
    b 2000 2
  • the matching observation from dataset tst2 is read, and yr is overwritten:
    b 2003 2 13
  • the PDV is written to tst3
  • the next observation from tst1 is read, and the PDV looks like this:
    b 2003 3 13
    Note that 2003 is now the value from tst1, and variable c is retained (all variables from input datasets are automatically retained)
  • since no further matching observations are present in tst2, no read is done there
  • the PDV is written to tst3

View solution in original post

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

Because you only have yr=2003 for b=2 in tst2. I think this gives you your desired result?

 

data tst1;
input a $ yr  b;
datalines ;
a 2000 1
b 2000 2
b 2003 3
c 1996 4
d 2013 3
;
run;

data tst2;
input a $ yr  c;
datalines ;
a 2000 10
b 2003 13
d 2000 13
;
run;


proc sort data = tst1 ; by a yr; run;
proc sort data = tst2 ; by a yr; run;
data tst3;
merge tst1 (in=x) tst2 (in=y);
by a yr;
if x then output;
run;
Astounding
PROC Star
Google and read an old paper: How Merge Really Works
riyaaora275
Obsidian | Level 7
Thanks a lot.
Kurt_Bremser
Super User

@riyaaora275 wrote:
data tst1;
input a $ yr  b;
datalines ;
a 2000 1
b 2000 2
b 2003 3
c 1996 4
d 2013 3
;
run;

data tst2;
input a $ yr  c;
datalines ;
a 2000 10
b 2003 13
d 2000 13
;
run;


proc sort data = tst1 ; by   a;run;
proc sort data = tst2 ; by  a;run;
data tst3;
merge tst1 (in=x) tst2 (in=y);
by a;
if x  then output;
run;

 

 

When I run this the result is 

a   yr     b   c

a 2000 1 10

b 2003 2 13

b 2003 3 13

c 1996 4  .

d 2000 3 13

 

Why am I  getting the same year value for b? year corresponding to b =2 in tst1 is 2000


You do NOT merge datasets that have variables in common other than those in the by statement.

When encountering the value "b" for variable a, this happens:

  • The PDV is filled with data from tst1, and looks like this:
    b 2000 2
  • the matching observation from dataset tst2 is read, and yr is overwritten:
    b 2003 2 13
  • the PDV is written to tst3
  • the next observation from tst1 is read, and the PDV looks like this:
    b 2003 3 13
    Note that 2003 is now the value from tst1, and variable c is retained (all variables from input datasets are automatically retained)
  • since no further matching observations are present in tst2, no read is done there
  • the PDV is written to tst3
FreelanceReinh
Jade | Level 19

@riyaaora275: I recommend setting the MSGLEVEL= system option to I (e.g. in your autoexec file):

options msglevel=I;

Then you'll get messages in the log such as

INFO: The variable yr on data set WORK.TST1 will be overwritten by data set WORK.TST2.

which will help you to understand the results in situations like this.

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1738 views
  • 0 likes
  • 5 in conversation