BookmarkSubscribeRSS Feed
HeatherNewton
Quartz | Level 8

 

example 1
data application;
merge application static dynamic;
by app_ref_no
run;

example 2
data application;
merge application static;
by app_ref_no
run;

data application;
merge application dynamic;
by app_ref_no;
run;

is example 1 and example 2 going to give same results?

 

if yes does it apply to  more than one variable used for merge

eg replace in the merge process' by app_ref_no' with 'by app_ref_no org_code;

etc 

 

2 REPLIES 2
Tom
Super User Tom
Super User

 I cannot think of a situation where it would produce different results. 

Why don't you try it yourself and see?

 

But I would create different outputs instead of overwriting one of the inputs.

data test1;
  merge application static dynamic;
  by app_ref_no;
run;

data test2_a;
  merge application static;
  by app_ref_no;
run;

data test2;
  merge test2_a dynamic;
  by app_ref_no;
run;



Note that the number of BY variables used to define the BY groups does not matter.

But if the BY variables need to identify the groupings is different between the datasets then it might make a difference in how you merge them.

Say you have two key variables app_ref_no and org_code.  And some of the datasets have information at the app_ref_no level and others have it at the org_code level.  Then you might want to first merge the datasets at the org_code level first and then merge in the app_ref_no datasets so you can use different BY statements for the two merges.

FreelanceReinh
Jade | Level 19

@Tom wrote:

 I cannot think of a situation where it would produce different results. 


I couldn't think of one either, but creating generic datasets enlightened me: The results are different if

  1. There is a variable in dataset DYNAMIC which occurs in either APPLICATION or STATIC, but not both.
  2. The dataset not containing that variable has a BY group with more observations than in DYNAMIC, where the corresponding BY group, in turn, has at least as many observations as in the third dataset, but not zero.

Let na, ns and nd denote the numbers of observations in the relevant BY groups in APPLICATION, STATIC and DYNAMIC, respectively. Basically, there are four different cases satisfying the above criteria:

  1. Variable not contained in APPLICATION, n> nd ≥ ns > 0
  2. Variable not contained in APPLICATION, n> nd > ns = 0
  3. Variable not contained in STATIC, n> nd ≥ na > 0
  4. Variable not contained in STATIC, n> nd > na = 0

The app_ref_no BY groups 1, 2, 3, 4 in the sample datasets below represent small examples of these four cases.

data application;
input app_ref_no x :$3.;
cards;
1 ax1
1 ax2
2 ax3
2 ax4
3 ax5
;

data static;
input app_ref_no y :$3.;
cards;
1 sy1
3 sy2
3 sy3
4 sy4
4 sy5
;

data dynamic;
input app_ref_no x :$3. y :$3.;
cards;
1 dx1 dy1
2 dx2 dy2
3 dx3 dy3
4 dx4 dy4
;

Results:

TEST1 ("triple merge"):

Obs    ref_no     x      y

 1        1      dx1    dy1
 2        1      ax2    dy1
 3        2      dx2    dy2
 4        2      ax4    dy2
 5        3      dx3    dy3
 6        3      dx3    sy3
 7        4      dx4    dy4
 8        4      dx4    sy5

TEST2 ("two-stage merge"):

Obs    ref_no     x      y

 1        1      dx1    dy1
 2        1      ax2    sy1
 3        2      dx2    dy2
 4        2      ax4
 5        3      dx3    dy3
 6        3      ax5    sy3
 7        4      dx4    dy4
 8        4             sy5

 

In the "triple merge" the retained value (of the variable in question) from dataset DYNAMIC prevails in the last observation of each BY group because the other two datasets don't contribute any value. In the first step of the "two-stage merge", however, a dataset containing more values (missing or non-missing) than DYNAMIC is created and the surplus values prevail in the second merge step.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 712 views
  • 1 like
  • 3 in conversation