BookmarkSubscribeRSS Feed
HeatherNewton
Quartz | Level 8

 

example 1
data application;
merge application static dynamic;
by app_ref_no
run;

example 2
data application;
merge application static;
by app_ref_no
run;

data application;
merge application dynamic;
by app_ref_no;
run;

is example 1 and example 2 going to give same results?

 

if yes does it apply to  more than one variable used for merge

eg replace in the merge process' by app_ref_no' with 'by app_ref_no org_code;

etc 

 

2 REPLIES 2
Tom
Super User Tom
Super User

 I cannot think of a situation where it would produce different results. 

Why don't you try it yourself and see?

 

But I would create different outputs instead of overwriting one of the inputs.

data test1;
  merge application static dynamic;
  by app_ref_no;
run;

data test2_a;
  merge application static;
  by app_ref_no;
run;

data test2;
  merge test2_a dynamic;
  by app_ref_no;
run;



Note that the number of BY variables used to define the BY groups does not matter.

But if the BY variables need to identify the groupings is different between the datasets then it might make a difference in how you merge them.

Say you have two key variables app_ref_no and org_code.  And some of the datasets have information at the app_ref_no level and others have it at the org_code level.  Then you might want to first merge the datasets at the org_code level first and then merge in the app_ref_no datasets so you can use different BY statements for the two merges.

FreelanceReinh
Jade | Level 19

@Tom wrote:

 I cannot think of a situation where it would produce different results. 


I couldn't think of one either, but creating generic datasets enlightened me: The results are different if

  1. There is a variable in dataset DYNAMIC which occurs in either APPLICATION or STATIC, but not both.
  2. The dataset not containing that variable has a BY group with more observations than in DYNAMIC, where the corresponding BY group, in turn, has at least as many observations as in the third dataset, but not zero.

Let na, ns and nd denote the numbers of observations in the relevant BY groups in APPLICATION, STATIC and DYNAMIC, respectively. Basically, there are four different cases satisfying the above criteria:

  1. Variable not contained in APPLICATION, n> nd ≥ ns > 0
  2. Variable not contained in APPLICATION, n> nd > ns = 0
  3. Variable not contained in STATIC, n> nd ≥ na > 0
  4. Variable not contained in STATIC, n> nd > na = 0

The app_ref_no BY groups 1, 2, 3, 4 in the sample datasets below represent small examples of these four cases.

data application;
input app_ref_no x :$3.;
cards;
1 ax1
1 ax2
2 ax3
2 ax4
3 ax5
;

data static;
input app_ref_no y :$3.;
cards;
1 sy1
3 sy2
3 sy3
4 sy4
4 sy5
;

data dynamic;
input app_ref_no x :$3. y :$3.;
cards;
1 dx1 dy1
2 dx2 dy2
3 dx3 dy3
4 dx4 dy4
;

Results:

TEST1 ("triple merge"):

Obs    ref_no     x      y

 1        1      dx1    dy1
 2        1      ax2    dy1
 3        2      dx2    dy2
 4        2      ax4    dy2
 5        3      dx3    dy3
 6        3      dx3    sy3
 7        4      dx4    dy4
 8        4      dx4    sy5

TEST2 ("two-stage merge"):

Obs    ref_no     x      y

 1        1      dx1    dy1
 2        1      ax2    sy1
 3        2      dx2    dy2
 4        2      ax4
 5        3      dx3    dy3
 6        3      ax5    sy3
 7        4      dx4    dy4
 8        4             sy5

 

In the "triple merge" the retained value (of the variable in question) from dataset DYNAMIC prevails in the last observation of each BY group because the other two datasets don't contribute any value. In the first step of the "two-stage merge", however, a dataset containing more values (missing or non-missing) than DYNAMIC is created and the surplus values prevail in the second merge step.

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg

 

 

Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 206 views
  • 1 like
  • 3 in conversation