DATA Step, Macro, Functions and more

Exact woking of Merge operation (IF A AND B)?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 9
Accepted Solution

Exact woking of Merge operation (IF A AND B)?

Hi,

I am using the merge statement as follows(In SAS University edition):

 

 libname Temp '/folders/myfolders';

data Temp.merge_calls_2;
    merge Temp.trans_2 (in=jkl) Temp.trans_3 (in=iop);
    by col1 col2;

    if jkl and iop;
run;

proc print data=Temp.merge_calls_2;
    title Merge_Calls2;
run;

 

 

In some cases this logic provides me the output similar to SQL Inner join where as, sometimes the output is different.

Can some one pl. explain the exact working of the above merge operation and also a appropriate SQL query to get the same output .

 

Thanks In Advance.

CP.


Accepted Solutions
Solution
‎06-27-2016 08:47 AM
Super User
Posts: 6,936

Re: Exact woking of Merge operation (IF A AND B)?

In a many-to-many merge, SQL will create a cartesian join, while the datastep just goes through them sequentially.

eg dataset X has 3 obs for A = 1, while dataset Y has 2 obs for A = 1:

proc sql;
create table Z as 
select X.*
from X inner join Y
on X.A = Y.A
;
quit;

willl create dataset Z with 6 observations; all individual obs from X are joined with all individual obs from Y.

data Z;
merge
  X (in=in_x)
  Y (in=in_y)
;
by A;
if in_x and in_y;
run;

will create dataset Z with 3 observations.

With the data step, Z will have the following observations for A = 1:

1st: data from first obs from X and first obs from Y

2nd: data from second obs from X and second obs from Y

3rd: data from third obs from X and second obs from Y

Once the last record from a BY group is read from one of the participating datasets, its contents will stay in the PDV until no more records for the same BY group come from the other datasets.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers

View solution in original post


All Replies
Super User
Super User
Posts: 7,401

Re: Exact woking of Merge operation (IF A AND B)?

Yes, inner join is the nearest to that, see below graphic of the various joins.  Obviously the output depends on you data - i.e. multiple entries on either side and such like.  The datastep works by assigning a variable to each dataset which indicates which dataset it comes from, think of it like a new variable.  The if statement just checks that the merged record is present in both datasets.

Am not going over the full process, that is where the manual comes in:

http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001318494.htm

Or training:

http://www.ats.ucla.edu/stat/sas/modules/merge.htm

 

Capture.PNG

Occasional Contributor
Posts: 9

Re: Exact woking of Merge operation (IF A AND B)?

Hi,

the above mentioned link is good but i am not able to get the exact working of  

if jkl and iop

 

From the above posted code by me.

 

Thanks in advance.

CP.

Super User
Posts: 6,936

Re: Exact woking of Merge operation (IF A AND B)?

jkl and iop are two temporary variables (created throught the dataset options in the merge statement and not included in the output dataset(s)) that are set to 1 if a record from the respective dataset is present, and zero otherwise.

if jkl and iop;

is a so-called "subsetting if" and therefore means that if a record (for the current "by" value) is missing from either dataset, the current iteration of the data step is terminated and no output is written.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 17,819

Re: Exact woking of Merge operation (IF A AND B)?

What type,of merge are you trying to do, one to one, one to many, or many to many?

Typically these are what cause the differences between a data step and proc SQL and mess up how you think the IN and IF should work. 

Occasional Contributor
Posts: 9

Re: Exact woking of Merge operation (IF A AND B)?

Hi,

   I think its Many-to-many merging because there are duplicates in both data set.

Solution
‎06-27-2016 08:47 AM
Super User
Posts: 6,936

Re: Exact woking of Merge operation (IF A AND B)?

In a many-to-many merge, SQL will create a cartesian join, while the datastep just goes through them sequentially.

eg dataset X has 3 obs for A = 1, while dataset Y has 2 obs for A = 1:

proc sql;
create table Z as 
select X.*
from X inner join Y
on X.A = Y.A
;
quit;

willl create dataset Z with 6 observations; all individual obs from X are joined with all individual obs from Y.

data Z;
merge
  X (in=in_x)
  Y (in=in_y)
;
by A;
if in_x and in_y;
run;

will create dataset Z with 3 observations.

With the data step, Z will have the following observations for A = 1:

1st: data from first obs from X and first obs from Y

2nd: data from second obs from X and second obs from Y

3rd: data from third obs from X and second obs from Y

Once the last record from a BY group is read from one of the participating datasets, its contents will stay in the PDV until no more records for the same BY group come from the other datasets.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 9

Re: Exact woking of Merge operation (IF A AND B)?

Thank you for your reply,

Just one more query. Can you provide equivalent SQL query to get the same output.

 

Thanks in advance.

CP.

Super User
Posts: 6,936

Re: Exact woking of Merge operation (IF A AND B)?

From where does your obsession with SQL come from? If the data step does what you want, stay with it.

(Answer to your question: none, as there isn't one. At least one that makes sense. SQL behaves differently, period)

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 17,819

Re: Exact woking of Merge operation (IF A AND B)?

A many to many merge in a data step does not give you the same results as using a SQL join. 

 

Usually I prefer a SQL step for these joins as the results are more intuitive. 

 

Here's a SAS note regarding this:

 

http://support.sas.com/kb/24/752.html

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 398 views
  • 7 likes
  • 4 in conversation