DATA Step, Macro, Functions and more

Simple Data Merge question

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

Simple Data Merge question

[ Edited ]

I have to merge 2 datasets as below and need only number of rows which are in Dataset 2 (2 in this case)

Keys are column ID and B.

When i merge it gives me 3 rows as a result of many to many merge.

 

Data final;

 merge D1 (in=a) D2(in=b);

if b;

run;

 

 

Dataset 1

ID   B   C 

1    51  bb

1    51  bb

1    51  bb

 

Dataset 2

ID   B   D 

1    51  aa

1    51  aa

 

 

This is what i need in result.

Final

ID   B   C   D

1    51  bb aa

1    51  bb aa

 

 

 


Accepted Solutions
Solution
‎08-05-2016 02:56 PM
Super User
Posts: 5,099

Re: Simple Data Merge question

If there's no need to match on ID, a simplified version of an already-suggested solution would work:

 

data final;

set d1;

set d2;

run;

 

If you do need to match on ID, the program is more complex:

 

data final;

from_d1=0;

from_d2=0;

merge d1 (in=from_d1) d2 (in=from_db);

by id;

if from_d1 and from_d2;

run;

View solution in original post


All Replies
Super Contributor
Posts: 257

Re: Simple Data Merge question

Try:

 

Data final;

 merge D1 (in=a) D2(in=k);

if k;

run;

New Contributor
Posts: 4

Re: Simple Data Merge question

It returns 3 rows instead of 2 rows i need from dataset 2.

 

 

Contributor
Posts: 40

Re: Simple Data Merge question

My 2 cents:


data final;
set D1;
set D2;
by B;
run;
Super User
Posts: 10,550

Re: Simple Data Merge question

You may have been getting problems because of the use of in=B when you have a variable B.

 

For your example data this works:

Data d1;
   input ID   B   C $;
datalines;
1    51  bb
1    51  bb
1    52  bb
;
Data d2;
   input ID   B   D  $;
datalines;
1    51  aa
1    51  aa
;
run;

Data final;
   merge D1 (in=d1) D2(in=d2);
   by id b;
   if d2;
run;

 

But I suspect you may have inteded to use a BY statement to match on ID and B

Solution
‎08-05-2016 02:56 PM
Super User
Posts: 5,099

Re: Simple Data Merge question

If there's no need to match on ID, a simplified version of an already-suggested solution would work:

 

data final;

set d1;

set d2;

run;

 

If you do need to match on ID, the program is more complex:

 

data final;

from_d1=0;

from_d2=0;

merge d1 (in=from_d1) d2 (in=from_db);

by id;

if from_d1 and from_d2;

run;

New Contributor
Posts: 4

Re: Simple Data Merge question

Thank you so much.. this works. 

I am relatively new to SAS..would you mind sharing some more details of how initializing the tracking variables to 0 actually gives us the results needed here. Appreciate your help.

Super User
Posts: 5,099

Re: Simple Data Merge question

There's actually a two-part answer, none of which is particularly simple.

 

(1) When does SAS read a record from each source of data

(2) When does SAS set the in= variables to 0 or 1

 

The second question is the easier one.  SAS sets both in= variables to 0 when beginning a new value for the BY variable.  If a record gets read in from a data source, the matching in= variable gets set to 1.

 

When considering the first question, the biggest issue is that SAS reads each observation once.  Even in a many-to-one MERGE where one observation matches many observations from the "many" data set, SAS still reads each observation once.  The values read in may be retained (so that the "one" variables repeat for each of the "many" observations).  But each observation gets read in once.

 

So what happens in the sample program?  The program sets the in= variables to 0.  The program resets them to 1 only when it actually reads in an observation.  When one source has 5 observations and the other source has 3 observations ... the 5 observation data set has an observation read from it each time, and its in= variable gets set to 1 each time.  The 3 observation data set has an observation read in just 3 times, and so has its in= variable set to 1 just 3 times.  For the remaining 2 nonmatching observations, hard-coding the in= variables to 0 means that the 0 value remains in place.

 

The best way to view this might be to add 3 PUT statements:

 

put  from_d1= from_d2=;

 

Add this statement at 3 points in the DATA step:  (1) just after the DATA statement, (2) just before the MERGE statement, and (3) just after the BY statement.  That might help illustrate how the software sets, resets, and utilizes those variables.

New Contributor
Posts: 4

Re: Simple Data Merge question

You are simply awesome. Thanks for explaining that to me in detail.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 530 views
  • 3 likes
  • 5 in conversation