BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
deepanshub
Fluorite | Level 6

I have to merge 2 datasets as below and need only number of rows which are in Dataset 2 (2 in this case)

Keys are column ID and B.

When i merge it gives me 3 rows as a result of many to many merge.

 

Data final;

 merge D1 (in=a) D2(in=b);

if b;

run;

 

 

Dataset 1

ID   B   C 

1    51  bb

1    51  bb

1    51  bb

 

Dataset 2

ID   B   D 

1    51  aa

1    51  aa

 

 

This is what i need in result.

Final

ID   B   C   D

1    51  bb aa

1    51  bb aa

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

If there's no need to match on ID, a simplified version of an already-suggested solution would work:

 

data final;

set d1;

set d2;

run;

 

If you do need to match on ID, the program is more complex:

 

data final;

from_d1=0;

from_d2=0;

merge d1 (in=from_d1) d2 (in=from_db);

by id;

if from_d1 and from_d2;

run;

View solution in original post

9 REPLIES 9
KachiM
Rhodochrosite | Level 12

Try:

 

Data final;

 merge D1 (in=a) D2(in=k);

if k;

run;

deepanshub
Fluorite | Level 6

It returns 3 rows instead of 2 rows i need from dataset 2.

 

 

jj02148
Obsidian | Level 7
My 2 cents:


data final;
set D1;
set D2;
by B;
run;
ballardw
Super User

You may have been getting problems because of the use of in=B when you have a variable B.

 

For your example data this works:

Data d1;
   input ID   B   C $;
datalines;
1    51  bb
1    51  bb
1    52  bb
;
Data d2;
   input ID   B   D  $;
datalines;
1    51  aa
1    51  aa
;
run;

Data final;
   merge D1 (in=d1) D2(in=d2);
   by id b;
   if d2;
run;

 

But I suspect you may have inteded to use a BY statement to match on ID and B

Astounding
PROC Star

If there's no need to match on ID, a simplified version of an already-suggested solution would work:

 

data final;

set d1;

set d2;

run;

 

If you do need to match on ID, the program is more complex:

 

data final;

from_d1=0;

from_d2=0;

merge d1 (in=from_d1) d2 (in=from_db);

by id;

if from_d1 and from_d2;

run;

deepanshub
Fluorite | Level 6

Thank you so much.. this works. 

I am relatively new to SAS..would you mind sharing some more details of how initializing the tracking variables to 0 actually gives us the results needed here. Appreciate your help.

Astounding
PROC Star

There's actually a two-part answer, none of which is particularly simple.

 

(1) When does SAS read a record from each source of data

(2) When does SAS set the in= variables to 0 or 1

 

The second question is the easier one.  SAS sets both in= variables to 0 when beginning a new value for the BY variable.  If a record gets read in from a data source, the matching in= variable gets set to 1.

 

When considering the first question, the biggest issue is that SAS reads each observation once.  Even in a many-to-one MERGE where one observation matches many observations from the "many" data set, SAS still reads each observation once.  The values read in may be retained (so that the "one" variables repeat for each of the "many" observations).  But each observation gets read in once.

 

So what happens in the sample program?  The program sets the in= variables to 0.  The program resets them to 1 only when it actually reads in an observation.  When one source has 5 observations and the other source has 3 observations ... the 5 observation data set has an observation read from it each time, and its in= variable gets set to 1 each time.  The 3 observation data set has an observation read in just 3 times, and so has its in= variable set to 1 just 3 times.  For the remaining 2 nonmatching observations, hard-coding the in= variables to 0 means that the 0 value remains in place.

 

The best way to view this might be to add 3 PUT statements:

 

put  from_d1= from_d2=;

 

Add this statement at 3 points in the DATA step:  (1) just after the DATA statement, (2) just before the MERGE statement, and (3) just after the BY statement.  That might help illustrate how the software sets, resets, and utilizes those variables.

deepanshub
Fluorite | Level 6

You are simply awesome. Thanks for explaining that to me in detail.

Vittal17
Calcite | Level 5

when do we use merge ?? any clear notes or video file...Tx...

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 3718 views
  • 3 likes
  • 6 in conversation