SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Why duplicate rows after merging two data sets

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

Why duplicate rows after merging two data sets

Hello,

I'll truly appreciate if somebody helps me out.

I wonder why I get a data including duplicate rows after merging.

My code is following.

proc sort data=work.A;                /**table A has 500 rows**/

     by store product week;

proc sort data=work.B;                /**table B has 800 rows**/

     by store product week;

data work.A;

     merge work.A (in=in1) work.B (in=in2 keep=store product week advertise);

     by store product week;

     if in1;

run;


This code is merging table A and B in order to add "advertise" column in the table A.

I expected the number of table A's rows is 500 since the code has "if in1;". However, it was 800. Does anyone know why?


Jason


Accepted Solutions
Solution
‎03-04-2015 03:54 AM
Super User
Posts: 7,866

Re: Why duplicate rows after merging two data sets

You get an observation for each matching obs. As long as every obs in work.B has a match in work.A, they will be included in the output set (because the condition if in1 is always met)

If you do not want multiple observations, you need to eliminate the duplicates in Work.B first. Or include a first. or last. condition.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers

View solution in original post


All Replies
Solution
‎03-04-2015 03:54 AM
Super User
Posts: 7,866

Re: Why duplicate rows after merging two data sets

You get an observation for each matching obs. As long as every obs in work.B has a match in work.A, they will be included in the output set (because the condition if in1 is always met)

If you do not want multiple observations, you need to eliminate the duplicates in Work.B first. Or include a first. or last. condition.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 81

Re: Why duplicate rows after merging two data sets


Hi Sarang,

You need to remove the duplicates using "nodupkey" option while you sort,  this would eliminate your issue.

You can try the below code example:

data one;
input store $ product $ week;
cards;
1 a 23
2 b 45
3 a 21
1 a 34
1 a 23
2 b 45
;
run;

data two;
input store $ product $ week advertise $;
cards;
1 a 23 y
2 b 45 y
1 a 34 n
2 b 45 y
3 a 21 y
5 c 45 y
7 d 76 y
;
run;


proc sort data=one nodupkey;
by store product week;
run;

proc sort data=two nodupkey;
by store product week;
run;

data final;
merge one(in=a) two(in=b keep=store product week advertise);
by store product week;
if a;
run;

Output:

Capture.JPG

Hope this helps!!

Thanks,

Daman

PROC Star
Posts: 1,167

Re: Why duplicate rows after merging two data sets

Run this, and take a look at the results. I think this will clarify how the MERGE works.

data a;

input week;

cards;

1

2

3

4

run;

data b;

input week advertise;

cards;

1 9

2 9

2 9

2 9

3 9

4 9

run;

data c;

merge a(in=in1) b;

  by week;

if in1;

run;

New Contributor
Posts: 4

Re: Why duplicate rows after merging two data sets

I really appreciate your answers!

They clarified my misunderstanding of merging.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 3663 views
  • 8 likes
  • 4 in conversation