Re: How do I Avoid Duplicates in Merge

dhana · Posted 05-07-2018 08:32 PM

I have two tables

TableA

key	var1	var2	var3	count1	count2
123456789	usa	hi	96713	1	1
784569102	usa	ny	11108	1
635472134	usa	dc	78456	1	1

TableB

key	Key2	var4
123456789	1	Processed
123456789	2	NP
784569102	1	Failed
635472134	1	Processed
635472134	2	Failed

I am trying to merge them using the following code

PROC SORT DATA=TABLEA;
BY KEY;
RUN;

PROC SORT DATA=TABLEB;
BY KEY;
RUN;

DATA COMBINED;
MERGE TABLEA(IN=A)    TABLEB(IN=B);
BY KEY;
IF A AND B;
RUN;

The output looks like this : -

key	var1	var2	var3	count1	count2	key	Key2	var4
123456789	usa	hi	96713	1	1	123456789	1	Processed
123456789	usa	hi	96713	1	1	123456789	2	NP
784569102	usa	ny	11108	1		784569102	1	Failed
635472134	usa	dc	78456	1	1	635472134	1	Processed
635472134	usa	dc	78456	1	1	635472134	2	Failed

I do not want my TableA colums to be duplicated in the result and only the TableB colums. The reason is I will be using count1 and count2 in later step to summarize. so effectively I want the resultant table to look like this

key	var1	var2	var3	count1	count2	key	Key2	var4
123456789	usa	hi	96713	1	1	123456789	1	Processed
						123456789	2	NP
784569102	usa	ny	11108	1		784569102	1	Failed
635472134	usa	dc	78456	1	1	635472134	1	Processed
						635472134	2	Failed

I can do this by looking at the duplicates at a later step and make the TableA columns to missing (call missing). Is there a better approach to arrive this solution.

Thanks !

Reeza · Posted 05-07-2018 08:34 PM

If you have duplicates which record do you want to be joined with, what are the rules that determine which one you’re keeping?

Reeza · Posted 05-07-2018 08:36 PM

It looks like you want a join, but then a report procedure to display the data? Storing data in that manner is a bit weird, but it’s very common for reports not to duplicate data. I believe both PROC REPORT or PRINT have methods/options to prevent duplicates.

dhana · Posted 05-07-2018 08:38 PM

1. TableA unique by key and TableB has duplicates

2. What functionality in PROC REPORT and PRINT would allow me to summarize without including duplicates.

Astounding · Posted 05-07-2018 09:50 PM

Here's the general idea. You may want to fiddle with it of course:

proc sort data=combined;

by key var1 var2 var3 count1 count2;

run;

proc print data=combined;

by key var1 var2 var3 count1 count2;

id key var1 var2 var3 count1 count2;

var key2 var4;

run;

If you actually want KEY to print twice, you have to switch to PROC REPORT. For PROC PRINT to print KEY twice you would need a DATA step to copy KEY into a second variable and print both variables.

How do I Avoid Duplicates in Merge