DATA Step, Macro, Functions and more

How do I Avoid Duplicates in Merge

Reply
Frequent Contributor
Posts: 79

How do I Avoid Duplicates in Merge

[ Edited ]

I have two tables 

 

TableA

 

key var1 var2 var3 count1 count2
123456789 usa hi 96713 1 1
784569102 usa ny 11108 1  
635472134 usa dc 78456 1 1

 

TableB

 

key Key2 var4
123456789 1 Processed
123456789 2 NP
784569102 1 Failed
635472134 1 Processed
635472134 2 Failed

 

I am trying to merge them using the following code 

 

PROC SORT DATA=TABLEA;
BY KEY;
RUN;

PROC SORT DATA=TABLEB;
BY KEY;
RUN;

DATA COMBINED;
MERGE TABLEA(IN=A)    TABLEB(IN=B);
BY KEY;
IF A AND B;
RUN;

The output looks like this : -

 

key var1 var2 var3 count1 count2 key Key2 var4
123456789 usa hi 96713 1 1 123456789 1 Processed
123456789 usa hi 96713 1 1 123456789 2 NP
784569102 usa ny 11108 1   784569102 1 Failed
635472134 usa dc 78456 1 1 635472134 1 Processed
635472134 usa dc 78456 1 1 635472134 2 Failed

 

I do not want my TableA colums to be duplicated in the result and only the TableB colums. The reason is I will be using count1 and count2 in later step to summarize. so effectively I want the resultant table to look like this 

 

key var1 var2 var3 count1 count2 key Key2 var4
123456789 usa hi 96713 1 1 123456789 1 Processed
            123456789 2 NP
784569102 usa ny 11108 1   784569102 1 Failed
635472134 usa dc 78456 1 1 635472134 1 Processed
            635472134 2 Failed

 

I can do this by looking at the duplicates at a later step and make the TableA columns to missing (call missing). Is there a better approach to arrive this solution. 

 

Thanks !

Super User
Posts: 23,740

Re: How do I Avoid Duplicates in Merge

If you have duplicates which record do you want to be joined with, what are the rules that determine which one you’re keeping?
Super User
Posts: 23,740

Re: How do I Avoid Duplicates in Merge

It looks like you want a join, but then a report procedure to display the data? Storing data in that manner is a bit weird, but it’s very common for reports not to duplicate data. I believe both PROC REPORT or PRINT have methods/options to prevent duplicates. 

Frequent Contributor
Posts: 79

Re: How do I Avoid Duplicates in Merge

1. TableA unique by key and TableB has duplicates 

 

2. What functionality in PROC REPORT and PRINT would allow me to summarize without including duplicates. 

Super User
Posts: 6,778

Re: How do I Avoid Duplicates in Merge

Here's the general idea.  You may want to fiddle with it of course:

 

proc sort data=combined;

   by key var1 var2 var3 count1 count2;

run;

 

proc print data=combined;

   by key var1 var2 var3 count1 count2;

   id key var1 var2 var3 count1 count2;

   var key2 var4;

run;

 

If you actually want KEY to print twice, you have to switch to PROC REPORT.  For PROC PRINT to print KEY twice you would need a DATA step to copy KEY into a second variable and print both variables.

Ask a Question
Discussion stats
  • 4 replies
  • 124 views
  • 0 likes
  • 3 in conversation