DATA Step, Macro, Functions and more

Duplicates

Reply
N/A
Posts: 1

Duplicates

Hi,
I am a novice user of SAS and wondered if someone could help me.
I have two sets of data one with 'knowns' references and one with 'HP' which have some duplicate reference numbers when merged. I want to merge these two spreadsheets and remove the duplicates but I only want to remove the duplicates that came from the 'known' spreadsheet and not the 'HP' spreadsheet is this possible? Each spreadsheet does have a variable stating whether the reference is HP or known.

Many thanks in advance
Super User
Posts: 5,429

Re: Duplicates

If you could supply some sample input data and desired output, it would be easier to understand and help you out.

Regards,
Linus
Data never sleeps
Super Contributor
Super Contributor
Posts: 3,174

Re: Duplicates

In general terms, removing duplicates can be accomplished in SAS using various techniques, such as:

1) PROC SORT and NODUPLICATES with a BY statement (the entire record contents are compared for duplicate) -- important to note that the BY key must be granular enough so that the duplicates are adjacent located when sorted, so they can be removed. By the way, there is also a DUPOUT= keyword to re-direct the duplicate observations to an alternate file in the PROC SORT, as needed.

2) PROC SORT and NODUPKEY with a BY statement (the BY key variable values are compared for duplicate).

3) DATA step using a sorted input file with a SET statement and a BY variable list, and also use IF FIRST._some_by_variable_ or IF LAST._some_by_variable_ and possibly the NOT condition with FIRST. and LAST. to identify and remove your undesired (or desired) duplicate data condition.

Given what you have mentioned, it will be likely that you will want to use the IN= parameter(s) for your MERGE statement (along with your BY list) to identify what input SAS file contributed to the DATA step processing, set some additional SAS variable (remember, the IN= variables are not kept on the output SAS dataset in the DATA step), and then use this information, possibly in the current or a later DATA step to address your duplicate data condition.

For consideration, the SAS support website http://support.sas.com/ has a SEARCH facility and you will find SAS user community (SUGI / SGF) conference papers, as well as SAS-hosted product technical documentation on this topic. A Google advanced search will also reveal useful reference material -- here is an example search which is limited to the SAS.COM website:

+sort +merge +removing +duplicates site:sas.com


Scott Barry
SBBWorks, Inc.

SAS 9.2 SORT PROCEDURE Doc:
http://support.sas.com/documentation/cdl/en/proc/59565/HTML/default/a000146878.htm


Finding a Duplicate in a Haystack
Brett J. Peterson, Medtronic Inc., Minneapolis, MN
http://www2.sas.com/proceedings/sugi31/164-31.pdf
N/A
Posts: 1

Re: Duplicates

Thanks so much, the links are really useful!
Ask a Question
Discussion stats
  • 3 replies
  • 134 views
  • 0 likes
  • 4 in conversation