BookmarkSubscribeRSS Feed
bean1980
Calcite | Level 5
Hi,
I am a novice user of SAS and wondered if someone could help me.
I have two sets of data one with 'knowns' references and one with 'HP' which have some duplicate reference numbers when merged. I want to merge these two spreadsheets and remove the duplicates but I only want to remove the duplicates that came from the 'known' spreadsheet and not the 'HP' spreadsheet is this possible? Each spreadsheet does have a variable stating whether the reference is HP or known.

Many thanks in advance
3 REPLIES 3
LinusH
Tourmaline | Level 20
If you could supply some sample input data and desired output, it would be easier to understand and help you out.

Regards,
Linus
Data never sleeps
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
In general terms, removing duplicates can be accomplished in SAS using various techniques, such as:

1) PROC SORT and NODUPLICATES with a BY statement (the entire record contents are compared for duplicate) -- important to note that the BY key must be granular enough so that the duplicates are adjacent located when sorted, so they can be removed. By the way, there is also a DUPOUT= keyword to re-direct the duplicate observations to an alternate file in the PROC SORT, as needed.

2) PROC SORT and NODUPKEY with a BY statement (the BY key variable values are compared for duplicate).

3) DATA step using a sorted input file with a SET statement and a BY variable list, and also use IF FIRST._some_by_variable_ or IF LAST._some_by_variable_ and possibly the NOT condition with FIRST. and LAST. to identify and remove your undesired (or desired) duplicate data condition.

Given what you have mentioned, it will be likely that you will want to use the IN= parameter(s) for your MERGE statement (along with your BY list) to identify what input SAS file contributed to the DATA step processing, set some additional SAS variable (remember, the IN= variables are not kept on the output SAS dataset in the DATA step), and then use this information, possibly in the current or a later DATA step to address your duplicate data condition.

For consideration, the SAS support website http://support.sas.com/ has a SEARCH facility and you will find SAS user community (SUGI / SGF) conference papers, as well as SAS-hosted product technical documentation on this topic. A Google advanced search will also reveal useful reference material -- here is an example search which is limited to the SAS.COM website:

+sort +merge +removing +duplicates site:sas.com


Scott Barry
SBBWorks, Inc.

SAS 9.2 SORT PROCEDURE Doc:
http://support.sas.com/documentation/cdl/en/proc/59565/HTML/default/a000146878.htm


Finding a Duplicate in a Haystack
Brett J. Peterson, Medtronic Inc., Minneapolis, MN
http://www2.sas.com/proceedings/sugi31/164-31.pdf
bean1980_
Calcite | Level 5
Thanks so much, the links are really useful!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 612 views
  • 0 likes
  • 4 in conversation