Help using Base SAS procedures

re: Duplicates

Regular Contributor
Posts: 222
re: Duplicates

Hi, I have created a dataset from two different datasets and I want to remove any duplicates from the new dataset. I have tried the PROC SQL DISTINCT approach as well as a DATA step with the first. approach, but I keep getting a dataset in which only some of the duplicates are removed. The dataset has only 3 variables: ID as a character variable, and Unit Price and Effective Date, both numeric. I double-checked that the variables were formatted the same in both source datasets before creating the new dataset. Any suggestions on what I may be missing or should check? Thanks in advance.


Accepted Solutions
Solution
‎04-15-2015 03:54 PM
Super User
Posts: 10,500

Re: re: Duplicates

I would be interested in seeing the SQL code that did not work. Where DISTINCT appears in the query may have been the issue.
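Since the original query was not posted, the following is only a sketch of the usual placement, with assumed dataset names (have, want) and assumed variable names. DISTINCT goes immediately after SELECT and applies to the whole select list, so a row is dropped only when all three values match another row.

```sas
/* Sketch only: HAVE, WANT and the variable names are assumptions, */
/* not taken from the original poster's code.                      */
proc sql;
    create table want as
    select distinct id, unit_price, effective_date
    from have;
quit;
```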

Also, you may need to look at the ID values in case some of them have one or more leading blank characters: "string" is not equal to " string", and that could cause apparent duplicates.
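One way to check for this, as a rough sketch with assumed names (have, clean, id, had_leading_blank), is to flag any ID that changes when it is left-aligned and then count the flagged rows.

```sas
/* Sketch only: dataset and variable names are assumptions. */
data clean;
    set have;
    /* LEFT() moves leading blanks to the end of the value, so the  */
    /* comparison is true only when ID started with a blank.        */
    had_leading_blank = (id ne left(id));
    id = left(id);   /* store the left-aligned value */
run;

/* Count how many IDs were affected. */
proc freq data=clean;
    tables had_leading_blank;
run;
```

If had_leading_blank is 1 for any rows, running the de-duplication step on clean rather than on have should treat those IDs as equal.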


All Replies
Super User
Posts: 17,826

Re: re: Duplicates

Your join may have issues. 

Otherwise, you can separate unique and duplicate records in SAS with the NOUNIQUEKEY option in PROC SORT, available in SAS 9.3+.

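/* Records whose ID occurs more than once go to DUPREC; */
/* records whose ID occurs exactly once go to WANT.     */
proc sort data=have out=duprec nouniquekey uniqueout=want;
    by ID;
run;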
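Note that NOUNIQUEKEY separates unique keys from repeated ones rather than keeping one copy of each record. If the goal is to keep a single copy of every distinct record, a sketch along these lines may be closer to what is wanted. The names want, dups, unit_price and effective_date are assumptions; NODUPKEY and DUPOUT= are standard PROC SORT options.

```sas
/* Sketch only: output names and variable names are assumptions. */
/* NODUPKEY keeps the first record for each combination of the   */
/* BY variables; DUPOUT= collects the records that were dropped. */
proc sort data=have out=want nodupkey dupout=dups;
    by id unit_price effective_date;
run;
```

Listing all three variables in the BY statement means a record is treated as a duplicate only when every value matches.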

Regular Contributor
Posts: 222

Re: re: Duplicates


Hi Reeza & Ballardw,

Thanks for your help and suggestions. Following your suggestion about possible blank characters in the ID variable, I used the COMPRESS function on both the ID and Unit Price variables, first converting Unit Price to a character variable. It turns out there must have been a blank or blanks in the Unit Price as well. That did the trick. Thanks once again for your help.
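The exact code was not posted, so the following is only a guess at what the fix described above might look like. The dataset names, the new variable names (id_c, price_c) and the BEST12. format used for the conversion are all assumptions.

```sas
/* Sketch only: names and the BEST12. format are assumptions. */
data clean;
    set have;
    id_c    = compress(id);                        /* remove all blanks from ID       */
    price_c = compress(put(unit_price, best12.));  /* numeric to character, no blanks */
run;

/* De-duplicate on the cleaned keys. */
proc sort data=clean out=want nodupkey;
    by id_c price_c effective_date;
run;
```

A numeric variable cannot itself hold blank characters, so it is possible that the conversion to character helped by rounding away tiny differences in the stored Unit Price values rather than by removing blanks.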

☑ This topic is SOLVED.
