BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sundari
Calcite | Level 5

Hello all

Got a query on how to remove dulicates while using SAS 4.3 EG

Any pointers would be of help.

Thanks

Sundari

1 ACCEPTED SOLUTION

Accepted Solutions
AndersBergquist
Quartz | Level 8

You can also your the sorttask and select no dublicated keys or obs in the option panel.

View solution in original post

8 REPLIES 8
GeoffreyBrent
Calcite | Level 5

Assuming you want to remove duplicate observations with the same values?

data have;

input x y;

cards;

1 2

1 4

1 8

2 3

2 4

2 4

2 5

1 7

1 8

;

run;

/* PROC SORT method */

proc sort data=have nodup out=want1;

by x y;

quit;

/* PROC SQL method */

proc sql;

create table want2 as select distinct * from have;

quit;

If you want to delete duplicates that are not 100% identical (e.g. all units with the same ID variable regardless of whether they differ in other values) look at the NODUPKEY option in PROC SORT.

Patrick
Opal | Level 21

Use the query builder, select all columns, tick box "distinct rows"

AndersBergquist
Quartz | Level 8

You can also your the sorttask and select no dublicated keys or obs in the option panel.

Patrick
Opal | Level 21

Don't use the sort task to get rid of duplicate records (=all variable values equal). This will only give you the expected result if you also sort by ALL variables in the data set (and then you could also choose "Keep only the first record per by group").

Selecting option "Keep only one of each record..." in the sort task creates a Proc Sort statement using NODUP (which is NODUPRECS). This option is imho one of the less lucky implementations SAS did. It's quite deceptive and I've seen already a lot of people stumbling over it.

What NODUP does is removing identical rows (all variable values the same) only if they follow right one after the other - and you have no control if this happens this way if you don't sort by all variables (and then NODUPKEY gives you the same result).

Here an example to showcase the issue

data sample;

  input key var $;

  datalines;

1 a

1 b

1 b

1 a

;

run;

Capture1.PNG

Capture.PNG

This generates the following Proc Sort code (in EG4.3):

PROC SORT DATA=WORK.SAMPLE
   OUT=WORK.SORTSortedSAMPLE(LABEL="Sorted WORK.SAMPLE")
   NODUP
   ;
   BY key;

RUN;

One would now expect to get 2 rows as result

1 a

1 b

BUT that's what you will get:

Capture.PNG

Selection of "Maintaining original data order..." will also influence the result - but with the simple sample data provided the result will be wrong as well.

Conclusion

Use the query builder with "select distinct rows only" for de-dupping.

GeoffreyBrent
Calcite | Level 5

Patrick is correct, I was aware of this rather important detail but completely neglected to mention it in my response 😞 In hindsight, my PROC SORT example should have used "by _all_".

sundari
Calcite | Level 5

Thank you guys...

EmilyC
Calcite | Level 5

I work with pharmacy claims, in which we often want to have a final report with only the most current fill reported.  We remove the duplicate claims, that have some different data and some matching, through using group filters in EG.   We also are non-coders. 

The process is to pull the full data set first.  Then query off it, for our example, we first select to group by the Member ID, and GPI (unique drug identifier) - therefore keeping the combination of member + drug, regardless of the other information on the claim )such as costs, pharmacy, etc) and then we use the MAX summary function on the date field. 

Once you have selected to do the MAX of date, on the filter tab, the "filter the summarized data" option will show up at the bottom half of the screen.   We then enter in the filter to this section where (CALCULATED MAX DATE) = DATE.

This is probably a bit confusing to read through, but I think it's a really good way to get to the most recent data out of millions of rows of multiple transactions.  I would assume this would be helpful further than the Pharmacy industry.

9-19-2012 2-59-41 PM.png

ChrisHemedinger
Community Manager

By chance, I just added a blog post with a simplified example of what EmilyC describes.

HAVING (clause) fun with SAS Enterprise Guide - The SAS Dummy

Chris

Register for SAS Innovate 2025!! The premier event for SAS users, May 6-9 in Orlando FL. Sign up now for the best deals!

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 95493 views
  • 6 likes
  • 6 in conversation