BookmarkSubscribeRSS Feed
Vish33
Lapis Lazuli | Level 10

Hi,

I have a data set with millions of records and i want to remove the duplicates from that and place them into a new data set.

I need different scenarios like using proc sort and other techniques.

Thanks in advance,

vishnu

6 REPLIES 6
art297
Opal | Level 21

Depends upon what you want to remove and if you want to have total control over what is removed.

My own preference is to use proc sort and then, in a datastep, take advantage of the first. and last. boolean variables.

FriedEgg
SAS Employee

Look at the dupout option in proc sort to see if it meets your needs, it does not offer the same level of control as Art's recommendation.

http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a000146878.htm

ArtC
Rhodochrosite | Level 12

Remember if you use SORT with NODUPLICATES (or SORT followed by FIRST. and LAST.) that you must have a sufficient key to bring the duplicate records next to each other or they will not be eliminated. This paper has been around a while but is still important today:  http://www2.sas.com/proceedings/sugi25/25/po/25p221.pdf

art297
Opal | Level 21

Art,

I would add that there aren't many (if any) use(s) I've ever seen for the noduplicates option.  If I were to use the dupout option, I would always use it with NODUPKEY

Vish33
Lapis Lazuli | Level 10

Thanks Art,

this was really helpful..

Hima
Obsidian | Level 7

Please try this code:

data person;
   input id $ name $ dept $;
   datalines;
1 John Sales
2 Mary Acctng
3 Tom  Marketing
1 John Sales
3 Tom  Marketing
;

PROC SORT DATA = person; BY id;
RUN;

PROC SQL;

CREATE TABLE TEST AS
SELECT DISTINCT * FROM person GROUP BY ID HAVING COUNT(ID) GT 1;
QUIT;

PROC SORT DATA = person NODUPKEY; BY id;
RUN;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 986 views
  • 6 likes
  • 5 in conversation