Hi,
I have a data set with millions of records and i want to remove the duplicates from that and place them into a new data set.
I need different scenarios like using proc sort and other techniques.
Thanks in advance,
vishnu
Depends upon what you want to remove and if you want to have total control over what is removed.
My own preference is to use proc sort and then, in a datastep, take advantage of the first. and last. boolean variables.
Look at the dupout option in proc sort to see if it meets your needs, it does not offer the same level of control as Art's recommendation.
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a000146878.htm
Remember if you use SORT with NODUPLICATES (or SORT followed by FIRST. and LAST.) that you must have a sufficient key to bring the duplicate records next to each other or they will not be eliminated. This paper has been around a while but is still important today: http://www2.sas.com/proceedings/sugi25/25/po/25p221.pdf
Art,
I would add that there aren't many (if any) use(s) I've ever seen for the noduplicates option. If I were to use the dupout option, I would always use it with NODUPKEY
Thanks Art,
this was really helpful..
Please try this code:
data person;
input id $ name $ dept $;
datalines;
1 John Sales
2 Mary Acctng
3 Tom Marketing
1 John Sales
3 Tom Marketing
;
PROC SORT DATA = person; BY id;
RUN;
PROC SQL;
CREATE TABLE TEST AS
SELECT DISTINCT * FROM person GROUP BY ID HAVING COUNT(ID) GT 1;
QUIT;
PROC SORT DATA = person NODUPKEY; BY id;
RUN;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.