DATA Step, Macro, Functions and more

Eliminate not exact duplicate rows

Reply
Occasional Contributor
Posts: 17

Eliminate not exact duplicate rows

Hi, I am new to sas and I would like some help. I need to delete data that's outdated for the same user. I wanted to keep the record that the user is registered/most recent data. below is a sample data

Userid      UserName      Registered      effective_date      expiration_date    

123          Mary Jane          Y               1/1/2009               1/1/2014    

123          Mary Jane          N               1/1/2008               1/1/2014

234          John Doe           Y                3/21/2010             1/1/2015

245          Jane Doe           N                4/1/2007               1/1/2014

653          John Smith        N                8/5/2010               1/1/2014

653          John Smith        Y                9/14/2011              1/1/2014

900          Mike Adams       N               1/1/2012                 1/1/2015

This is the desired results

Userid      UserName      Registered      effective_date      expiration_date    

123          Mary Jane          Y               1/1/2009               1/1/2014    

234          John Doe           Y                3/21/2010             1/1/2015

245          Jane Doe           N                4/1/2007               1/1/2014

653          John Smith        Y                9/14/2011              1/1/2014

900          Mike Adams       N               1/1/2012                 1/1/2015

Thanks.

Super User
Posts: 5,260

Re: Eliminate not exact duplicate rows

Step one: sort by userid and descending expiration_date

Step two: proc sort nodupkey by userid

Data never sleeps
Super Contributor
Posts: 333

Re: Eliminate not exact duplicate rows

Just to clarify further since you are new to SAS and I think effective_date may be what you want in the first sort:

/*Sort the data in the right order */

Proc sort data=have ; by userid descending effective_date; run;

/*Allows proc sort to eliminate the duplicates and keeps the first row it finds (which is why you needed to sort it to begin with)*/

/** this sorts the above data and creates two new datasets: one with the deduped results (have2) and one with the rows that were removed (dups)**/

proc sort data=have out=have2 dup=dups nodupkey; by userid; run;

Hope this helps!

EJ

Super Contributor
Posts: 297

Re: Eliminate not exact duplicate rows

To keep it simple, based on the data you have provided try the following:

DATA HAVE;

LENGTH USERNAME $10;

INFILE DATALINES DLM=',';

INPUT USERID USERNAME $ REGISTERED $ EFFECTIVE_DATE:MMDDYY8. EXPIRATION_DATE:MMDDYY8.;

FORMAT EFFECTIVE_DATE EXPIRATION_DATE DATE9.;

DATALINES;

123,MARY JANE,Y,1/1/2009,1/1/2014

123,MARY JANE,N,1/1/2008,1/1/2014

234,JOHN DOE,Y,3/21/2010,1/1/2015

245,JANE DOE,N,4/1/2007,1/1/2014

653,JOHN SMITH,N,8/5/2010,1/1/2014

653,JOHN SMITH,Y,9/14/2011,1/1/2014

900,MIKE ADAMS,N,1/1/2012,1/1/2015

;

RUN;

PROC SORT DATA=HAVE OUT=SORTHAVE;

BY USERID EFFECTIVE_DATE;

RUN;

DATA WANT;

SET SORTHAVE;

BY USERID EFFECTIVE_DATE;

IF LAST.USERID;

RUN;

Occasional Contributor
Posts: 17

Re: Eliminate not exact duplicate rows

Thank you all for your help!

Valued Guide
Posts: 765

Re: Eliminate not exact duplicate rows

hi ... though you are a beginner, you can venture into PROC SQL ...

proc sql;

create table new as

select * from old

group userid

having effective_date = max(effective_date);

quit;

Super Contributor
Posts: 1,636

Re: Eliminate not exact duplicate rows

Should be group by userid

Valued Guide
Posts: 765

Re: Eliminate not exact duplicate rows

hi ... GROUP works without the BY (as does ORDER)

I suppose it might not be good practice to leave out the BY

Super Contributor
Posts: 1,636

Re: Eliminate not exact duplicate rows

Hi Mike,

It is good to know. Thank you!

Ask a Question
Discussion stats
  • 8 replies
  • 674 views
  • 12 likes
  • 6 in conversation