Solved: Deleting duplicating dates

JOLSAS · Posted 04-21-2016 05:04 PM

Hi all,

I've been trying to deal with an issue like this:

Dataset:

ROW ID DATE

1 1 20010112

2 1 20010212

3 2 20000430

4 2 20000514

5 2 20000514

6 2 20010112

7 3 20090807

In the above dataset, ID 2 has two rows with the same date 20000514. I want to remove both of thses two rows. I tried using lag function, which will remove the second of the two rows (row 5) but not the first row (row 4). I suspect I should simulate a lead function? But somehow it doesn't work. Below are my codes:

/*This is me trying to mark row 5 for deleting later*/

proc sort data=have; by id date; run;
data want1; set have;

by id date;
lagdate=lag(date);

if lagdate=date then removal=1; else removal=0;
run;

/*This is me trying to mark row 4 for deleting later, but somehow it still marks row 5. */
proc sort data=want1; by id descending date; run;
data want; set want1;

by id descending date;
leaddate=lag(date);

if leaddate=date then removal2=1; else removal2=0;
run;

Can anyone help me solve this issue please? Thanks in advance!

J

JOLSAS · Posted 04-21-2016 06:50 PM

Reeza,

sql does work. I identified all the id-date pairs that have multiple dates using the code below:

proc sql;
create table dup as select distinct id, date, count(date) as count from have group by id, date order by id, date, count;
quit;

Later on I mark those that have count>1 and merge with the original table. I was able to delete them. Thanks for help!

View solution in original post

Reeza · Posted 04-21-2016 05:55 PM

Look at proc sort with the UNIQUEOUT option.

Use BY ROW ID.

See page 3:

http://support.sas.com/resources/papers/proceedings13/324-2013.pdf

proc sort data=have nouniquekeys out=dups uniqueout=want;
by id row;
run;

JOLSAS · Posted 04-21-2016 06:50 PM

Reeza,

sql does work. I identified all the id-date pairs that have multiple dates using the code below:

proc sql;
create table dup as select distinct id, date, count(date) as count from have group by id, date order by id, date, count;
quit;

Later on I mark those that have count>1 and merge with the original table. I was able to delete them. Thanks for help!

FreelanceReinh · Posted 04-21-2016 06:18 PM

Here is a variant of @Reeza's suggestion:

proc sort data=have out=_null_ nouniquekey uniqueout=want;
by id date;
run;

Alternatively, you could use a data step (after sorting dataset HAVE by id date):

data want;
set have;
by id date;
if first.date & last.date;
run;

Deleting duplicating dates

Re: Deleting duplicating dates

Re: Deleting duplicating dates

Re: Deleting duplicating dates

Re: Deleting duplicating dates

Deleting duplicating dates

Re: Deleting duplicating dates

Re: Deleting duplicating dates

Re: Deleting duplicating dates

Re: Deleting duplicating dates

SAS Innovate 2025: Call for Content