SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Deleting duplicating dates

Accepted Solution Solved
Reply
Contributor
Posts: 45
Accepted Solution

Deleting duplicating dates

Hi all,

 

I've been trying to deal with an issue like this:

 

Dataset:

 

ROW    ID      DATE

1           1       20010112

2           1       20010212

3           2       20000430

4           2       20000514

5           2       20000514

6           2       20010112

7           3       20090807

 

In the above dataset, ID 2 has two rows with the same date 20000514. I want to remove both of thses two rows. I tried using lag function, which will remove the second of the two rows (row 5) but not the first row (row 4). I suspect I should simulate a lead function? But somehow it doesn't work. Below are my codes:

 

/*This is me trying to mark row 5 for deleting later*/

proc sort data=have; by id date; run;
data want1; set have;

     by id date;
     lagdate=lag(date);

     if lagdate=date then removal=1; else removal=0;
run;

 

/*This is me trying to mark row 4 for deleting later, but somehow it still marks row 5. */
proc sort data=want1; by id descending date; run;
data want; set want1;

     by id descending date;
     leaddate=lag(date);

     if leaddate=date then removal2=1; else removal2=0;
run;

 

Can anyone help me solve this issue please? Thanks in advance!

 

J


Accepted Solutions
Solution
‎04-21-2016 06:53 PM
Contributor
Posts: 45

Re: Deleting duplicating dates

Reeza,

 

sql does work. I identified all the id-date pairs that have multiple dates using the code below:

 

proc sql;
create table dup as select distinct id, date, count(date) as count from have group by id, date order by id, date, count;
quit;

 

Later on I mark those that have count>1 and merge with the original table. I was able to delete them. Thanks for help!

View solution in original post


All Replies
Super User
Posts: 17,963

Re: Deleting duplicating dates

Look at proc sort with the UNIQUEOUT option.

 

Use BY ROW ID.

See page 3:

http://support.sas.com/resources/papers/proceedings13/324-2013.pdf

 

proc sort data=have nouniquekeys out=dups uniqueout=want;
by id row;
run;
Solution
‎04-21-2016 06:53 PM
Contributor
Posts: 45

Re: Deleting duplicating dates

Reeza,

 

sql does work. I identified all the id-date pairs that have multiple dates using the code below:

 

proc sql;
create table dup as select distinct id, date, count(date) as count from have group by id, date order by id, date, count;
quit;

 

Later on I mark those that have count>1 and merge with the original table. I was able to delete them. Thanks for help!

Trusted Advisor
Posts: 1,115

Re: Deleting duplicating dates

[ Edited ]

Here is a variant of @Reeza's suggestion:

proc sort data=have out=_null_ nouniquekey uniqueout=want;
by id date;
run;

Alternatively, you could use a data step (after sorting dataset HAVE by id date):

data want;
set have;
by id date;
if first.date & last.date;
run;

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 307 views
  • 0 likes
  • 3 in conversation