BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JOLSAS
Quartz | Level 8

Hi all,

 

I've been trying to deal with an issue like this:

 

Dataset:

 

ROW    ID      DATE

1           1       20010112

2           1       20010212

3           2       20000430

4           2       20000514

5           2       20000514

6           2       20010112

7           3       20090807

 

In the above dataset, ID 2 has two rows with the same date 20000514. I want to remove both of thses two rows. I tried using lag function, which will remove the second of the two rows (row 5) but not the first row (row 4). I suspect I should simulate a lead function? But somehow it doesn't work. Below are my codes:

 

/*This is me trying to mark row 5 for deleting later*/

proc sort data=have; by id date; run;
data want1; set have;

     by id date;
     lagdate=lag(date);

     if lagdate=date then removal=1; else removal=0;
run;

 

/*This is me trying to mark row 4 for deleting later, but somehow it still marks row 5. */
proc sort data=want1; by id descending date; run;
data want; set want1;

     by id descending date;
     leaddate=lag(date);

     if leaddate=date then removal2=1; else removal2=0;
run;

 

Can anyone help me solve this issue please? Thanks in advance!

 

J

1 ACCEPTED SOLUTION

Accepted Solutions
JOLSAS
Quartz | Level 8

Reeza,

 

sql does work. I identified all the id-date pairs that have multiple dates using the code below:

 

proc sql;
create table dup as select distinct id, date, count(date) as count from have group by id, date order by id, date, count;
quit;

 

Later on I mark those that have count>1 and merge with the original table. I was able to delete them. Thanks for help!

View solution in original post

3 REPLIES 3
Reeza
Super User

Look at proc sort with the UNIQUEOUT option.

 

Use BY ROW ID.

See page 3:

http://support.sas.com/resources/papers/proceedings13/324-2013.pdf

 

proc sort data=have nouniquekeys out=dups uniqueout=want;
by id row;
run;
JOLSAS
Quartz | Level 8

Reeza,

 

sql does work. I identified all the id-date pairs that have multiple dates using the code below:

 

proc sql;
create table dup as select distinct id, date, count(date) as count from have group by id, date order by id, date, count;
quit;

 

Later on I mark those that have count>1 and merge with the original table. I was able to delete them. Thanks for help!

FreelanceReinh
Jade | Level 19

Here is a variant of @Reeza's suggestion:

proc sort data=have out=_null_ nouniquekey uniqueout=want;
by id date;
run;

Alternatively, you could use a data step (after sorting dataset HAVE by id date):

data want;
set have;
by id date;
if first.date & last.date;
run;

 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1386 views
  • 0 likes
  • 3 in conversation