BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JOLSAS
Quartz | Level 8

Hi all,

 

I've been trying to deal with an issue like this:

 

Dataset:

 

ROW    ID      DATE

1           1       20010112

2           1       20010212

3           2       20000430

4           2       20000514

5           2       20000514

6           2       20010112

7           3       20090807

 

In the above dataset, ID 2 has two rows with the same date 20000514. I want to remove both of thses two rows. I tried using lag function, which will remove the second of the two rows (row 5) but not the first row (row 4). I suspect I should simulate a lead function? But somehow it doesn't work. Below are my codes:

 

/*This is me trying to mark row 5 for deleting later*/

proc sort data=have; by id date; run;
data want1; set have;

     by id date;
     lagdate=lag(date);

     if lagdate=date then removal=1; else removal=0;
run;

 

/*This is me trying to mark row 4 for deleting later, but somehow it still marks row 5. */
proc sort data=want1; by id descending date; run;
data want; set want1;

     by id descending date;
     leaddate=lag(date);

     if leaddate=date then removal2=1; else removal2=0;
run;

 

Can anyone help me solve this issue please? Thanks in advance!

 

J

1 ACCEPTED SOLUTION

Accepted Solutions
JOLSAS
Quartz | Level 8

Reeza,

 

sql does work. I identified all the id-date pairs that have multiple dates using the code below:

 

proc sql;
create table dup as select distinct id, date, count(date) as count from have group by id, date order by id, date, count;
quit;

 

Later on I mark those that have count>1 and merge with the original table. I was able to delete them. Thanks for help!

View solution in original post

3 REPLIES 3
Reeza
Super User

Look at proc sort with the UNIQUEOUT option.

 

Use BY ROW ID.

See page 3:

http://support.sas.com/resources/papers/proceedings13/324-2013.pdf

 

proc sort data=have nouniquekeys out=dups uniqueout=want;
by id row;
run;
JOLSAS
Quartz | Level 8

Reeza,

 

sql does work. I identified all the id-date pairs that have multiple dates using the code below:

 

proc sql;
create table dup as select distinct id, date, count(date) as count from have group by id, date order by id, date, count;
quit;

 

Later on I mark those that have count>1 and merge with the original table. I was able to delete them. Thanks for help!

FreelanceReinh
Jade | Level 19

Here is a variant of @Reeza's suggestion:

proc sort data=have out=_null_ nouniquekey uniqueout=want;
by id date;
run;

Alternatively, you could use a data step (after sorting dataset HAVE by id date):

data want;
set have;
by id date;
if first.date & last.date;
run;

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1055 views
  • 0 likes
  • 3 in conversation