BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JOLSAS
Quartz | Level 8

Hi all,

 

I've been trying to deal with an issue like this:

 

Dataset:

 

ROW    ID      DATE

1           1       20010112

2           1       20010212

3           2       20000430

4           2       20000514

5           2       20000514

6           2       20010112

7           3       20090807

 

In the above dataset, ID 2 has two rows with the same date 20000514. I want to remove both of thses two rows. I tried using lag function, which will remove the second of the two rows (row 5) but not the first row (row 4). I suspect I should simulate a lead function? But somehow it doesn't work. Below are my codes:

 

/*This is me trying to mark row 5 for deleting later*/

proc sort data=have; by id date; run;
data want1; set have;

     by id date;
     lagdate=lag(date);

     if lagdate=date then removal=1; else removal=0;
run;

 

/*This is me trying to mark row 4 for deleting later, but somehow it still marks row 5. */
proc sort data=want1; by id descending date; run;
data want; set want1;

     by id descending date;
     leaddate=lag(date);

     if leaddate=date then removal2=1; else removal2=0;
run;

 

Can anyone help me solve this issue please? Thanks in advance!

 

J

1 ACCEPTED SOLUTION

Accepted Solutions
JOLSAS
Quartz | Level 8

Reeza,

 

sql does work. I identified all the id-date pairs that have multiple dates using the code below:

 

proc sql;
create table dup as select distinct id, date, count(date) as count from have group by id, date order by id, date, count;
quit;

 

Later on I mark those that have count>1 and merge with the original table. I was able to delete them. Thanks for help!

View solution in original post

3 REPLIES 3
Reeza
Super User

Look at proc sort with the UNIQUEOUT option.

 

Use BY ROW ID.

See page 3:

http://support.sas.com/resources/papers/proceedings13/324-2013.pdf

 

proc sort data=have nouniquekeys out=dups uniqueout=want;
by id row;
run;
JOLSAS
Quartz | Level 8

Reeza,

 

sql does work. I identified all the id-date pairs that have multiple dates using the code below:

 

proc sql;
create table dup as select distinct id, date, count(date) as count from have group by id, date order by id, date, count;
quit;

 

Later on I mark those that have count>1 and merge with the original table. I was able to delete them. Thanks for help!

FreelanceReinh
Jade | Level 19

Here is a variant of @Reeza's suggestion:

proc sort data=have out=_null_ nouniquekey uniqueout=want;
by id date;
run;

Alternatively, you could use a data step (after sorting dataset HAVE by id date):

data want;
set have;
by id date;
if first.date & last.date;
run;

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1012 views
  • 0 likes
  • 3 in conversation