Hi all,
I've been trying to deal with an issue like this:
Dataset:
ROW ID DATE
1 1 20010112
2 1 20010212
3 2 20000430
4 2 20000514
5 2 20000514
6 2 20010112
7 3 20090807
In the above dataset, ID 2 has two rows with the same date 20000514. I want to remove both of thses two rows. I tried using lag function, which will remove the second of the two rows (row 5) but not the first row (row 4). I suspect I should simulate a lead function? But somehow it doesn't work. Below are my codes:
/*This is me trying to mark row 5 for deleting later*/
proc sort data=have; by id date; run;
data want1; set have;
by id date;
lagdate=lag(date);
if lagdate=date then removal=1; else removal=0;
run;
/*This is me trying to mark row 4 for deleting later, but somehow it still marks row 5. */
proc sort data=want1; by id descending date; run;
data want; set want1;
by id descending date;
leaddate=lag(date);
if leaddate=date then removal2=1; else removal2=0;
run;
Can anyone help me solve this issue please? Thanks in advance!
J
Reeza,
sql does work. I identified all the id-date pairs that have multiple dates using the code below:
proc sql;
create table dup as select distinct id, date, count(date) as count from have group by id, date order by id, date, count;
quit;
Later on I mark those that have count>1 and merge with the original table. I was able to delete them. Thanks for help!
Look at proc sort with the UNIQUEOUT option.
Use BY ROW ID.
See page 3:
http://support.sas.com/resources/papers/proceedings13/324-2013.pdf
proc sort data=have nouniquekeys out=dups uniqueout=want;
by id row;
run;
Reeza,
sql does work. I identified all the id-date pairs that have multiple dates using the code below:
proc sql;
create table dup as select distinct id, date, count(date) as count from have group by id, date order by id, date, count;
quit;
Later on I mark those that have count>1 and merge with the original table. I was able to delete them. Thanks for help!
Here is a variant of @Reeza's suggestion:
proc sort data=have out=_null_ nouniquekey uniqueout=want;
by id date;
run;
Alternatively, you could use a data step (after sorting dataset HAVE by id date):
data want;
set have;
by id date;
if first.date & last.date;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.