Hi all,
I've been trying to deal with an issue like this:
Dataset:
ROW ID DATE
1 1 20010112
2 1 20010212
3 2 20000430
4 2 20000514
5 2 20000514
6 2 20010112
7 3 20090807
In the above dataset, ID 2 has two rows with the same date 20000514. I want to remove both of thses two rows. I tried using lag function, which will remove the second of the two rows (row 5) but not the first row (row 4). I suspect I should simulate a lead function? But somehow it doesn't work. Below are my codes:
/*This is me trying to mark row 5 for deleting later*/
proc sort data=have; by id date; run;
data want1; set have;
by id date;
lagdate=lag(date);
if lagdate=date then removal=1; else removal=0;
run;
/*This is me trying to mark row 4 for deleting later, but somehow it still marks row 5. */
proc sort data=want1; by id descending date; run;
data want; set want1;
by id descending date;
leaddate=lag(date);
if leaddate=date then removal2=1; else removal2=0;
run;
Can anyone help me solve this issue please? Thanks in advance!
J
Reeza,
sql does work. I identified all the id-date pairs that have multiple dates using the code below:
proc sql;
create table dup as select distinct id, date, count(date) as count from have group by id, date order by id, date, count;
quit;
Later on I mark those that have count>1 and merge with the original table. I was able to delete them. Thanks for help!
Look at proc sort with the UNIQUEOUT option.
Use BY ROW ID.
See page 3:
http://support.sas.com/resources/papers/proceedings13/324-2013.pdf
proc sort data=have nouniquekeys out=dups uniqueout=want;
by id row;
run;
Reeza,
sql does work. I identified all the id-date pairs that have multiple dates using the code below:
proc sql;
create table dup as select distinct id, date, count(date) as count from have group by id, date order by id, date, count;
quit;
Later on I mark those that have count>1 and merge with the original table. I was able to delete them. Thanks for help!
Here is a variant of @Reeza's suggestion:
proc sort data=have out=_null_ nouniquekey uniqueout=want;
by id date;
run;
Alternatively, you could use a data step (after sorting dataset HAVE by id date):
data want;
set have;
by id date;
if first.date & last.date;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.