Hi All,
I have a dataset with columns ID, condition, date and others. I want to dedup the rows by ID and the earliest date but the issue I am having is if the condition value is 'NO', the other values of column condition should take precedence irrespective of the date.
ID CONDITION DATE
1234 NO 3/1/2020
1234 A-1 3/5/2020
1234 P-1 3/2/2020
2345 NO 3/5/2020
2345 NO 3/1/2020
The result should have:
ID CONDITION DATE
1234 P-1 3/2/2020
2345 NO 3/1/2020
Hope this makes sense.
Any suggestion is appreciated.
Thanks.
Hi @nickspencer,
Here's a simple solution:
data have;
input id condition $ date :mmddyy.;
format date mmddyy10.;
cards;
1234 NO 3/1/2020
1234 A-1 3/5/2020
1234 P-1 3/2/2020
2345 NO 3/5/2020
2345 NO 3/1/2020
;
proc sql;
create view _tmp as
select * from have
order by id, condition='NO', date;
quit;
data want;
set _tmp;
by id;
if first.id;
run;
A more robust ORDER BY clause might be:
order by id, missing(date), condition='NO', date, condition;
This would avoid the selection of missing dates if possible (including the case that only condition='NO' occurs with non-missing dates). Moreover, in case of tied observations (same date) the alphabetical order of (not-'NO') conditions would serve as the tie-breaker.
If your dataset is very large and already sorted by ID, a different solution (without sorting) might be more efficient.
Hi @nickspencer,
Here's a simple solution:
data have;
input id condition $ date :mmddyy.;
format date mmddyy10.;
cards;
1234 NO 3/1/2020
1234 A-1 3/5/2020
1234 P-1 3/2/2020
2345 NO 3/5/2020
2345 NO 3/1/2020
;
proc sql;
create view _tmp as
select * from have
order by id, condition='NO', date;
quit;
data want;
set _tmp;
by id;
if first.id;
run;
A more robust ORDER BY clause might be:
order by id, missing(date), condition='NO', date, condition;
This would avoid the selection of missing dates if possible (including the case that only condition='NO' occurs with non-missing dates). Moreover, in case of tied observations (same date) the alphabetical order of (not-'NO') conditions would serve as the tie-breaker.
If your dataset is very large and already sorted by ID, a different solution (without sorting) might be more efficient.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.