nickspencer
Obsidian | Level 7

Hi All,

 

I have a dataset with columns ID, CONDITION, DATE and others. I want to deduplicate the rows by ID, keeping the row with the earliest date. The issue I am having is that if the condition value is 'NO', rows with any other condition value should take precedence irrespective of the date. A 'NO' row should be kept only when the ID has no other condition.

 

ID      CONDITION   DATE
1234    NO          3/1/2020
1234    A-1         3/5/2020
1234    P-1         3/2/2020
2345    NO          3/5/2020
2345    NO          3/1/2020

 

The result should have:

ID      CONDITION   DATE
1234    P-1         3/2/2020
2345    NO          3/1/2020

 

Hope this makes sense.

 

Any suggestion is appreciated.

 

Thanks.

 

 

 

 

Accepted Solution
FreelanceReinh
Jade | Level 19

Hi @nickspencer,

 

Here's a simple solution:

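/* Sample data from the question */
data have;
input id condition $ date :mmddyy.;
format date mmddyy10.;
cards;
1234 NO 3/1/2020
1234 A-1 3/5/2020
1234 P-1 3/2/2020
2345 NO 3/5/2020
2345 NO 3/1/2020
;

/* Within each ID, condition='NO' evaluates to 0 for all other conditions, */
/* so those rows sort ahead of the 'NO' rows; ties are ordered by date.    */
proc sql;
create view _tmp as
select * from have
order by id, condition='NO', date;
quit;

/* Keep only the first (highest-priority) row of each ID */
data want;
set _tmp;
by id;
if first.id;
run;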

A more robust ORDER BY clause might be:

order by id, missing(date), condition='NO', date, condition;

This avoids selecting an observation with a missing date whenever the ID has at least one non-missing date, even if the only non-missing dates belong to condition='NO' observations. Moreover, when several observations with a condition other than 'NO' share the same date, the alphabetical order of their conditions serves as the tie-breaker.
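
For reference, here is the more robust ordering written out as a complete step, as a sketch only. The names _tmp2 and want2 are placeholders chosen here so that the objects from the simple solution are not overwritten.

```sas
proc sql;
  create view _tmp2 as
  select * from have
  order by id,
           missing(date),     /* 0 for non-missing dates, so they sort first   */
           condition='NO',    /* 0 for any condition other than 'NO'           */
           date,              /* earliest date within the same priority        */
           condition;         /* alphabetical tie-breaker for equal dates      */
quit;

data want2;
  set _tmp2;
  by id;
  if first.id;                /* keep the top-ranked row of each ID */
run;
```

Because missing(date) comes before condition='NO' in the sort order, a 'NO' row with a valid date is preferred over a non-'NO' row whose date is missing.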

 

If your dataset is very large and already sorted by ID, a different solution (without sorting) might be more efficient.
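
One way such a no-sort approach could look is a double DO-loop over each BY group: the first loop finds the position of the best row, the second re-reads the group and outputs that row. This is only a sketch under stated assumptions: HAVE is already sorted by ID, the output name want_nosort and the underscore-prefixed helper variables are invented here, and missing dates are treated as the earliest (as in the simple ORDER BY above).

```sas
data want_nosort;
  /* Pass 1: scan one ID group and remember the position of the best row */
  do _n = 1 by 1 until (last.id);
    set have;
    by id;
    _rank = (condition = 'NO');            /* 0 = preferred, 1 = 'NO' */
    if _n = 1
       or _rank < _best_rank
       or (_rank = _best_rank and date < _best_date) then do;
      _best_rank = _rank;
      _best_date = date;
      _best_n    = _n;
    end;
  end;

  /* Pass 2: re-read the same rows and output only the remembered one */
  do _n = 1 to _n;
    set have;
    if _n = _best_n then output;
  end;

  drop _n _rank _best_rank _best_date _best_n;
run;
```

With strict comparisons, the first row in input order wins among exact ties. All other columns of the selected row are carried through unchanged because the second SET statement reloads the full observation.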
