My data is structured like this:
id | t_date | h_date | d_date | note |
1 | 8/30/2020 | . | . | |
1 | 8/30/2020 | . | . | keep any one row |
2 | 9/21/2021 | . | . | |
2 | 9/21/2021 | 10/15/2021 | . | Retain only this one |
2 | 10/27/2021 | . | . | this row |
3 | 12/11/2021 | . | . | remove this row |
3 | 12/11/2021 | 12/20/2021 | 1/1/2022 | retain this row |
4 | 2/5/2022 | . | . | |
4 | 3/19/2022 | . | . | |
I want to de-duplicate rows that have the same id and t_date when h_date and d_date are both null, keeping any one of them. But if rows have the same id and t_date and one of them has a non-null h_date or d_date, I want to keep that row and remove the rows for that id and t_date where both h_date and d_date are null.
The result I want looks like this:
id | t_date | h_date | d_date |
1 | 8/30/2020 | . | . |
2 | 9/21/2021 | 10/15/2021 | . |
2 | 10/27/2021 | . | . |
3 | 12/11/2021 | 12/20/2021 | 1/1/2022 |
4 | 2/5/2022 | . | . |
4 | 3/19/2022 | . | . |
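The code below could work. It is not tested, because you didn't provide a script that creates the sample data. It relies on SAS sorting missing values before non-missing ones, so within each id and t_date the row with a populated h_date or d_date ends up last.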
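/* Missing h_date and d_date values sort first within each id and t_date */
proc sort data=have;
  by id t_date h_date d_date;
run;

/* Keep only the last row of each id and t_date group */
data want;
  set have;
  by id t_date h_date d_date;
  if last.t_date;
run;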
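Since no script for the sample data was posted, here is a minimal sketch that builds `have` from the table in the question, so the code can be tried. It is an assumption that the date columns are numeric SAS dates read with the `mmddyy10.` informat, and that the first id 2 row has both h_date and d_date missing. The note column is left out.

```sas
/* Sketch only: sample data transcribed from the question.        */
/* Assumes the dates are numeric SAS dates; "." is a missing value. */
data have;
  input id t_date :mmddyy10. h_date :mmddyy10. d_date :mmddyy10.;
  format t_date h_date d_date mmddyy10.;
  datalines;
1 8/30/2020 . .
1 8/30/2020 . .
2 9/21/2021 . .
2 9/21/2021 10/15/2021 .
2 10/27/2021 . .
3 12/11/2021 . .
3 12/11/2021 12/20/2021 1/1/2022
4 2/5/2022 . .
4 3/19/2022 . .
;
run;
```

With these nine rows, the sort and DATA step should leave six rows, one per id and t_date, which matches the desired output in the question. Note that if the dates were stored as character strings instead, the sort order would no longer be chronological, although the grouping would still work.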