My data is structured as follows:
id | t_date | h_date | d_date | note |
1 | 8/30/2020 | . | . | |
1 | 8/30/2020 | . | . | keep any one row |
2 | 9/21/2021 | . | . | |
2 | 9/21/2021 | 10/15/2021 | . | Retain only this one |
2 | 10/27/2021 | . | . | this row |
3 | 12/11/2021 | . | . | remove this row |
3 | 12/11/2021 | 12/20/2021 | 1/1/2022 | retain this row |
4 | 2/5/2022 | . | . | |
4 | 3/19/2022 | . | . | |
I want to de-duplicate rows that have the same id and t_date when both h_date and d_date are null, keeping any one of them. But if rows have the same id and t_date and either h_date or d_date is not null on one of them, I want to keep that row and remove the rows for the same id and t_date where both h_date and d_date are null.
The output I want looks like this:
id | t_date | h_date | d_date |
1 | 8/30/2020 | . | . |
2 | 9/21/2021 | 10/15/2021 | . |
2 | 10/27/2021 | . | . |
3 | 12/11/2021 | 12/20/2021 | 1/1/2022 |
4 | 2/5/2022 | . | . |
4 | 3/19/2022 | . | . |
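For anyone who wants to test code against this, a data step along the following lines could create the sample table. This is only a sketch: the dataset name have, numeric date variables, and the mmddyy10. informat and format are assumptions, and the note column is left out.

```sas
/* Sketch of the sample data; dataset name and date informat are assumed */
data have;
  /* the colon modifier applies the informat with list (space-delimited) input */
  input id t_date :mmddyy10. h_date :mmddyy10. d_date :mmddyy10.;
  format t_date h_date d_date mmddyy10.;
  datalines;
1 8/30/2020 . .
1 8/30/2020 . .
2 9/21/2021 . .
2 9/21/2021 10/15/2021 .
2 10/27/2021 . .
3 12/11/2021 . .
3 12/11/2021 12/20/2021 1/1/2022
4 2/5/2022 . .
4 3/19/2022 . .
;
run;
```

A single period in the data lines is read as a missing value for each date.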
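The code below could work. It is not tested because you didn't provide a script that creates the sample data. Missing values sort before non-missing values in SAS, so after the sort the last row within each id and t_date is the one with h_date or d_date populated, if there is one.
proc sort data=have;
  by id t_date h_date d_date;
run;

data want;
  set have;
  by id t_date h_date d_date;
  /* keep only the last row for each id and t_date */
  if last.t_date;
run;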
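One caveat: the step above keeps exactly one row per id and t_date. If a group could contain several different rows with h_date or d_date populated and all of them should be kept, a variation like the one below might fit better. It is a sketch under that assumption and is not tested; the dataset names sorted and want_all are made up for illustration.

```sas
/* Sort so that rows with both dates missing come first in each id/t_date group */
proc sort data=have out=sorted;
  by id t_date h_date d_date;
run;

data want_all;
  set sorted;
  by id t_date h_date d_date;
  /* drop exact duplicates of id, t_date, h_date and d_date */
  if last.d_date;
  if missing(h_date) and missing(d_date) then do;
    /* keep an all-missing row only when no populated row follows in the group */
    if last.t_date then output;
  end;
  else output;
run;
```

For the sample data in the question this should give the same six rows as the simpler step, since no id and t_date group there has more than one populated row.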