The data I have is structured like this:
| id | t_date | h_date | d_date | note |
| 1 | 8/30/2020 | . | . | |
| 1 | 8/30/2020 | . | . | keep any one row |
| 2 | 9/21/2021 | | | |
| 2 | 9/21/2021 | 10/15/2021 | . | Retain only this one |
| 2 | 10/27/2021 | . | . | this row |
| 3 | 12/11/2021 | . | . | remove this row |
| 3 | 12/11/2021 | 12/20/2021 | 1/1/2022 | retain this row |
| 4 | 2/5/2022 | . | . | |
| 4 | 3/19/2022 | . | . | |
I want to de-duplicate the rows as follows. If several rows have the same id and t_date, and h_date and d_date are null in all of them, I want to keep any one of those rows. If several rows have the same id and t_date, but one of them has a non-null h_date or d_date, I want to keep that row and remove the rows with the same id and t_date where both h_date and d_date are null.
What I want is this structure below:
| id | t_date | h_date | d_date |
| 1 | 8/30/2020 | . | . |
| 2 | 9/21/2021 | 10/15/2021 | . |
| 2 | 10/27/2021 | . | . |
| 3 | 12/11/2021 | 12/20/2021 | 1/1/2022 |
| 4 | 2/5/2022 | . | . |
| 4 | 3/19/2022 | . | . |
Below could work. Code not tested because you didn't provide scripts that create the sample data.
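/* Missing values sort lowest, so within each id and t_date the row with non-missing h_date or d_date ends up last */
proc sort data=have;
  by id t_date h_date d_date;
run;

/* Keep only the last row of each id/t_date group */
data want;
  set have;
  by id t_date h_date d_date;
  if last.t_date;
run;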
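For anyone who wants to try the step above, here is a minimal sketch of a script that could create the sample data. It is an assumption on my part, not something from the original post: it treats the blank h_date and d_date cells in the posted table as missing, leaves out the note column, and reads the dates with the mmddyy10. informat.

```sas
/* Hypothetical sample data built from the table in the question */
data have;
  /* The colon modifier reads each date as a space-delimited value with the mmddyy10. informat */
  input id t_date :mmddyy10. h_date :mmddyy10. d_date :mmddyy10.;
  format t_date h_date d_date mmddyy10.;
  /* A single period is read as a missing value */
  datalines;
1 8/30/2020 . .
1 8/30/2020 . .
2 9/21/2021 . .
2 9/21/2021 10/15/2021 .
2 10/27/2021 . .
3 12/11/2021 . .
3 12/11/2021 12/20/2021 1/1/2022
4 2/5/2022 . .
4 3/19/2022 . .
;
run;
```

With this have dataset, the sort and data step should leave one row per id and t_date, which matches the six rows in the desired output. Note that if two rows in the same id and t_date group both had non-missing dates, only the last one in sort order would be kept.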