Solved: Advanced de-duplication using conditions

SP01 · Posted 11-21-2023 10:01 AM

The structure of the data I have is like below:

id	t_date	h_date	d_date	note
1	8/30/2020	.	.
1	8/30/2020	.	.	keep any one row
2	9/21/2021	.	.
2	9/21/2021	10/15/2021	.	Retain only this one
2	10/27/2021	.	.	keep this row
3	12/11/2021	.	.	remove this row
3	12/11/2021	12/20/2021	1/1/2022	retain this row
4	2/5/2022	.	.
4	3/19/2022	.	.

I want to de-duplicate rows that share the same id and t_date. If h_date and d_date are both null on all of those rows, I want to keep any one of them. If one of those rows has a non-null h_date or d_date, I want to keep that row and remove the rows where both h_date and d_date are null.

What I want is this structure below:

id	t_date	h_date	d_date
1	8/30/2020	.	.
2	9/21/2021	10/15/2021	.
2	10/27/2021	.	.
3	12/11/2021	12/20/2021	1/1/2022
4	2/5/2022	.	.
4	3/19/2022	.	.
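
For anyone who wants to reproduce this, here is a minimal sketch of a data step that builds the sample rows above. The data set name have and the mmddyy10. informat are assumptions, and missing dates are entered as empty comma-separated fields rather than dots.

```sas
/* Sketch: build the sample rows as a data set named HAVE (name assumed). */
data have;
  /* DSD treats consecutive commas as a missing value. */
  infile datalines dsd truncover;
  input id t_date :mmddyy10. h_date :mmddyy10. d_date :mmddyy10.;
  format t_date h_date d_date mmddyy10.;
  datalines;
1,8/30/2020,,
1,8/30/2020,,
2,9/21/2021,,
2,9/21/2021,10/15/2021,
2,10/27/2021,,
3,12/11/2021,,
3,12/11/2021,12/20/2021,1/1/2022
4,2/5/2022,,
4,3/19/2022,,
;
run;
```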

Patrick · Posted 11-21-2023 11:12 AM

The code below could work. It is not tested because you didn't provide a script that creates the sample data.

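/* Sort so that, within each id and t_date, rows with missing
   h_date and d_date come first (missing values sort lowest). */
proc sort data=have;
  by id t_date h_date d_date;
run;

data want;
  set have;
  by id t_date h_date d_date;
  /* Keep only the last row per id and t_date. */
  if last.t_date;
run;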
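
One case the sample data does not cover: if two rows with the same id and t_date both carry a date, keeping only the last row per group drops one of them. If every dated row should be kept, a variant along these lines might work. This is a sketch, not tested against the real data; the data set names have_edge, edge_sorted and want_edge and the rows for ids 5 and 6 are made up for illustration.

```sas
/* Hypothetical edge case: id 5 has two dated rows for the same t_date. */
data have_edge;
  infile datalines dsd truncover;
  input id t_date :mmddyy10. h_date :mmddyy10. d_date :mmddyy10.;
  format t_date h_date d_date mmddyy10.;
  datalines;
5,1/10/2023,,
5,1/10/2023,,2/1/2023
5,1/10/2023,1/15/2023,
6,4/2/2023,,
6,4/2/2023,,
;
run;

/* Remove exact duplicates on all four keys. Missing values sort first,
   so an all-missing row leads its id/t_date group. */
proc sort data=have_edge out=edge_sorted nodupkey;
  by id t_date h_date d_date;
run;

data want_edge;
  set edge_sorted;
  by id t_date;
  /* Keep every row that carries a date. Keep an all-missing row only
     when it is the sole remaining row for its id/t_date group. */
  if not missing(h_date) or not missing(d_date)
     or (first.t_date and last.t_date);
run;
```

With these made-up rows, want_edge should hold both dated rows for id 5 and a single row for id 6, whereas the last.t_date filter would keep only one row for id 5.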
