The current data structure for the problem I have is:
id | L_date | M_date | Note |
1 | 9/1/2020 | 9/29/2020 | keep, as it is closest |
1 | 8/30/2020 | 9/29/2020 | change to null |
2 | 8/13/2022 | 10/21/2022 | change to null |
2 | 10/1/2022 | 10/21/2022 | keep, as it is closest |
2 | 10/2/2023 | 10/21/2022 | change to null |
3 | 12/11/2021 | . | |
3 | 5/19/2022 | . | |
4 | 2/5/2022 | 12/11/2021 | keep, as it is closest |
4 | 3/19/2022 | 12/11/2021 | change to null |
4 | 6/21/2022 | 12/11/2021 | change to null |
I am trying to retain the closest M_date column value that is to the L_date value and turn the other same M_date values for the same ID to null.
The data structure that I am trying to achieve is below:
id | L_date | M_date |
1 | 9/1/2020 | 9/29/2020 |
1 | 8/30/2020 | |
2 | 8/13/2022 | |
2 | 10/1/2022 | 10/21/2022 |
2 | 10/2/2023 | |
3 | 12/11/2021 | . |
3 | 5/19/2022 | . |
4 | 2/5/2022 | 12/11/2021 |
4 | 3/19/2022 | |
4 | 6/21/2022 |
Here's a solution, I expanded your sample data to include two different scenarios - one where there is a tie and one where there are missing M_Date and non missing M_date for the same ID.
data have;
infile cards truncover;
input id 1. L_date : mmddyy10. M_date : mmddyy10.;
format l_date m_date yymmdd10.;;
cards;
1 9/1/2020 9/29/2020
1 8/30/2020 9/27/2020
2 8/13/2022 10/21/2022
2 10/1/2022 10/21/2022
2 10/2/2023 10/21/2022
3 12/11/2021 .
3 5/19/2022 .
4 2/5/2022 12/11/2021
4 3/19/2022 12/11/2021
4 6/21/2022 12/11/2021
5 2/5/2022 12/11/2021
5 2/5/2022 12/14/2021
5 3/19/2022 .
5 6/21/2022 .
;;;
run;
data step1;
set have;
abs_diff = abs(m_date-l_date);
run;
proc sort data=step1;
by id abs_diff;
run;
data want;
set step1;
by id;
retain flag;
if first.id then flag=0;
if first.id and not missing(abs_diff) then flag=1;
else if flag=0 and not missing(abs_diff) then flag=1;
else if flag=1 then call missing(m_date);
run;
It can be sorted from oldest dates first to newest dates last.
Ties can be left as same.
Here's a solution, I expanded your sample data to include two different scenarios - one where there is a tie and one where there are missing M_Date and non missing M_date for the same ID.
data have;
infile cards truncover;
input id 1. L_date : mmddyy10. M_date : mmddyy10.;
format l_date m_date yymmdd10.;;
cards;
1 9/1/2020 9/29/2020
1 8/30/2020 9/27/2020
2 8/13/2022 10/21/2022
2 10/1/2022 10/21/2022
2 10/2/2023 10/21/2022
3 12/11/2021 .
3 5/19/2022 .
4 2/5/2022 12/11/2021
4 3/19/2022 12/11/2021
4 6/21/2022 12/11/2021
5 2/5/2022 12/11/2021
5 2/5/2022 12/14/2021
5 3/19/2022 .
5 6/21/2022 .
;;;
run;
data step1;
set have;
abs_diff = abs(m_date-l_date);
run;
proc sort data=step1;
by id abs_diff;
run;
data want;
set step1;
by id;
retain flag;
if first.id then flag=0;
if first.id and not missing(abs_diff) then flag=1;
else if flag=0 and not missing(abs_diff) then flag=1;
else if flag=1 then call missing(m_date);
run;
Yes, what is DOW?
data have;
infile cards truncover;
input id 1. L_date : mmddyy10. M_date : mmddyy10.;
format l_date m_date yymmdd10.;;
cards;
1 9/1/2020 9/29/2020
1 8/30/2020 9/29/2020
2 8/13/2022 10/21/2022
2 10/1/2022 10/21/2022
2 10/2/2023 10/21/2022
3 12/11/2021 .
3 5/19/2022 .
4 2/5/2022 12/11/2021
4 3/19/2022 12/11/2021
4 6/21/2022 12/11/2021
5 2/5/2022 12/11/2021
5 3/19/2022 .
5 6/21/2022 .
;;;
run;
proc sql;
create table want(drop=diff M_date) as
select *,ifn(diff=min(diff),M_date,.) as new_Mdate format=mmddyy10. from
(select *, abs(M_date-L_date) as diff from have)
group by id
order by 1,2;
quit;
Here first ID both records dates l_date & m_Date have same days difference (28 days) , it means m_date should populate for both records ?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.