The current data structure for the problem I have is:
id | L_date | M_date | Note |
1 | 9/1/2020 | 9/29/2020 | keep, as it is closest |
1 | 8/30/2020 | 9/29/2020 | change to null |
2 | 8/13/2022 | 10/21/2022 | change to null |
2 | 10/1/2022 | 10/21/2022 | keep, as it is closest |
2 | 10/2/2023 | 10/21/2022 | change to null |
3 | 12/11/2021 | . | |
3 | 5/19/2022 | . | |
4 | 2/5/2022 | 12/11/2021 | keep, as it is closest |
4 | 3/19/2022 | 12/11/2021 | change to null |
4 | 6/21/2022 | 12/11/2021 | change to null |
I am trying to retain the closest M_date column value that is to the L_date value and turn the other same M_date values for the same ID to null.
The data structure that I am trying to achieve is below:
id | L_date | M_date |
1 | 9/1/2020 | 9/29/2020 |
1 | 8/30/2020 | |
2 | 8/13/2022 | |
2 | 10/1/2022 | 10/21/2022 |
2 | 10/2/2023 | |
3 | 12/11/2021 | . |
3 | 5/19/2022 | . |
4 | 2/5/2022 | 12/11/2021 |
4 | 3/19/2022 | |
4 | 6/21/2022 |
Here's a solution, I expanded your sample data to include two different scenarios - one where there is a tie and one where there are missing M_Date and non missing M_date for the same ID.
data have;
infile cards truncover;
input id 1. L_date : mmddyy10. M_date : mmddyy10.;
format l_date m_date yymmdd10.;;
cards;
1 9/1/2020 9/29/2020
1 8/30/2020 9/27/2020
2 8/13/2022 10/21/2022
2 10/1/2022 10/21/2022
2 10/2/2023 10/21/2022
3 12/11/2021 .
3 5/19/2022 .
4 2/5/2022 12/11/2021
4 3/19/2022 12/11/2021
4 6/21/2022 12/11/2021
5 2/5/2022 12/11/2021
5 2/5/2022 12/14/2021
5 3/19/2022 .
5 6/21/2022 .
;;;
run;
data step1;
set have;
abs_diff = abs(m_date-l_date);
run;
proc sort data=step1;
by id abs_diff;
run;
data want;
set step1;
by id;
retain flag;
if first.id then flag=0;
if first.id and not missing(abs_diff) then flag=1;
else if flag=0 and not missing(abs_diff) then flag=1;
else if flag=1 then call missing(m_date);
run;
It can be sorted from oldest dates first to newest dates last.
Ties can be left as same.
Here's a solution, I expanded your sample data to include two different scenarios - one where there is a tie and one where there are missing M_Date and non missing M_date for the same ID.
data have;
infile cards truncover;
input id 1. L_date : mmddyy10. M_date : mmddyy10.;
format l_date m_date yymmdd10.;;
cards;
1 9/1/2020 9/29/2020
1 8/30/2020 9/27/2020
2 8/13/2022 10/21/2022
2 10/1/2022 10/21/2022
2 10/2/2023 10/21/2022
3 12/11/2021 .
3 5/19/2022 .
4 2/5/2022 12/11/2021
4 3/19/2022 12/11/2021
4 6/21/2022 12/11/2021
5 2/5/2022 12/11/2021
5 2/5/2022 12/14/2021
5 3/19/2022 .
5 6/21/2022 .
;;;
run;
data step1;
set have;
abs_diff = abs(m_date-l_date);
run;
proc sort data=step1;
by id abs_diff;
run;
data want;
set step1;
by id;
retain flag;
if first.id then flag=0;
if first.id and not missing(abs_diff) then flag=1;
else if flag=0 and not missing(abs_diff) then flag=1;
else if flag=1 then call missing(m_date);
run;
Yes, what is DOW?
data have;
infile cards truncover;
input id 1. L_date : mmddyy10. M_date : mmddyy10.;
format l_date m_date yymmdd10.;;
cards;
1 9/1/2020 9/29/2020
1 8/30/2020 9/29/2020
2 8/13/2022 10/21/2022
2 10/1/2022 10/21/2022
2 10/2/2023 10/21/2022
3 12/11/2021 .
3 5/19/2022 .
4 2/5/2022 12/11/2021
4 3/19/2022 12/11/2021
4 6/21/2022 12/11/2021
5 2/5/2022 12/11/2021
5 3/19/2022 .
5 6/21/2022 .
;;;
run;
proc sql;
create table want(drop=diff M_date) as
select *,ifn(diff=min(diff),M_date,.) as new_Mdate format=mmddyy10. from
(select *, abs(M_date-L_date) as diff from have)
group by id
order by 1,2;
quit;
Here first ID both records dates l_date & m_Date have same days difference (28 days) , it means m_date should populate for both records ?
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.