BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SP01
Obsidian | Level 7

The current data structure for the problem I have is:

id L_date M_date Note
1 9/1/2020 9/29/2020 keep, as it is closest
1 8/30/2020 9/29/2020 change to null
2 8/13/2022 10/21/2022 change to null
2 10/1/2022 10/21/2022 keep, as it is closest
2 10/2/2023 10/21/2022 change to null
3 12/11/2021 .  
3 5/19/2022 .  
4 2/5/2022 12/11/2021 keep, as it is closest
4 3/19/2022 12/11/2021 change to null
4 6/21/2022 12/11/2021 change to null

 

I am trying to retain the closest M_date column value that is to the L_date value and turn the other same M_date values for the same ID to null.

 

The data structure that I am trying to achieve is below:

id L_date M_date
1 9/1/2020 9/29/2020
1 8/30/2020  
2 8/13/2022  
2 10/1/2022 10/21/2022
2 10/2/2023  
3 12/11/2021 .
3 5/19/2022 .
4 2/5/2022 12/11/2021
4 3/19/2022  
4 6/21/2022  
1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

Here's a solution, I expanded your sample data to include two different scenarios - one where there is a tie and one where there are missing M_Date and non missing M_date for the same ID.

 

data have;
infile cards truncover;
input id 1. L_date : mmddyy10.  M_date : mmddyy10.;
format l_date m_date yymmdd10.;;
cards;
1   9/1/2020    9/29/2020   
1   8/30/2020   9/27/2020
2   8/13/2022   10/21/2022
2   10/1/2022   10/21/2022
2   10/2/2023   10/21/2022
3   12/11/2021  .    
3   5/19/2022   .    
4   2/5/2022    12/11/2021
4   3/19/2022   12/11/2021  
4   6/21/2022   12/11/2021
5   2/5/2022    12/11/2021
5   2/5/2022    12/14/2021
5   3/19/2022   .
5   6/21/2022   .
;;;
run;

data step1;
set have;
abs_diff = abs(m_date-l_date);
run;

proc sort data=step1;
by id abs_diff;
run;

data want;
set step1;
by id;
retain flag;
if first.id then flag=0;

if first.id and not missing(abs_diff) then flag=1;
else if flag=0 and not missing(abs_diff) then flag=1;
else if flag=1 then call missing(m_date);

run;

View solution in original post

12 REPLIES 12
Reeza
Super User
Should L be sorted in any manner? It doesn't appear to be consistently sorted.
SP01
Obsidian | Level 7

It can be sorted from oldest dates first to newest dates last.

PaigeMiller
Diamond | Level 26
  1. Subtract L_date from M_date; then take the absolute value.
  2. Run PROC RANK on the absolute value of the difference, computed in step 1, use a BY ID; statement
  3. If the ranked value is 1 then keep M_date; other wise set M_date to null
--
Paige Miller
Reeza
Super User
And how to deal with ties?
PaigeMiller
Diamond | Level 26

@Reeza wrote:
And how to deal with ties?

Don't ask me, that's up to the OP @SP01 

--
Paige Miller
SP01
Obsidian | Level 7

Ties can be left  as same.

Reeza
Super User

Here's a solution, I expanded your sample data to include two different scenarios - one where there is a tie and one where there are missing M_Date and non missing M_date for the same ID.

 

data have;
infile cards truncover;
input id 1. L_date : mmddyy10.  M_date : mmddyy10.;
format l_date m_date yymmdd10.;;
cards;
1   9/1/2020    9/29/2020   
1   8/30/2020   9/27/2020
2   8/13/2022   10/21/2022
2   10/1/2022   10/21/2022
2   10/2/2023   10/21/2022
3   12/11/2021  .    
3   5/19/2022   .    
4   2/5/2022    12/11/2021
4   3/19/2022   12/11/2021  
4   6/21/2022   12/11/2021
5   2/5/2022    12/11/2021
5   2/5/2022    12/14/2021
5   3/19/2022   .
5   6/21/2022   .
;;;
run;

data step1;
set have;
abs_diff = abs(m_date-l_date);
run;

proc sort data=step1;
by id abs_diff;
run;

data want;
set step1;
by id;
retain flag;
if first.id then flag=0;

if first.id and not missing(abs_diff) then flag=1;
else if flag=0 and not missing(abs_diff) then flag=1;
else if flag=1 then call missing(m_date);

run;
Astounding
PROC Star
Anybody want to try their hand at a one-step double DOW solution?
SP01
Obsidian | Level 7

Yes, what is DOW?

Astounding
PROC Star
DOW is a DO loop that runs through groups of observations. It was popularized by Ian Whitlock (although he didn't name it so it is not clear whether the W stands for Whitlock). The basic form for this problem would be:

data want;
do until (last.id);
set have;
*code to find min diff for this id;
end;
do until (last.id);
set have;
by id;
*code to compare same observations to min and output;
end;
run;

You could also take a similar approach without the explicit looping:

data want;
set have (in=a) have;
by id;
if a then do;
end;
else do;
end;
run;

The bottom program would require retaining some calculated results, and checking for first.id.
Ksharp
Super User
data have;
infile cards truncover;
input id 1. L_date : mmddyy10.  M_date : mmddyy10.;
format l_date m_date yymmdd10.;;
cards;
1   9/1/2020    9/29/2020   
1   8/30/2020   9/29/2020
2   8/13/2022   10/21/2022
2   10/1/2022   10/21/2022
2   10/2/2023   10/21/2022
3   12/11/2021  .    
3   5/19/2022   .    
4   2/5/2022    12/11/2021
4   3/19/2022   12/11/2021  
4   6/21/2022   12/11/2021
5   2/5/2022    12/11/2021
5   3/19/2022   .
5   6/21/2022   .
;;;
run;

proc sql;
create table want(drop=diff M_date) as
select *,ifn(diff=min(diff),M_date,.) as new_Mdate format=mmddyy10. from
(select *, abs(M_date-L_date) as diff from have)
group by id 
order by 1,2;
quit;
lavu_90
Calcite | Level 5

Here first ID both records dates l_date & m_Date have same days difference (28 days) , it means m_date should populate for both records ?

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 2410 views
  • 7 likes
  • 6 in conversation