Hi all,
I am trying to compare the start and end dates with the entered date variable and create a new variable for status. Each ID will have only one entered date.
Rule 1 – If missing Start and End by ID then Status = Unknown
Rule 2 – if not missing start and end; If start or End dates are available then compare the closest date that matches ‘Entered’ variable date and output.
For example, for ID #1 only 1 ‘Start’ date is available, Comparing this date with ‘Entered’ date i.e. Entered Date > Start Date and the corresponding ‘LOTS’ = 2, therefore output Status = After 2.
Similarly, for ID# 13, Only one available END date and this end date corresponds to LOTS =2, therefore output LOTS = After 2.
Next for IDs with multiple complete start and end dates choose the closest end date compared to Entered Date. For example, ID#2 has 3 start and end dates and the closest end date for the Entered date happened to be in LOTS=3, therefore status = 3.
For ID#3, the Entered date > the closest start date (8/14/2016) therefore Status = After 2
Finally, For any LOTS > 4, the status will always be After 4. For example ID#20 and 21, the LOTS are greater than 4 so therefore status will be After 4
Data Want:
ID | Entered | Start | End | LOTS | Status |
1 | 12/13/2016 | 9/21/2016 | 2 | After 2 | |
2 | 12/14/2016 | 6/14/2016 | 11/14/2016 | 3 | After 3 |
3 | 1/2/2017 | 8/14/2016 | 2 | After 2 | |
4 | 1/25/2017 | 9/14/2016 | 12/15/2016 | 2 | After 2 |
5 | 3/23/2017 | 11/14/2016 | 2/14/2017 | 0 | After 0 |
6 | 4/30/2017 | 1/10/2017 | 4/14/2017 | 2 | After 2 |
7 | 7/17/2017 | . | Unknown | ||
8 | 8/24/2017 | . | Unknown | ||
9 | 1/31/2018 | . | Unknown | ||
10 | 2/11/2018 | 10/12/2017 | 12/21/2017 | -1 | After -1 |
11 | 4/22/2018 | 9/4/2017 | 1/15/2018 | -1 | After -1 |
12 | 6/11/2018 | 3/17/2018 | 5/15/2018 | 3 | After 3 |
13 | 6/7/2018 | 2/1/2018 | 2 | After 2 | |
14 | 7/30/2018 | 9/14/2017 | 4/30/2018 | 1 | After 1 |
15 | 8/19/2018 | . | Unknown | ||
16 | 8/21/2018 | 6/14/2018 | 6/15/2018 | 0 | After 0 |
17 | 9/10/2018 | 4/8/2018 | 5/31/2018 | 0 | After 0 |
18 | 12/12/2018 | 7/15/2018 | 11/14/2018 | 4 | After 4 |
19 | 12/16/2018 | 9/14/2018 | 11/14/2018 | 1 | After 1 |
20 | 1/14/2019 | 11/15/2015 | 1/25/2017 | 7 | After 4 |
21 | 3/20/2019 | 8/14/2017 | 1/14/2018 | 5 | After 4 |
22 | 4/23/2019 | 4/7/2019 | 4/8/2019 | 0 | After 0 |
23 | 7/10/2019 | . | Unknown |
Data Have:
ID | Entered | Start | End | LOTS |
1 | 12/13/2016 | 1 | ||
1 | 12/13/2016 | 9/21/2016 | 2 | |
2 | 12/14/2016 | 10/28/2015 | 12/28/2015 | 1 |
2 | 12/14/2016 | 2/14/2016 | 6/13/2016 | 2 |
2 | 12/14/2016 | 6/14/2016 | 11/14/2016 | 3 |
3 | 1/2/2017 | 10/15/2011 | 8/14/2015 | 0 |
3 | 1/2/2017 | 10/23/2015 | 8/13/2016 | 1 |
3 | 1/2/2017 | 8/14/2016 | 2 | |
4 | 1/25/2017 | 4/14/2016 | 9/13/2016 | 1 |
4 | 1/25/2017 | 9/14/2016 | 12/15/2016 | 2 |
5 | 3/23/2017 | 11/14/2016 | 2/14/2017 | 0 |
6 | 4/30/2017 | 3/24/2016 | 12/18/2016 | 1 |
6 | 4/30/2017 | 1/10/2017 | 4/14/2017 | 2 |
7 | 7/17/2017 | . | ||
8 | 8/24/2017 | . | ||
9 | 1/31/2018 | . | ||
10 | 2/11/2018 | 10/12/2017 | 12/21/2017 | -1 |
11 | 4/22/2018 | 9/4/2017 | 1/15/2018 | -1 |
12 | 6/11/2018 | 4/14/2017 | 8/14/2017 | 1 |
12 | 6/11/2018 | 10/15/2017 | 2/14/2018 | 2 |
12 | 6/11/2018 | 3/17/2018 | 5/15/2018 | 3 |
13 | 6/7/2018 | 0 | ||
13 | 6/7/2018 | 1 | ||
13 | 6/7/2018 | 2/1/2018 | 2 | |
14 | 7/30/2018 | 9/14/2017 | 4/30/2018 | 1 |
15 | 8/19/2018 | . | ||
16 | 8/21/2018 | 6/14/2018 | 6/15/2018 | 0 |
17 | 9/10/2018 | 4/8/2018 | 5/31/2018 | 0 |
18 | 12/12/2018 | 7/15/2018 | 11/14/2018 | 4 |
19 | 12/16/2018 | 3/27/2015 | 9/13/2018 | 0 |
19 | 12/16/2018 | 9/14/2018 | 11/14/2018 | 1 |
20 | 1/14/2019 | 11/15/2015 | 1/25/2017 | 7 |
21 | 3/20/2019 | 8/14/2017 | 1/14/2018 | 5 |
21 | 3/20/2019 | 2 | ||
22 | 4/23/2019 | 4/7/2019 | 4/8/2019 | 0 |
23 | 7/10/2019 | . |
You still haven't answered my question about what you want when the ENTERED DATE is equi-distant from the closest START and closest END. Editted note: when equi-distant choose the END date an LOTS value.
But in ignorance of that objective, below is code that I believe you can use for the objectives as I understand them.
The strategy is to check each incoming START and END, keeping progressive track of the closest date value (CLOSEST_DATE) and its distance (CLOSEST_DIST) from ENTERED. And also keep track of the observation number (variable PTR) containing that date.
Then, at the end of each ID, re-reread the observation indicated by PTR, as the sole output record for that ID. Here's the code:
data have;
input id (entered start end) (:mmddyy8.) lots;
format entered start end mmddyy10.;
datalines;
1 12/13/2016 . . 1
1 12/13/2016 9/21/2016 . 2
2 12/14/2016 10/28/2015 12/28/2015 1
2 12/14/2016 2/14/2016 6/13/2016 2
2 12/14/2016 6/14/2016 11/14/2016 3
3 1/2/2017 10/15/2011 8/14/2015 0
3 1/2/2017 10/23/2015 8/13/2016 1
3 1/2/2017 8/14/2016 . 2
4 1/25/2017 4/14/2016 9/13/2016 1
4 1/25/2017 9/14/2016 12/15/2016 2
5 3/23/2017 11/14/2016 2/14/2017 0
6 4/30/2017 3/24/2016 12/18/2016 1
6 4/30/2017 1/10/2017 4/14/2017 2
7 7/17/2017 . . .
8 8/24/2017 . . .
9 1/31/2018 . . .
10 2/11/2018 10/12/2017 12/21/2017 -1
11 4/22/2018 9/4/2017 1/15/2018 -1
12 6/11/2018 4/14/2017 8/14/2017 1
12 6/11/2018 10/15/2017 2/14/2018 2
12 6/11/2018 3/17/2018 5/15/2018 3
13 6/7/2018 . . 0
13 6/7/2018 . . 1
13 6/7/2018 . 2/1/2018 2
14 7/30/2018 9/14/2017 4/30/2018 1
15 8/19/2018 . . .
16 8/21/2018 6/14/2018 6/15/2018 0
17 9/10/2018 4/8/2018 5/31/2018 0
18 12/12/2018 7/15/2018 11/14/2018 4
19 12/16/2018 3/27/2015 9/13/2018 0
19 12/16/2018 9/14/2018 11/14/2018 1
20 1/14/2019 11/15/2015 1/25/2017 7
21 3/20/2019 8/14/2017 1/14/2018 5
21 3/20/2019 . . 2
22 4/23/2019 4/7/2019 4/8/2019 0
23 7/10/2019 . . .
run;
data want (drop=closest_: date);
set have;
by id;
if first.id then PTR=_N_; /** Initialize the pointer for each ID **/
retain closest_distance closest_date;
if first.id then call missing(of closest_:);
do date=start,end;
if date=. then continue; ** Skip this iteration if date is missing **;
if min(abs(entered-date),closest_distance) = abs(entered-date) then do;
closest_date=date;
closest_distance=abs(entered-date);
ptr=_n_;
end;
end;
if last.id;
set have point=ptr; ** Re-read the designated observation **;
length status $9;
if nmiss(start,end)=2 then status='Unknown'; else
if lots>4 then status='After 4'; else /** Revised **/
status=catx(' ','After',lots);
run;
BTW, your dataset WANT has 2 observations (edited addition: for ID 19), which I presume is in error. This code produces 1 obs per id.
Comments on changed code:
After your response to the first version of this not, I modified the code above. I added the statement
if first.id then PTR=_N_; /** Initialize the pointer for each ID **/
which prevents carrying forward PTR values when an ID generates an UNKNOWN status (no minimum values found). And also added the line:
if lots>4 then status='After 4'; else
Also, if the closest END and closest START are equi-distant from ENTERED, this will chose the last qualifying date encountered.
What if you have:
ID | Entered | Start | End | Lots |
25 | 10/15/2020 | 0 | ||
25 | 10/15/2020 | 9/15/2020 | 1 | |
25 | 10/15/2020 |
11/15/2020 |
2 |
And what if the entered date is exactly half-way between START and END?
What if you have:
ID | Entered | Start | End | Lots |
25 | 10/15/2020 | 0 | ||
25 | 10/15/2020 | 9/15/2020 | 1 | |
25 | 10/15/2020 |
11/14/2020 |
2 |
@newsas007 wrote:
This scenario where there is a end date greater than entered date doesn't happen. Basically, row #3 will not exist.
I'll say that I hope that is true.
In the data I deal with I have people that "end" a program before they "start" because of data entry errors.
And people with medical tests on dates that will not occur until several years in the future or several years before the testing program began.
If you have confirmed your statement, good for you. If you believe that without testing it you might be surprised. "Shouldn't" unfortunately does not mean "doesn't" in real data entry.
You still haven't answered my question about what you want when the ENTERED DATE is equi-distant from the closest START and closest END. Editted note: when equi-distant choose the END date an LOTS value.
But in ignorance of that objective, below is code that I believe you can use for the objectives as I understand them.
The strategy is to check each incoming START and END, keeping progressive track of the closest date value (CLOSEST_DATE) and its distance (CLOSEST_DIST) from ENTERED. And also keep track of the observation number (variable PTR) containing that date.
Then, at the end of each ID, re-reread the observation indicated by PTR, as the sole output record for that ID. Here's the code:
data have;
input id (entered start end) (:mmddyy8.) lots;
format entered start end mmddyy10.;
datalines;
1 12/13/2016 . . 1
1 12/13/2016 9/21/2016 . 2
2 12/14/2016 10/28/2015 12/28/2015 1
2 12/14/2016 2/14/2016 6/13/2016 2
2 12/14/2016 6/14/2016 11/14/2016 3
3 1/2/2017 10/15/2011 8/14/2015 0
3 1/2/2017 10/23/2015 8/13/2016 1
3 1/2/2017 8/14/2016 . 2
4 1/25/2017 4/14/2016 9/13/2016 1
4 1/25/2017 9/14/2016 12/15/2016 2
5 3/23/2017 11/14/2016 2/14/2017 0
6 4/30/2017 3/24/2016 12/18/2016 1
6 4/30/2017 1/10/2017 4/14/2017 2
7 7/17/2017 . . .
8 8/24/2017 . . .
9 1/31/2018 . . .
10 2/11/2018 10/12/2017 12/21/2017 -1
11 4/22/2018 9/4/2017 1/15/2018 -1
12 6/11/2018 4/14/2017 8/14/2017 1
12 6/11/2018 10/15/2017 2/14/2018 2
12 6/11/2018 3/17/2018 5/15/2018 3
13 6/7/2018 . . 0
13 6/7/2018 . . 1
13 6/7/2018 . 2/1/2018 2
14 7/30/2018 9/14/2017 4/30/2018 1
15 8/19/2018 . . .
16 8/21/2018 6/14/2018 6/15/2018 0
17 9/10/2018 4/8/2018 5/31/2018 0
18 12/12/2018 7/15/2018 11/14/2018 4
19 12/16/2018 3/27/2015 9/13/2018 0
19 12/16/2018 9/14/2018 11/14/2018 1
20 1/14/2019 11/15/2015 1/25/2017 7
21 3/20/2019 8/14/2017 1/14/2018 5
21 3/20/2019 . . 2
22 4/23/2019 4/7/2019 4/8/2019 0
23 7/10/2019 . . .
run;
data want (drop=closest_: date);
set have;
by id;
if first.id then PTR=_N_; /** Initialize the pointer for each ID **/
retain closest_distance closest_date;
if first.id then call missing(of closest_:);
do date=start,end;
if date=. then continue; ** Skip this iteration if date is missing **;
if min(abs(entered-date),closest_distance) = abs(entered-date) then do;
closest_date=date;
closest_distance=abs(entered-date);
ptr=_n_;
end;
end;
if last.id;
set have point=ptr; ** Re-read the designated observation **;
length status $9;
if nmiss(start,end)=2 then status='Unknown'; else
if lots>4 then status='After 4'; else /** Revised **/
status=catx(' ','After',lots);
run;
BTW, your dataset WANT has 2 observations (edited addition: for ID 19), which I presume is in error. This code produces 1 obs per id.
Comments on changed code:
After your response to the first version of this not, I modified the code above. I added the statement
if first.id then PTR=_N_; /** Initialize the pointer for each ID **/
which prevents carrying forward PTR values when an ID generates an UNKNOWN status (no minimum values found). And also added the line:
if lots>4 then status='After 4'; else
Also, if the closest END and closest START are equi-distant from ENTERED, this will chose the last qualifying date encountered.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.