Hi all,
Once again I've had great difficulty getting the logic correct in my head in regards to implementing, and correctly using, the RETAIN statement. I have a dataset where I need to identify the date (FU_DT) from a patient's next record and the changed score (BIRAD2). A majority of the time these will occur on the 2nd record but, at times, the score change will occur more downstream than the 2nd record. And this is where I've admitted failure. And I only need to process these records until I've identified the changed score, i.e., all other subsequent rows can be ignored.
What I'm getting with my current code:
exam_dt | id | exam_cnt | birad_score_radi | birad1 | birad2 | fu_dt | birad_dt | outp |
26-Nov-18 | 1 | 1 | 0 | 0 | . | . | . | . |
1-Mar-19 | 1 | 2 | 4 | 0 | 4 | 1-Mar-19 | 1-Mar-19 | 1 |
24-Jun-20 | 1 | 3 | . | 0 | 4 | 1-Mar-19 | 1-Mar-19 | . |
14-Jan-20 | 2 | 1 | 0 | 0 | . | . | . | . |
7-Feb-20 | 2 | 2 | . | 0 | . | 7-Feb-20 | . | . |
27-Feb-20 | 2 | 3 | . | 0 | . | 7-Feb-20 | . | . |
8-Jun-20 | 2 | 4 | 2 | 0 | 2 | 8-Jun-20 | 8-Jun-20 | 1 |
11-Aug-21 | 2 | 5 | 2 | 0 | 2 | 11-Aug-21 | 11-Aug-21 | 1 |
3-Dec-19 | 3 | 1 | 0 | 0 | . | . | . | . |
14-Jan-20 | 3 | 2 | . | 0 | . | 14-Jan-20 | . | . |
30-Jan-20 | 3 | 3 | 2 | 0 | 2 | 30-Jan-20 | 30-Jan-20 | 1 |
And here's what I ideally would like:
id | fu_dt | birad_dt | birad1 | birad2 |
1 | 1-Mar-19 | 1-Mar-19 | 0 | 4 |
2 | 7-Feb-20 | 8-Jun-20 | 0 | 2 |
3 | 14-Jan-20 | 30-Jan-20 | 0 | 2 |
The code that generates the 1st table of output:
data have;
format exam_dt date9.;
input id exam_dt :date9. exam_cnt birad_score_radi;
infile datalines missover;
datalines;
1 26Nov18 1 0
1 01Mar19 2 4
1 24Jun20 3 .
2 14Jan20 1 0
2 07Feb20 2 .
2 27Feb20 3 .
2 08Jun20 4 2
2 11Aug21 5 2
3 03Dec19 1 0
3 14Jan20 2 .
3 30Jan20 3 2
;
run;
data want;
*format id id_1 id_last exam_dt;
set have;
by id exam_dt;
*+++++++++++++++++++++++++
* Assessment of 0 scores
*+++++++++++++++++++++++++;
* Scenario 1:
--> Initial screening BIRAD_SCORE_RADI = 0
^ 1st FU will be the FU date
^ If 1st FU includes the changed BIRAD score, note the date (FU_DT)
^ If 1st FU <> have changed BIRAD score then find the next record & note date (BIRAD_DT)
;
retain birad1 birad2 fu_dt birad_dt;
format fu_dt birad_dt date9.;
* TYPE:
1: initial score = 0 w/ FU
2: amended score from 0 w/o FU
3: initial score = 0 w/o FU
;
/* Reset retained vars to missing when onto next patient */
if first.id then do;
birad1 = .;
birad2 = .;
fu_dt = .;
birad_dt = .;
end;
/* Identify & output when initial BIRAD = 0 */
if first.id & birad_score_radi = 0 then do;
birad1 = birad_score_radi;
end;
else if not first.id & birad1 ^= . then do;
if birad_score_radi ^= . then do;
birad2 = birad_score_radi;
birad_dt = exam_dt;
fu_dt = exam_dt;
outp = 1;
end;
if birad_score_radi = . then do;
if exam_cnt = 2 then fu_dt = exam_dt;
end;
else if birad_score_radi ^= . then do;
birad2 = birad_score_radi;
birad_dt = exam_dt;
outp = 1;
end;
end;
/*
else if birad1 = 0 then do;
if birad_score_radi ^= . then do;
fu_dt = exam_dt;
birad_dt = exam_dt;
birad2 = birad_score_radi;
outp = 1;
end;
end;
*/
run;
I'm open to any suggestions or alternative ways to do this. Kind regards, Brian.
Do you have any values where the birad_score_radi variable is missing on the first record for an ID. If so you need to provide that as an example in the data.
I also think that you need to expand on what this means " need to identify the date (FU_DT) from a patient's next record". From the shown "want" data it is always the date on the second record. If this is not the actual rule then you need to provide a different example data set and some description of what you mean in more detail.
For your given example data this creates the desired output:
data want; set have; by id; retain birad1 idcount fu_dt; idcount+1; if first.id then do; birad1= birad_score_radi; idcount=1; end; if idcount=2 then fu_dt=exam_dt; if idcount>1 and (0 le birad1 < birad_score_radi) then do; birad2= birad_score_radi; birad_dt = exam_dt; output; birad1= .; end; format fu_dt birad_dt date9. ; keep id fu_dt birad_dt birad1 birad2; run;
@BrianB4233 wrote:
Hi all,
Once again I've had great difficulty getting the logic correct in my head in regards to implementing, and correctly using, the RETAIN statement. I have a dataset where I need to identify the date (FU_DT) from a patient's next record and the changed score (BIRAD2). A majority of the time these will occur on the 2nd record but, at times, the score change will occur more downstream than the 2nd record. And this is where I've admitted failure. And I only need to process these records until I've identified the changed score, i.e., all other subsequent rows can be ignored.
What I'm getting with my current code:
exam_dt id exam_cnt birad_score_radi birad1 birad2 fu_dt birad_dt outp 26-Nov-18 1 1 0 0 . . . . 1-Mar-19 1 2 4 0 4 1-Mar-19 1-Mar-19 1 24-Jun-20 1 3 . 0 4 1-Mar-19 1-Mar-19 . 14-Jan-20 2 1 0 0 . . . . 7-Feb-20 2 2 . 0 . 7-Feb-20 . . 27-Feb-20 2 3 . 0 . 7-Feb-20 . . 8-Jun-20 2 4 2 0 2 8-Jun-20 8-Jun-20 1 11-Aug-21 2 5 2 0 2 11-Aug-21 11-Aug-21 1 3-Dec-19 3 1 0 0 . . . . 14-Jan-20 3 2 . 0 . 14-Jan-20 . . 30-Jan-20 3 3 2 0 2 30-Jan-20 30-Jan-20 1
And here's what I ideally would like:
id fu_dt birad_dt birad1 birad2 1 1-Mar-19 1-Mar-19 0 4 2 7-Feb-20 8-Jun-20 0 2 3 14-Jan-20 30-Jan-20 0 2
The code that generates the 1st table of output:
data have; format exam_dt date9.; input id exam_dt :date9. exam_cnt birad_score_radi; infile datalines missover; datalines; 1 26Nov18 1 0 1 01Mar19 2 4 1 24Jun20 3 . 2 14Jan20 1 0 2 07Feb20 2 . 2 27Feb20 3 . 2 08Jun20 4 2 2 11Aug21 5 2 3 03Dec19 1 0 3 14Jan20 2 . 3 30Jan20 3 2 ; run; data want; *format id id_1 id_last exam_dt; set have; by id exam_dt; *+++++++++++++++++++++++++ * Assessment of 0 scores *+++++++++++++++++++++++++; * Scenario 1: --> Initial screening BIRAD_SCORE_RADI = 0 ^ 1st FU will be the FU date ^ If 1st FU includes the changed BIRAD score, note the date (FU_DT) ^ If 1st FU <> have changed BIRAD score then find the next record & note date (BIRAD_DT) ; retain birad1 birad2 fu_dt birad_dt; format fu_dt birad_dt date9.; * TYPE: 1: initial score = 0 w/ FU 2: amended score from 0 w/o FU 3: initial score = 0 w/o FU ; /* Reset retained vars to missing when onto next patient */ if first.id then do; birad1 = .; birad2 = .; fu_dt = .; birad_dt = .; end; /* Identify & output when initial BIRAD = 0 */ if first.id & birad_score_radi = 0 then do; birad1 = birad_score_radi; end; else if not first.id & birad1 ^= . then do; if birad_score_radi ^= . then do; birad2 = birad_score_radi; birad_dt = exam_dt; fu_dt = exam_dt; outp = 1; end; if birad_score_radi = . then do; if exam_cnt = 2 then fu_dt = exam_dt; end; else if birad_score_radi ^= . then do; birad2 = birad_score_radi; birad_dt = exam_dt; outp = 1; end; end; /* else if birad1 = 0 then do; if birad_score_radi ^= . then do; fu_dt = exam_dt; birad_dt = exam_dt; birad2 = birad_score_radi; outp = 1; end; end; */ run;
I'm open to any suggestions or alternative ways to do this. Kind regards, Brian.
Do you have any values where the birad_score_radi variable is missing on the first record for an ID. If so you need to provide that as an example in the data.
I also think that you need to expand on what this means " need to identify the date (FU_DT) from a patient's next record". From the shown "want" data it is always the date on the second record. If this is not the actual rule then you need to provide a different example data set and some description of what you mean in more detail.
For your given example data this creates the desired output:
data want; set have; by id; retain birad1 idcount fu_dt; idcount+1; if first.id then do; birad1= birad_score_radi; idcount=1; end; if idcount=2 then fu_dt=exam_dt; if idcount>1 and (0 le birad1 < birad_score_radi) then do; birad2= birad_score_radi; birad_dt = exam_dt; output; birad1= .; end; format fu_dt birad_dt date9. ; keep id fu_dt birad_dt birad1 birad2; run;
@BrianB4233 wrote:
Hi all,
Once again I've had great difficulty getting the logic correct in my head in regards to implementing, and correctly using, the RETAIN statement. I have a dataset where I need to identify the date (FU_DT) from a patient's next record and the changed score (BIRAD2). A majority of the time these will occur on the 2nd record but, at times, the score change will occur more downstream than the 2nd record. And this is where I've admitted failure. And I only need to process these records until I've identified the changed score, i.e., all other subsequent rows can be ignored.
What I'm getting with my current code:
exam_dt id exam_cnt birad_score_radi birad1 birad2 fu_dt birad_dt outp 26-Nov-18 1 1 0 0 . . . . 1-Mar-19 1 2 4 0 4 1-Mar-19 1-Mar-19 1 24-Jun-20 1 3 . 0 4 1-Mar-19 1-Mar-19 . 14-Jan-20 2 1 0 0 . . . . 7-Feb-20 2 2 . 0 . 7-Feb-20 . . 27-Feb-20 2 3 . 0 . 7-Feb-20 . . 8-Jun-20 2 4 2 0 2 8-Jun-20 8-Jun-20 1 11-Aug-21 2 5 2 0 2 11-Aug-21 11-Aug-21 1 3-Dec-19 3 1 0 0 . . . . 14-Jan-20 3 2 . 0 . 14-Jan-20 . . 30-Jan-20 3 3 2 0 2 30-Jan-20 30-Jan-20 1
And here's what I ideally would like:
id fu_dt birad_dt birad1 birad2 1 1-Mar-19 1-Mar-19 0 4 2 7-Feb-20 8-Jun-20 0 2 3 14-Jan-20 30-Jan-20 0 2
The code that generates the 1st table of output:
data have; format exam_dt date9.; input id exam_dt :date9. exam_cnt birad_score_radi; infile datalines missover; datalines; 1 26Nov18 1 0 1 01Mar19 2 4 1 24Jun20 3 . 2 14Jan20 1 0 2 07Feb20 2 . 2 27Feb20 3 . 2 08Jun20 4 2 2 11Aug21 5 2 3 03Dec19 1 0 3 14Jan20 2 . 3 30Jan20 3 2 ; run; data want; *format id id_1 id_last exam_dt; set have; by id exam_dt; *+++++++++++++++++++++++++ * Assessment of 0 scores *+++++++++++++++++++++++++; * Scenario 1: --> Initial screening BIRAD_SCORE_RADI = 0 ^ 1st FU will be the FU date ^ If 1st FU includes the changed BIRAD score, note the date (FU_DT) ^ If 1st FU <> have changed BIRAD score then find the next record & note date (BIRAD_DT) ; retain birad1 birad2 fu_dt birad_dt; format fu_dt birad_dt date9.; * TYPE: 1: initial score = 0 w/ FU 2: amended score from 0 w/o FU 3: initial score = 0 w/o FU ; /* Reset retained vars to missing when onto next patient */ if first.id then do; birad1 = .; birad2 = .; fu_dt = .; birad_dt = .; end; /* Identify & output when initial BIRAD = 0 */ if first.id & birad_score_radi = 0 then do; birad1 = birad_score_radi; end; else if not first.id & birad1 ^= . then do; if birad_score_radi ^= . then do; birad2 = birad_score_radi; birad_dt = exam_dt; fu_dt = exam_dt; outp = 1; end; if birad_score_radi = . then do; if exam_cnt = 2 then fu_dt = exam_dt; end; else if birad_score_radi ^= . then do; birad2 = birad_score_radi; birad_dt = exam_dt; outp = 1; end; end; /* else if birad1 = 0 then do; if birad_score_radi ^= . then do; fu_dt = exam_dt; birad_dt = exam_dt; birad2 = birad_score_radi; outp = 1; end; end; */ run;
I'm open to any suggestions or alternative ways to do this. Kind regards, Brian.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.