BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
BrianB4233
Obsidian | Level 7

Hi all,

Once again I've had great difficulty getting the logic correct in my head in regards to implementing, and correctly using, the RETAIN statement. I have a dataset where I need to identify the date (FU_DT) from a patient's next record and the changed score (BIRAD2). A majority of the time these will occur on the 2nd record but, at times, the score change will occur more downstream than the 2nd record. And this is where I've admitted failure. And I only need to process these records until I've identified the changed score, i.e., all other subsequent rows can be ignored.

 

What I'm getting with my current code:

exam_dt id exam_cnt birad_score_radi birad1 birad2 fu_dt birad_dt outp
26-Nov-18 1 1 0 0 . . . .
1-Mar-19 1 2 4 0 4 1-Mar-19 1-Mar-19 1
24-Jun-20 1 3 . 0 4 1-Mar-19 1-Mar-19 .
14-Jan-20 2 1 0 0 . . . .
7-Feb-20 2 2 . 0 . 7-Feb-20 . .
27-Feb-20 2 3 . 0 . 7-Feb-20 . .
8-Jun-20 2 4 2 0 2 8-Jun-20 8-Jun-20 1
11-Aug-21 2 5 2 0 2 11-Aug-21 11-Aug-21 1
3-Dec-19 3 1 0 0 . . . .
14-Jan-20 3 2 . 0 . 14-Jan-20 . .
30-Jan-20 3 3 2 0 2 30-Jan-20 30-Jan-20 1

 

And here's what I ideally would like:

id fu_dt birad_dt birad1 birad2
1 1-Mar-19 1-Mar-19 0 4
2 7-Feb-20 8-Jun-20 0 2
3 14-Jan-20 30-Jan-20 0 2

 

The code that generates the 1st table of output:

data have;
format exam_dt date9.;
input id exam_dt :date9. exam_cnt birad_score_radi;
infile datalines missover;
datalines;
1 26Nov18 1 0
1 01Mar19 2 4
1 24Jun20 3 .
2 14Jan20 1 0
2 07Feb20 2 .
2 27Feb20 3 .
2 08Jun20 4 2
2 11Aug21 5 2
3 03Dec19 1 0
3 14Jan20 2 .
3 30Jan20 3 2
;
run;


data want;
*format id id_1 id_last exam_dt;
set have;
by id exam_dt;

*+++++++++++++++++++++++++
* Assessment of 0 scores 
*+++++++++++++++++++++++++;
* Scenario 1:
--> Initial screening BIRAD_SCORE_RADI = 0
^ 1st FU will be the FU date
^	If 1st FU includes the changed BIRAD score, note the date (FU_DT)
^ If 1st FU <> have changed BIRAD score then find the next record & note date (BIRAD_DT)
;
 
retain birad1 birad2 fu_dt birad_dt;
format fu_dt birad_dt date9.;

* TYPE:
1: initial score = 0 w/ FU
2: amended score from 0 w/o FU
3: initial score = 0 w/o FU
;

/* Reset retained vars to missing when onto next patient */
if first.id then do;
	birad1 = .;
	birad2 = .;
	fu_dt = .;
	birad_dt = .;
end;

/* Identify & output when initial BIRAD = 0 */
if first.id & birad_score_radi = 0 then do;
	birad1 = birad_score_radi;
end;
else if not first.id & birad1 ^= . then do;
	if birad_score_radi ^= . then do;
		birad2 = birad_score_radi;
		birad_dt = exam_dt;
		fu_dt = exam_dt;
		outp = 1;
	end;

	if birad_score_radi = . then do;
		if exam_cnt = 2 then fu_dt = exam_dt;
	end;
	else if birad_score_radi ^= . then do;
		birad2 = birad_score_radi;
		birad_dt = exam_dt;
		outp = 1;
	end;
end;
/*
else if birad1 = 0 then do;
if birad_score_radi ^= . then do;
	fu_dt = exam_dt;
	birad_dt = exam_dt;
	birad2 = birad_score_radi;
	outp = 1;
end;
end;
*/
run;

I'm open to any suggestions or alternative ways to do this. Kind regards, Brian.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Do you have any values where the birad_score_radi variable is missing on the first record for an ID. If so you need to provide that as an example in the data.

 

I also think that you need to expand on what this means " need to identify the date (FU_DT) from a patient's next record". From the shown "want" data it is always the date on the second record. If this is not the actual rule then you need to provide a different example data set and some description of what you mean in more detail.

 

For your given example data this creates the desired output:

data want;
   set have;
   by id;
   retain birad1 idcount fu_dt;
   idcount+1;
   if first.id then do;
      birad1= birad_score_radi;
      idcount=1;      
   end;
   if idcount=2 then fu_dt=exam_dt;
   if idcount>1 and (0 le birad1 < birad_score_radi)  then do;
      birad2= birad_score_radi;
      birad_dt = exam_dt;
      output;
      birad1= .;
   end;
   format fu_dt birad_dt date9. ;
   keep id fu_dt birad_dt birad1 birad2;
run;

@BrianB4233 wrote:

Hi all,

Once again I've had great difficulty getting the logic correct in my head in regards to implementing, and correctly using, the RETAIN statement. I have a dataset where I need to identify the date (FU_DT) from a patient's next record and the changed score (BIRAD2). A majority of the time these will occur on the 2nd record but, at times, the score change will occur more downstream than the 2nd record. And this is where I've admitted failure. And I only need to process these records until I've identified the changed score, i.e., all other subsequent rows can be ignored.

 

What I'm getting with my current code:

exam_dt id exam_cnt birad_score_radi birad1 birad2 fu_dt birad_dt outp
26-Nov-18 1 1 0 0 . . . .
1-Mar-19 1 2 4 0 4 1-Mar-19 1-Mar-19 1
24-Jun-20 1 3 . 0 4 1-Mar-19 1-Mar-19 .
14-Jan-20 2 1 0 0 . . . .
7-Feb-20 2 2 . 0 . 7-Feb-20 . .
27-Feb-20 2 3 . 0 . 7-Feb-20 . .
8-Jun-20 2 4 2 0 2 8-Jun-20 8-Jun-20 1
11-Aug-21 2 5 2 0 2 11-Aug-21 11-Aug-21 1
3-Dec-19 3 1 0 0 . . . .
14-Jan-20 3 2 . 0 . 14-Jan-20 . .
30-Jan-20 3 3 2 0 2 30-Jan-20 30-Jan-20 1

 

And here's what I ideally would like:

id fu_dt birad_dt birad1 birad2
1 1-Mar-19 1-Mar-19 0 4
2 7-Feb-20 8-Jun-20 0 2
3 14-Jan-20 30-Jan-20 0 2

 

The code that generates the 1st table of output:

data have;
format exam_dt date9.;
input id exam_dt :date9. exam_cnt birad_score_radi;
infile datalines missover;
datalines;
1 26Nov18 1 0
1 01Mar19 2 4
1 24Jun20 3 .
2 14Jan20 1 0
2 07Feb20 2 .
2 27Feb20 3 .
2 08Jun20 4 2
2 11Aug21 5 2
3 03Dec19 1 0
3 14Jan20 2 .
3 30Jan20 3 2
;
run;


data want;
*format id id_1 id_last exam_dt;
set have;
by id exam_dt;

*+++++++++++++++++++++++++
* Assessment of 0 scores 
*+++++++++++++++++++++++++;
* Scenario 1:
--> Initial screening BIRAD_SCORE_RADI = 0
^ 1st FU will be the FU date
^	If 1st FU includes the changed BIRAD score, note the date (FU_DT)
^ If 1st FU <> have changed BIRAD score then find the next record & note date (BIRAD_DT)
;
 
retain birad1 birad2 fu_dt birad_dt;
format fu_dt birad_dt date9.;

* TYPE:
1: initial score = 0 w/ FU
2: amended score from 0 w/o FU
3: initial score = 0 w/o FU
;

/* Reset retained vars to missing when onto next patient */
if first.id then do;
	birad1 = .;
	birad2 = .;
	fu_dt = .;
	birad_dt = .;
end;

/* Identify & output when initial BIRAD = 0 */
if first.id & birad_score_radi = 0 then do;
	birad1 = birad_score_radi;
end;
else if not first.id & birad1 ^= . then do;
	if birad_score_radi ^= . then do;
		birad2 = birad_score_radi;
		birad_dt = exam_dt;
		fu_dt = exam_dt;
		outp = 1;
	end;

	if birad_score_radi = . then do;
		if exam_cnt = 2 then fu_dt = exam_dt;
	end;
	else if birad_score_radi ^= . then do;
		birad2 = birad_score_radi;
		birad_dt = exam_dt;
		outp = 1;
	end;
end;
/*
else if birad1 = 0 then do;
if birad_score_radi ^= . then do;
	fu_dt = exam_dt;
	birad_dt = exam_dt;
	birad2 = birad_score_radi;
	outp = 1;
end;
end;
*/
run;

I'm open to any suggestions or alternative ways to do this. Kind regards, Brian.


 

View solution in original post

2 REPLIES 2
ballardw
Super User

Do you have any values where the birad_score_radi variable is missing on the first record for an ID. If so you need to provide that as an example in the data.

 

I also think that you need to expand on what this means " need to identify the date (FU_DT) from a patient's next record". From the shown "want" data it is always the date on the second record. If this is not the actual rule then you need to provide a different example data set and some description of what you mean in more detail.

 

For your given example data this creates the desired output:

data want;
   set have;
   by id;
   retain birad1 idcount fu_dt;
   idcount+1;
   if first.id then do;
      birad1= birad_score_radi;
      idcount=1;      
   end;
   if idcount=2 then fu_dt=exam_dt;
   if idcount>1 and (0 le birad1 < birad_score_radi)  then do;
      birad2= birad_score_radi;
      birad_dt = exam_dt;
      output;
      birad1= .;
   end;
   format fu_dt birad_dt date9. ;
   keep id fu_dt birad_dt birad1 birad2;
run;

@BrianB4233 wrote:

Hi all,

Once again I've had great difficulty getting the logic correct in my head in regards to implementing, and correctly using, the RETAIN statement. I have a dataset where I need to identify the date (FU_DT) from a patient's next record and the changed score (BIRAD2). A majority of the time these will occur on the 2nd record but, at times, the score change will occur more downstream than the 2nd record. And this is where I've admitted failure. And I only need to process these records until I've identified the changed score, i.e., all other subsequent rows can be ignored.

 

What I'm getting with my current code:

exam_dt id exam_cnt birad_score_radi birad1 birad2 fu_dt birad_dt outp
26-Nov-18 1 1 0 0 . . . .
1-Mar-19 1 2 4 0 4 1-Mar-19 1-Mar-19 1
24-Jun-20 1 3 . 0 4 1-Mar-19 1-Mar-19 .
14-Jan-20 2 1 0 0 . . . .
7-Feb-20 2 2 . 0 . 7-Feb-20 . .
27-Feb-20 2 3 . 0 . 7-Feb-20 . .
8-Jun-20 2 4 2 0 2 8-Jun-20 8-Jun-20 1
11-Aug-21 2 5 2 0 2 11-Aug-21 11-Aug-21 1
3-Dec-19 3 1 0 0 . . . .
14-Jan-20 3 2 . 0 . 14-Jan-20 . .
30-Jan-20 3 3 2 0 2 30-Jan-20 30-Jan-20 1

 

And here's what I ideally would like:

id fu_dt birad_dt birad1 birad2
1 1-Mar-19 1-Mar-19 0 4
2 7-Feb-20 8-Jun-20 0 2
3 14-Jan-20 30-Jan-20 0 2

 

The code that generates the 1st table of output:

data have;
format exam_dt date9.;
input id exam_dt :date9. exam_cnt birad_score_radi;
infile datalines missover;
datalines;
1 26Nov18 1 0
1 01Mar19 2 4
1 24Jun20 3 .
2 14Jan20 1 0
2 07Feb20 2 .
2 27Feb20 3 .
2 08Jun20 4 2
2 11Aug21 5 2
3 03Dec19 1 0
3 14Jan20 2 .
3 30Jan20 3 2
;
run;


data want;
*format id id_1 id_last exam_dt;
set have;
by id exam_dt;

*+++++++++++++++++++++++++
* Assessment of 0 scores 
*+++++++++++++++++++++++++;
* Scenario 1:
--> Initial screening BIRAD_SCORE_RADI = 0
^ 1st FU will be the FU date
^	If 1st FU includes the changed BIRAD score, note the date (FU_DT)
^ If 1st FU <> have changed BIRAD score then find the next record & note date (BIRAD_DT)
;
 
retain birad1 birad2 fu_dt birad_dt;
format fu_dt birad_dt date9.;

* TYPE:
1: initial score = 0 w/ FU
2: amended score from 0 w/o FU
3: initial score = 0 w/o FU
;

/* Reset retained vars to missing when onto next patient */
if first.id then do;
	birad1 = .;
	birad2 = .;
	fu_dt = .;
	birad_dt = .;
end;

/* Identify & output when initial BIRAD = 0 */
if first.id & birad_score_radi = 0 then do;
	birad1 = birad_score_radi;
end;
else if not first.id & birad1 ^= . then do;
	if birad_score_radi ^= . then do;
		birad2 = birad_score_radi;
		birad_dt = exam_dt;
		fu_dt = exam_dt;
		outp = 1;
	end;

	if birad_score_radi = . then do;
		if exam_cnt = 2 then fu_dt = exam_dt;
	end;
	else if birad_score_radi ^= . then do;
		birad2 = birad_score_radi;
		birad_dt = exam_dt;
		outp = 1;
	end;
end;
/*
else if birad1 = 0 then do;
if birad_score_radi ^= . then do;
	fu_dt = exam_dt;
	birad_dt = exam_dt;
	birad2 = birad_score_radi;
	outp = 1;
end;
end;
*/
run;

I'm open to any suggestions or alternative ways to do this. Kind regards, Brian.


 

BrianB4233
Obsidian | Level 7
Thank you ballardw, much more efficient and straight-forward than where I was going. And yes, the initial record will always be equal to '0' and the 2nd record - if it exists - will always be considered the follow-up. It took countless hours and research to get to this point from the raw datasets but this was huge and important piece.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 352 views
  • 1 like
  • 2 in conversation