BookmarkSubscribeRSS Feed
nyy2
Fluorite | Level 6

hello! I am trying to write a loop to flag two criteria. First if the difference between lag(timeperiod) - timeperiod >=4 and then to compare if diag codes between time periods is the same of different. I want to run the loop over each patientID until one of these criteria's are met.  

 

TIA!

 

E.g. data 

patientID, timeperiod, combined

wh7,1, diag1_diag2_diag3

wh7, 4, diag1_diag2_diag3_diag4

wh7, 10, diag1_diag2

wh7 15, diag4_diag10

wh4, 2, diag5_diag11_diag16

wh4, 4, diag5_diag11

9 REPLIES 9
Reeza
Super User

I'm assuming the data shown is input. 

 

Can you show what you've tried so far?

Can you show what you're looking for as output if this is the data input?

 


@nyy2 wrote:

hello! I am trying to write a loop to flag two criteria. First if the difference between lag(timeperiod) - timeperiod >=4 and then to compare if diag codes between time periods is the same of different. I want to run the loop over each patientID until one of these criteria's are met.  

 

TIA!

 

E.g. data 

patientID, timeperiod, combined

wh7,1, diag1_diag2_diag3

wh7, 4, diag1_diag2_diag3_diag4

wh7, 10, diag1_diag2

wh7 15, diag4_diag10

wh4, 2, diag5_diag11_diag16

wh4, 4, diag5_diag11


 

nyy2
Fluorite | Level 6

Yes this is example of input. For output, I would just want to add a column called running_total that would start at 1 and only increase when either time period triggered an increase of 1 or the different diag codes.

Reeza
Super User

Sorry, I think I'd need more details. Can you provide a fully worked example that shows the input and expected output (as data sets)?

For example, it's not clear how this would be defined:

"then to compare if diag codes between time periods is the same of different."

 

SAS data steps loop automatically so you don't need to really factor that in. For the daily difference that's easy.

 

EDIT: to add running total instead of a counter

 

data want;
set have;
by PatientID;

period_dif = dif(TimePeriod);
if first.patientID then counter=1;
else if period_dif>=4 then counter+1;


*assuming you do not want this in final data set;
*drop period_dif;
run;

 

 

nyy2
Fluorite | Level 6

I was trying to first lag the time period and then if that does not increase the running total then trying to use the scan function to look if the substring for each word in the combined column matches the lag combined variable. 

sorry it's quite complicated

Reeza
Super User

@nyy2 wrote:



sorry it's quite complicated


Ergo the request for an example of the output. Data/examples really help 🙂

EDIT: I've edited my previous response to essentially add the case counter. Not sure about the diagnosis stuff yet, please show an example. Also, did you happen to create this variable or is it stored that way initially? How many possible diagnosis are possible?

nyy2
Fluorite | Level 6

Thanks, I added an example with some dummy data and notes of how I was thinking about it. List of codes in the combined does not really have a limit as I'm looking for any codes that occur within that time period. 

Reeza
Super User
patientID time_period combined running_total Note Questions
wh7 1 C400_C401_C408 1 First record  
wh7 4 C400_C401_C408_C409 2 Flag because previous row diagnosis codes are different  
wh7 10 C400_C401_C408_C409 2 Total stays 2 because one of 2 conditions has been met, could drop out of loop here  
wh7 15 C400_C401_C408 2 Total stays 2 because one of 2 conditions has been met, could drop out of loop here  
           
wh4 1 C030_C039_C031 1 First record for this patient  
wh4 2 C030_C039 1 Flag stays 1 because diag only dropped not added and no time break >4  
wh4 5 C030_C049 2 Increase by one because diag changed from previous row  
           
wh5 1 F107_F108 1 First record  
wh5 3 F107_F109 2 Flag because previous row diagnosis codes are different  
wh5 5 F107_F109 2 Total stays 2 because codes are same and condition has already been met, could drop out of loop here  
wh5 7 F107_F108_F109 2 Total stays 2 because already met one of the conditions  

 

Attachments are a pain, please post directly into the forum. FYI - you cannot drop out of a loop in a data step, it loops through all rows. 

nyy2
Fluorite | Level 6

okay thanks for letting me know. first time using the forum.

Reeza
Super User

Not 100% clear on your criteria as some things don't make sense to me but this shows how to check for the difference in time and a new diagnosis. 

You should be able to create your variable as desired with this information.  

 

data have;
infile cards dlm='|';
input patientID $	time_period	combined  : $40.	running_total ;
cards;	
wh7|	1|	C400_C401_C408|	1	 
wh7|	4|	C400_C401_C408_C409|	2	
wh7|	10|	C400_C401_C408_C409|	2	
wh7|	15|	C400_C401_C408|	2		 	 	 	 	 
wh4|	1|	C030_C039_C031|	1	
wh4|	2|	C030_C039|	1	 
wh4|	5|	C030_C049|	2		 	 	 	 	 
wh5|	1|	F107_F108|	1	
wh5|	3|	F107_F109|	2	
wh5|	5|	F107_F109|	2	
wh5|	7|	F107_F108_F109|2	
;;;;
run;


data want;
set have (rename=running_total = total_check);
by patientID notsorted;
retain diag_list;

if first.patientID then do; diag_list = combined; running_total = 1; end;

dif_time = dif(time_period);

nwords = countw(combined, '_');

*see if new diagnosis is in the list;
diag_found=0;
if not first.patientID then do i=1 to nwords;
diag = scan(combined, i, '_');
if  findw(diag_list, diag, '_', 't')=0 then diag_not_found=1;
end;

if diag_not_found then do;
    running_total+1;
    diag_list = combined;
end;


drop i diag;
run;

SAS Innovate 2025: Register Today!

 

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1572 views
  • 0 likes
  • 2 in conversation