hello! I am trying to write a loop to flag two criteria. First if the difference between lag(timeperiod) - timeperiod >=4 and then to compare if diag codes between time periods is the same of different. I want to run the loop over each patientID until one of these criteria's are met.
TIA!
E.g. data
patientID, timeperiod, combined
wh7,1, diag1_diag2_diag3
wh7, 4, diag1_diag2_diag3_diag4
wh7, 10, diag1_diag2
wh7 15, diag4_diag10
wh4, 2, diag5_diag11_diag16
wh4, 4, diag5_diag11
I'm assuming the data shown is input.
Can you show what you've tried so far?
Can you show what you're looking for as output if this is the data input?
@nyy2 wrote:
hello! I am trying to write a loop to flag two criteria. First if the difference between lag(timeperiod) - timeperiod >=4 and then to compare if diag codes between time periods is the same of different. I want to run the loop over each patientID until one of these criteria's are met.
TIA!
E.g. data
patientID, timeperiod, combined
wh7,1, diag1_diag2_diag3
wh7, 4, diag1_diag2_diag3_diag4
wh7, 10, diag1_diag2
wh7 15, diag4_diag10
wh4, 2, diag5_diag11_diag16
wh4, 4, diag5_diag11
Yes this is example of input. For output, I would just want to add a column called running_total that would start at 1 and only increase when either time period triggered an increase of 1 or the different diag codes.
Sorry, I think I'd need more details. Can you provide a fully worked example that shows the input and expected output (as data sets)?
For example, it's not clear how this would be defined:
"then to compare if diag codes between time periods is the same of different."
SAS data steps loop automatically so you don't need to really factor that in. For the daily difference that's easy.
EDIT: to add running total instead of a counter
data want;
set have;
by PatientID;
period_dif = dif(TimePeriod);
if first.patientID then counter=1;
else if period_dif>=4 then counter+1;
*assuming you do not want this in final data set;
*drop period_dif;
run;
I was trying to first lag the time period and then if that does not increase the running total then trying to use the scan function to look if the substring for each word in the combined column matches the lag combined variable.
sorry it's quite complicated
@nyy2 wrote:
sorry it's quite complicated
Ergo the request for an example of the output. Data/examples really help 🙂
EDIT: I've edited my previous response to essentially add the case counter. Not sure about the diagnosis stuff yet, please show an example. Also, did you happen to create this variable or is it stored that way initially? How many possible diagnosis are possible?
Thanks, I added an example with some dummy data and notes of how I was thinking about it. List of codes in the combined does not really have a limit as I'm looking for any codes that occur within that time period.
patientID | time_period | combined | running_total | Note | Questions |
wh7 | 1 | C400_C401_C408 | 1 | First record | |
wh7 | 4 | C400_C401_C408_C409 | 2 | Flag because previous row diagnosis codes are different | |
wh7 | 10 | C400_C401_C408_C409 | 2 | Total stays 2 because one of 2 conditions has been met, could drop out of loop here | |
wh7 | 15 | C400_C401_C408 | 2 | Total stays 2 because one of 2 conditions has been met, could drop out of loop here | |
wh4 | 1 | C030_C039_C031 | 1 | First record for this patient | |
wh4 | 2 | C030_C039 | 1 | Flag stays 1 because diag only dropped not added and no time break >4 | |
wh4 | 5 | C030_C049 | 2 | Increase by one because diag changed from previous row | |
wh5 | 1 | F107_F108 | 1 | First record | |
wh5 | 3 | F107_F109 | 2 | Flag because previous row diagnosis codes are different | |
wh5 | 5 | F107_F109 | 2 | Total stays 2 because codes are same and condition has already been met, could drop out of loop here | |
wh5 | 7 | F107_F108_F109 | 2 | Total stays 2 because already met one of the conditions |
Attachments are a pain, please post directly into the forum. FYI - you cannot drop out of a loop in a data step, it loops through all rows.
okay thanks for letting me know. first time using the forum.
Not 100% clear on your criteria as some things don't make sense to me but this shows how to check for the difference in time and a new diagnosis.
You should be able to create your variable as desired with this information.
data have;
infile cards dlm='|';
input patientID $ time_period combined : $40. running_total ;
cards;
wh7| 1| C400_C401_C408| 1
wh7| 4| C400_C401_C408_C409| 2
wh7| 10| C400_C401_C408_C409| 2
wh7| 15| C400_C401_C408| 2
wh4| 1| C030_C039_C031| 1
wh4| 2| C030_C039| 1
wh4| 5| C030_C049| 2
wh5| 1| F107_F108| 1
wh5| 3| F107_F109| 2
wh5| 5| F107_F109| 2
wh5| 7| F107_F108_F109|2
;;;;
run;
data want;
set have (rename=running_total = total_check);
by patientID notsorted;
retain diag_list;
if first.patientID then do; diag_list = combined; running_total = 1; end;
dif_time = dif(time_period);
nwords = countw(combined, '_');
*see if new diagnosis is in the list;
diag_found=0;
if not first.patientID then do i=1 to nwords;
diag = scan(combined, i, '_');
if findw(diag_list, diag, '_', 't')=0 then diag_not_found=1;
end;
if diag_not_found then do;
running_total+1;
diag_list = combined;
end;
drop i diag;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.