Hello
I have a dataset with patients that have different therapies. In the therapies there are multiple medications. It all is in a long format.
It is in this format:
data your_dataset;
input PatientID TherapyID Drug $;
datalines;
1 1 DrugA
1 1 DrugB
1 1 DrugC
1 2 DrugA
1 2 DrugB
1 2 DrugC
1 3 DrugB
1 3 DrugC
1 4 DrugA
1 4 DrugC
1 4 DrugB
1 4 DrugD
2 5 DrugA
2 5 DrugD
2 7 DrugA
2 7 DrugC
3 6 DrugB
3 6 DrugE
;
I would like to identify, which drugs are newly started or finished within a patient, when they have a new therapy. And flag them accordingly.
Solution example:
1 1 DrugA baseline
1 1 DrugB baseline
1 1 DrugC baseline
1 2 DrugA last_time
1 2 DrugB no_change
1 2 DrugC no_change
1 3 DrugB no_change
1 3 DrugC no_change
1 4 DrugA restart
1 4 DrugC no_change
1 4 DrugB no_change
1 4 DrugD new_start
2 5 DrugA baseline
2 5 DrugD last_time
2 7 DrugA no_change
2 7 DrugC new_start
3 6 DrugB baseline
3 6 DrugE baseline
They could also have variables added with a binary value to it.
So far, I have used this code, but there are multiple errors in it.
data drug_changes;
set your_dataset;
by PatientID;
retain PrevPatientID PrevTherapyID FirstTherapyDrugs;
if first.PatientID then do;
PrevPatientID = .;
PrevTherapyID = .;
FirstTherapyDrugs = '';
end;
if PatientID ne PrevPatientID then do;
PrevTherapyID = .;
FirstTherapyDrugs = '';
end;
if TherapyID ne PrevTherapyID then do;
if Drug ne '' then do;
if PrevTherapyID ne . then do;
action = 'Finished';
output;
end;
if TherapyID = 1 then do; /* Check if it's the first therapy */
FirstTherapyDrugs = catx('|', FirstTherapyDrugs, Drug); /* Concatenate drugs */
action = 'First Start';
output;
end;
else do;
action = 'Started';
output;
end;
end;
end;
PrevPatientID = PatientID;
PrevTherapyID = TherapyID;
if last.PatientID then do;
if FirstTherapyDrugs ne '' then do;
do i = 1 to countw(FirstTherapyDrugs, '|');
Drug = scan(FirstTherapyDrugs, i, '|');
action = 'First Start';
output;
end;
end;
if Drug ne '' then do;
if TherapyID = 1 then do; /* Check if it's the first therapy */
FirstTherapyDrugs = catx('|', FirstTherapyDrugs, Drug); /* Concatenate drugs */
action = 'First Start';
output;
end;
else do;
action = 'Started';
output;
end;
end;
end;
drop PrevPatientID PrevTherapyID FirstTherapyDrugs i;
run;
proc print data=drug_changes;
run;
Thank you so much for your help in advance.
Nina
In terms of data what is "new"? How do we recognize that? And the same for "finish"? Rules are a good idea, otherwise we are guessing and that's seldom a good idea.
Question about "last time". If it is "restarted" is it really a "last"?
What if a drug appears only once? Will you establish a new status (say 'ONE TIME')?
And what if that one time is for therapyid=1 (which I presume is how you identify status 'BASELINE')?
Great questions!
These were my rules:
Baseline: drugs of the therapy 1 of a patient
last_time: if a drug is stopped afterwards
restart: a drug that has been taken before, but that had a break
new_start: start of a drug, that has never been used before by that patient
no_change: nothing changed to the therapy before
But you are right:
add
one_time: if a drug appears only once
paused: if a drug has a break and is then restarted again
I hope, this helps.
@Nina4 wrote:
Great questions!
These were my rules:
Baseline: drugs of the therapy 1 of a patient
last_time: if a drug is stopped afterwards
restart: a drug that has been taken before, but that had a break
new_start: start of a drug, that has never been used before by that patient
no_change: nothing changed to the therapy before
But you are right:
add
one_time: if a drug appears only once
paused: if a drug has a break and is then restarted again
I hope, this helps.
Yes, it helps but
Why is 1/4/druga characterized as "last time" instead of "paused", given there is a subsequent 1/4/druga?
What if the "one time" is at baseline? Which of those two status values do you want?
You are right, with the new value paused and one_time the dataset should like like the following:
1 1 DrugA baseline
1 1 DrugB baseline
1 1 DrugC baseline
1 2 DrugA paused
1 2 DrugB no_change
1 2 DrugC no_change
1 3 DrugB no_change
1 3 DrugC no_change
1 4 DrugA restart
1 4 DrugC no_change
1 4 DrugB no_change
1 4 DrugD new_start
2 5 DrugA baseline
2 5 DrugD last_time
2 7 DrugA no_change
2 7 DrugC new_start
3 6 DrugB baseline
3 6 DrugE baseline
If one_time is at baseline, there should be one_time and also all other values should overrule baseline.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.