Hello
I have a dataset with patients that have different therapies. In the therapies there are multiple medications. It all is in a long format.
It is in this format:
data your_dataset;
input PatientID TherapyID Drug $;
datalines;
1 1 DrugA
1 1 DrugB
1 1 DrugC
1 2 DrugA
1 2 DrugB
1 2 DrugC
1 3 DrugB
1 3 DrugC
1 4 DrugA
1 4 DrugC
1 4 DrugB
1 4 DrugD
2 5 DrugA
2 5 DrugD
2 7 DrugA
2 7 DrugC
3 6 DrugB
3 6 DrugE
;
I would like to identify, which drugs are newly started or finished within a patient, when they have a new therapy. And flag them accordingly.
Solution example:
1 1 DrugA baseline
1 1 DrugB baseline
1 1 DrugC baseline
1 2 DrugA last_time
1 2 DrugB no_change
1 2 DrugC no_change
1 3 DrugB no_change
1 3 DrugC no_change
1 4 DrugA restart
1 4 DrugC no_change
1 4 DrugB no_change
1 4 DrugD new_start
2 5 DrugA baseline
2 5 DrugD last_time
2 7 DrugA no_change
2 7 DrugC new_start
3 6 DrugB baseline
3 6 DrugE baseline
They could also have variables added with a binary value to it.
So far, I have used this code, but there are multiple errors in it.
data drug_changes;
set your_dataset;
by PatientID;
retain PrevPatientID PrevTherapyID FirstTherapyDrugs;
if first.PatientID then do;
PrevPatientID = .;
PrevTherapyID = .;
FirstTherapyDrugs = '';
end;
if PatientID ne PrevPatientID then do;
PrevTherapyID = .;
FirstTherapyDrugs = '';
end;
if TherapyID ne PrevTherapyID then do;
if Drug ne '' then do;
if PrevTherapyID ne . then do;
action = 'Finished';
output;
end;
if TherapyID = 1 then do; /* Check if it's the first therapy */
FirstTherapyDrugs = catx('|', FirstTherapyDrugs, Drug); /* Concatenate drugs */
action = 'First Start';
output;
end;
else do;
action = 'Started';
output;
end;
end;
end;
PrevPatientID = PatientID;
PrevTherapyID = TherapyID;
if last.PatientID then do;
if FirstTherapyDrugs ne '' then do;
do i = 1 to countw(FirstTherapyDrugs, '|');
Drug = scan(FirstTherapyDrugs, i, '|');
action = 'First Start';
output;
end;
end;
if Drug ne '' then do;
if TherapyID = 1 then do; /* Check if it's the first therapy */
FirstTherapyDrugs = catx('|', FirstTherapyDrugs, Drug); /* Concatenate drugs */
action = 'First Start';
output;
end;
else do;
action = 'Started';
output;
end;
end;
end;
drop PrevPatientID PrevTherapyID FirstTherapyDrugs i;
run;
proc print data=drug_changes;
run;
Thank you so much for your help in advance.
Nina
In terms of data what is "new"? How do we recognize that? And the same for "finish"? Rules are a good idea, otherwise we are guessing and that's seldom a good idea.
Question about "last time". If it is "restarted" is it really a "last"?
What if a drug appears only once? Will you establish a new status (say 'ONE TIME')?
And what if that one time is for therapyid=1 (which I presume is how you identify status 'BASELINE')?
Great questions!
These were my rules:
Baseline: drugs of the therapy 1 of a patient
last_time: if a drug is stopped afterwards
restart: a drug that has been taken before, but that had a break
new_start: start of a drug, that has never been used before by that patient
no_change: nothing changed to the therapy before
But you are right:
add
one_time: if a drug appears only once
paused: if a drug has a break and is then restarted again
I hope, this helps.
@Nina4 wrote:
Great questions!
These were my rules:
Baseline: drugs of the therapy 1 of a patient
last_time: if a drug is stopped afterwards
restart: a drug that has been taken before, but that had a break
new_start: start of a drug, that has never been used before by that patient
no_change: nothing changed to the therapy before
But you are right:
add
one_time: if a drug appears only once
paused: if a drug has a break and is then restarted again
I hope, this helps.
Yes, it helps but
Why is 1/4/druga characterized as "last time" instead of "paused", given there is a subsequent 1/4/druga?
What if the "one time" is at baseline? Which of those two status values do you want?
You are right, with the new value paused and one_time the dataset should like like the following:
1 1 DrugA baseline
1 1 DrugB baseline
1 1 DrugC baseline
1 2 DrugA paused
1 2 DrugB no_change
1 2 DrugC no_change
1 3 DrugB no_change
1 3 DrugC no_change
1 4 DrugA restart
1 4 DrugC no_change
1 4 DrugB no_change
1 4 DrugD new_start
2 5 DrugA baseline
2 5 DrugD last_time
2 7 DrugA no_change
2 7 DrugC new_start
3 6 DrugB baseline
3 6 DrugE baseline
If one_time is at baseline, there should be one_time and also all other values should overrule baseline.
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.