Hello Community,
I have a data cleaning task that I would greatly appreciate some help with. I am working with some health care data and I would like to identify patients in my dataset who have any diagnosis from a set of diagnoses (A, B, C) within 1 year of an index date (that corresponds to when a certain medication began). The patient can have any one of the diagnoses, A, B, or C, within one year of the index date to be counted (e.g., variable 'flag'=1). I have provided an example dataset below as well as my desired output. Please let me know if there are any questions. Any help would be much appreciated!
Have:
PatientID | Diagnosis | Diagnosis_date | Index_date |
|
1 | A | 01JUN2015 | 01JUN2015 | *Condition is true because all diagnoses are within 12 months (before) index date |
1 | B | 01AUG2015 | 01JUN2015 | |
1 | C | 01OCT2015 | 01JUN2015 | |
2 | B | 15FEB2016 | 01DEC2015 | *Condition is true because diagnosis B is within 12 months (after) index date |
3 | A | 12DEC2015 | 20JAN2017 | *Condition is true because diagnosis B is within 1 year of index date (even though diagnosis A is not) |
3 | B | 25OCT2016 | 20JAN2017 | |
4 | B | 01MAR2018 | 01NOV2019 | *Condition NOT true because no diagnoses are within 1 year of index date |
4 | B | 15MAR2018 | 01NOV2019 | |
4 | C | 30MAR2018 | 01NOV2019 | |
5 | A | 11FEB2012 | 07JUN2010 | *Condition NOT true because diagnosis is not within 1 year of index date |
5 | A | 01MAR2012 | 07JUN2010 |
Desired output:
PatientID | Flag |
1 | 1 |
2 | 1 |
3 | 1 |
4 | 0 |
5 | 0 |
UNTESTED CODE
data need;
set have;
flag=(-1<=intck('year',index_date,diagnosis_date,'c')<=0);
run;
proc summary nway data=need;
class patientID;
var flag;
output out=want max=;
run;
If you want tested code, please provide data according to these instructions.
UNTESTED CODE
data need;
set have;
flag=(-1<=intck('year',index_date,diagnosis_date,'c')<=0);
run;
proc summary nway data=need;
class patientID;
var flag;
output out=want max=;
run;
If you want tested code, please provide data according to these instructions.
I stole a line from @PaigeMiller 🙂
data have;
input PatientID Diagnosis $ (Diagnosis_date Index_date)(:date9.);
format Diagnosis_date Index_date date9.;
datalines;
1 A 01JUN2015 01JUN2015
1 B 01AUG2015 01JUN2015
1 C 01OCT2015 01JUN2015
2 B 15FEB2016 01DEC2015
3 A 12DEC2015 20JAN2017
3 B 25OCT2016 20JAN2017
4 B 01MAR2018 01NOV2019
4 B 15MAR2018 01NOV2019
4 C 30MAR2018 01NOV2019
5 A 11FEB2012 07JUN2010
5 A 01MAR2012 07JUN2010
;
data want(keep = PatientID flag);
do until (last.PatientID);
set have;
by PatientID;
flag = (-1 <= intck('year', index_date, diagnosis_date, 'c') <= 0);
end;
run;
Result:
PatientID flag 1 1 2 1 3 1 4 0 5 0
@PeterClemmensen Thank you very much! However, how would you modify your code to handle a case like for PatientID=6 below, where the first diagnosis date is within one year but the second one is not? In this case, it should still be flag=1 for PatientID=6 because I would like the condition to be true as long as there is at least one diagnosis date within one year of the index date. Any help would be much appreciated!
data have;
input PatientID Diagnosis $ (Diagnosis_date Index_date)(:date9.);
format Diagnosis_date Index_date date9.;
datalines;
1 A 01JUN2015 01JUN2015
1 B 01AUG2015 01JUN2015
1 C 01OCT2015 01JUN2015
2 B 15FEB2016 01DEC2015
3 A 12DEC2015 20JAN2017
3 B 25OCT2016 20JAN2017
4 B 01MAR2018 01NOV2019
4 B 15MAR2018 01NOV2019
4 C 30MAR2018 01NOV2019
5 A 11FEB2012 07JUN2010
5 A 01MAR2012 07JUN2010
6 C 01JAN2017 05JAN2017
6 C 01FEB2018 05JAN2017
;
I believe the code I provided handles patient ID 6 properly.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.