Hello Community,
I am working with a health record data set that consists of a patient identifier, diagnoses for each patient, diagnosis dates, and an index date (i.e., starting date) for a prescription of interest- please see example dataset 'Have' below. I would like to create a variable (e.g., variable= flag) indicating whether each patient has ANY diagnosis within a group of diagnoses (diagnosis= A, B, or C) that has a diagnosis date ON or within 6 months BEFORE the index date. I have provided an example of the desired output dataset below. Any help with how to program this would be much appreciated!!
Have:
Patient_ID | Diagnosis | Diagnosis_date | Index_date |
|
1 | A | 01JAN2016 | 01JAN2016 | *Flag=1 because ‘A’ diagnosis date on index date |
1 | D | 01JAN2016 | 01JAN2016 | |
1 | E | 01FEB2016 | 01JAN2016 | |
1 | A | 01FEB2016 | 01JAN2016 |
|
2 | B | 15MAY2018 | 15JUN2018 | *Flag=1 because ‘B’ diagnosis date within 6 months before index date |
2 | E | 15MAY2018 | 15JUN2018 | |
2 | F | 01JUN2018 | 15JUN2018 | |
3 | A | 05MAR2014 | 20FEB2015 | *Flag=1 because ‘C’ diagnosis date within 6 months before index date although ‘A’ diagnosis is not |
3 | G | 10OCT2014 | 20FEB2015 | |
3 | C | 01FEB2015 | 20FEB2015 | |
3 | H | 01FEB2015 | 20FEB2015 | |
4 | J | 02JUL2014 | 02JUL2014 | *Flag=0 because ‘B’ diagnosis date after index date |
4 | K | 01AUG2014 | 02JUL2014 | |
4 | B | 01AUG2014 | 02JUL2014 | |
5 | W | 01NOV2009 | 01JAN2010 | *Flag=0 because no diagnoses A, B, or C within 6 months before index date |
5 | Z | 01JAN2010 | 01JAN2010 | |
5 | H | 01FEB2010 | 01JAN2010 | |
6 | A | 10MAR2011 | 15JUN2011 | *Flag=1 because there is a diagnosis ‘A’ and ‘B’ within 6 months before index date |
6 | B | 12MAY2011 | 15JUN2011 | |
6 | X | 15JUL2011 | 15JUN2011 | |
6 | A | 15JUL2011 | 15JUN2011 |
Want:
Patient_ID | Flag |
1 | 1 |
2 | 1 |
3 | 1 |
4 | 0 |
5 | 0 |
6 | 1 |
data have;
input Patient_ID Diagnosis $ (Diagnosis_date Index_date)(:date9.);
format Diagnosis_date Index_date date9.;
datalines;
1 A 01JAN2016 01JAN2016
1 D 01JAN2016 01JAN2016
1 E 01FEB2016 01JAN2016
1 A 01FEB2016 01JAN2016
2 B 15MAY2018 15JUN2018
2 E 15MAY2018 15JUN2018
2 F 01JUN2018 15JUN2018
3 A 05MAR2014 20FEB2015
3 G 10OCT2014 20FEB2015
3 C 01FEB2015 20FEB2015
3 H 01FEB2015 20FEB2015
4 J 02JUL2014 02JUL2014
4 K 01AUG2014 02JUL2014
4 B 01AUG2014 02JUL2014
5 W 01NOV2009 01JAN2010
5 Z 01JAN2010 01JAN2010
5 H 01FEB2010 01JAN2010
6 A 10MAR2011 15JUN2011
6 B 12MAY2011 15JUN2011
6 X 15JUL2011 15JUN2011
6 A 15JUL2011 15JUN2011
;
Hi, hope the below helps:
%let diagnosis = 'A','B','C';
data have;
input Patient_ID Diagnosis $ (Diagnosis_date Index_date)(:date9.);
format Diagnosis_date Index_date date9. Flag_Variable $1.;
if diagnosis in (&diagnosis.) and Diagnosis_date <= index_date and Diagnosis_date >= intnx('month',index_date,-6,'s') then
Flag_Variable = "Y";
else Flag_Variable = " ";
datalines;
1 A 01JAN2016 01JAN2016
1 D 01JAN2016 01JAN2016
1 E 01FEB2016 01JAN2016
1 A 01FEB2016 01JAN2016
2 B 15MAY2018 15JUN2018
2 E 15MAY2018 15JUN2018
2 F 01JUN2018 15JUN2018
3 A 05MAR2014 20FEB2015
3 G 10OCT2014 20FEB2015
3 C 01FEB2015 20FEB2015
3 H 01FEB2015 20FEB2015
4 J 02JUL2014 02JUL2014
4 K 01AUG2014 02JUL2014
4 B 01AUG2014 02JUL2014
5 W 01NOV2009 01JAN2010
5 Z 01JAN2010 01JAN2010
5 H 01FEB2010 01JAN2010
6 A 10MAR2011 15JUN2011
6 B 12MAY2011 15JUN2011
6 X 15JUL2011 15JUN2011
6 A 15JUL2011 15JUN2011
;
run;
Hi, hope the below helps:
%let diagnosis = 'A','B','C';
data have;
input Patient_ID Diagnosis $ (Diagnosis_date Index_date)(:date9.);
format Diagnosis_date Index_date date9. Flag_Variable $1.;
if diagnosis in (&diagnosis.) and Diagnosis_date <= index_date and Diagnosis_date >= intnx('month',index_date,-6,'s') then
Flag_Variable = "Y";
else Flag_Variable = " ";
datalines;
1 A 01JAN2016 01JAN2016
1 D 01JAN2016 01JAN2016
1 E 01FEB2016 01JAN2016
1 A 01FEB2016 01JAN2016
2 B 15MAY2018 15JUN2018
2 E 15MAY2018 15JUN2018
2 F 01JUN2018 15JUN2018
3 A 05MAR2014 20FEB2015
3 G 10OCT2014 20FEB2015
3 C 01FEB2015 20FEB2015
3 H 01FEB2015 20FEB2015
4 J 02JUL2014 02JUL2014
4 K 01AUG2014 02JUL2014
4 B 01AUG2014 02JUL2014
5 W 01NOV2009 01JAN2010
5 Z 01JAN2010 01JAN2010
5 H 01FEB2010 01JAN2010
6 A 10MAR2011 15JUN2011
6 B 12MAY2011 15JUN2011
6 X 15JUL2011 15JUN2011
6 A 15JUL2011 15JUN2011
;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.