BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wj2
Quartz | Level 8 wj2
Quartz | Level 8

Hello community, 

 

I am working with a large electronic health record data set (~1 million patients) and I need to determine the number of patients that have ANY diagnosis in a group of diagnoses (e.g., D, E, or F) within 365 days AFTER the date of ANY diagnosis within a different group of diagnoses (e.g., A, B, or C). 

 

In other words: diagnosis date of diagnosis D or E or F - diagnosis date of A or B or C = le 365 days. The condition can be met for any pair of diagnoses between the two groups (e.g., D-A, D-B, D-C, E-A, etc.). 

 

Below is an example data set with some different cases for when the criterion should and should not be met. patient id = pid; diagnosis code= diag_cd; diagnosis date=diag_dt.

data work.test;
length pid $1.;
length diag $1.;
length diag_dt 8.; format diag_dt yymmdd10.;
infile datalines ;
input pid $ diag $ diag_dt date11.;
datalines;
1 A 10JAN2013
1 D 10JAN2013 /*condition met because diagnosis D occurs on same date as A*/
1 G 10JAN2013
2 F 02JUN2017 
2 J 02JUN2017
2 B 01JUL2017 /*condition NOT met because diagnosis F occurs before B*/
2 J 01JUL2017
3 D 23OCT2014
3 C 17DEC2014
3 X 11JAN2015
3 D 25FEB2015 /*condition met because D occurs within 365 days after C*/
4 A 12MAR2015
4 Z 12MAR2015
4 A 20FEB2016 
4 E 07JAN2016 /*condition met because E occurs within 365 days after A on 12MAR2015*/
5 C 16JAN2017
5 A 28JAN2017
5 F 01FEB2018 /*condition NOT met because F occurs after 365 days from C and A*/
6 B 13NOV2016
6 B 10DEC2016
6 D 10DEC2016 /*condition met because D occurs within 365 days of B on 13NOV2016 or 10DEC2016*/
6 B 12DEC2017
7 A 01JAN2013
7 A 08JAN2014
7 D 07MAY2014 /*condition met because D occurs within 365 days of A on 08JAN2014 (but not A on 01JAN2013*/
7 A 15JUN2015
;
run;

Ultimately, I would like the output data set to have a single row per each patient with a new variable (e.g., newvar) indicating whether the condition is met (diagnosis date of D or E or F - diagnosis date of A or B or C = le 365 days). Below is an example based on the example data set above. 

 

pid

newvar

1

1

2

0

3

1

4

1

5

1

6

0

7

1

 

Any help with this problem would be much appreciated. I'm assuming it would require some combination of proc sql and a data step with first/last variables; however, my programming skills are not yet advanced enough. Please let me know if I can provide any additional clarification or information. Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Should be

5

0

6

1

?

 

How about this one ?

 

data work.test;
length pid $1.;
length diag $1.;
length diag_dt 8.; format diag_dt yymmdd10.;
infile datalines ;
input pid $ diag $ diag_dt date11.;
format diag_dt date11.;
datalines;
1 A 10JAN2013
1 D 10JAN2013 /*condition met because diagnosis D occurs on same date as A*/
1 G 10JAN2013
2 F 02JUN2017 
2 J 02JUN2017
2 B 01JUL2017 /*condition NOT met because diagnosis F occurs before B*/
2 J 01JUL2017
3 D 23OCT2014
3 C 17DEC2014
3 X 11JAN2015
3 D 25FEB2015 /*condition met because D occurs within 365 days after C*/
4 A 12MAR2015
4 Z 12MAR2015
4 A 20FEB2016 
4 E 07JAN2016 /*condition met because E occurs within 365 days after A on 12MAR2015*/
5 C 16JAN2017
5 A 28JAN2017
5 F 01FEB2018 /*condition NOT met because F occurs after 365 days from C and A*/
6 B 13NOV2016
6 B 10DEC2016
6 D 10DEC2016 /*condition met because D occurs within 365 days of B on 13NOV2016 or 10DEC2016*/
6 B 12DEC2017
7 A 01JAN2013
7 A 08JAN2014
7 D 07MAY2014 /*condition met because D occurs within 365 days of A on 08JAN2014 (but not A on 01JAN2013*/
7 A 15JUN2015
;
run;
proc sort data=test out=have;
by pid diag_dt;
run;
data want;
 array _c{99} $ 32 _temporary_;
 array _n{99} _temporary_;
 do i=1 by 1 until(last.pid);
  set have;
  by pid;
  _c{i}=diag;
  _n{i}=diag_dt;
 end;
new_var=0;
 do m=1 to i-1;
   do n=m+1 to i;
    if _c{n} in ('D' 'E' 'F') and _c{m} in ('A' 'B' 'C') then do;
     dif_day=_n{n}-_n{m};
	 if dif_day le 365  then do;new_var=1;yes=1;leave;end;
	end;
	if yes then leave;
   end;
 end;
 keep pid new_var;
 run;

View solution in original post

2 REPLIES 2
Ksharp
Super User

Should be

5

0

6

1

?

 

How about this one ?

 

data work.test;
length pid $1.;
length diag $1.;
length diag_dt 8.; format diag_dt yymmdd10.;
infile datalines ;
input pid $ diag $ diag_dt date11.;
format diag_dt date11.;
datalines;
1 A 10JAN2013
1 D 10JAN2013 /*condition met because diagnosis D occurs on same date as A*/
1 G 10JAN2013
2 F 02JUN2017 
2 J 02JUN2017
2 B 01JUL2017 /*condition NOT met because diagnosis F occurs before B*/
2 J 01JUL2017
3 D 23OCT2014
3 C 17DEC2014
3 X 11JAN2015
3 D 25FEB2015 /*condition met because D occurs within 365 days after C*/
4 A 12MAR2015
4 Z 12MAR2015
4 A 20FEB2016 
4 E 07JAN2016 /*condition met because E occurs within 365 days after A on 12MAR2015*/
5 C 16JAN2017
5 A 28JAN2017
5 F 01FEB2018 /*condition NOT met because F occurs after 365 days from C and A*/
6 B 13NOV2016
6 B 10DEC2016
6 D 10DEC2016 /*condition met because D occurs within 365 days of B on 13NOV2016 or 10DEC2016*/
6 B 12DEC2017
7 A 01JAN2013
7 A 08JAN2014
7 D 07MAY2014 /*condition met because D occurs within 365 days of A on 08JAN2014 (but not A on 01JAN2013*/
7 A 15JUN2015
;
run;
proc sort data=test out=have;
by pid diag_dt;
run;
data want;
 array _c{99} $ 32 _temporary_;
 array _n{99} _temporary_;
 do i=1 by 1 until(last.pid);
  set have;
  by pid;
  _c{i}=diag;
  _n{i}=diag_dt;
 end;
new_var=0;
 do m=1 to i-1;
   do n=m+1 to i;
    if _c{n} in ('D' 'E' 'F') and _c{m} in ('A' 'B' 'C') then do;
     dif_day=_n{n}-_n{m};
	 if dif_day le 365  then do;new_var=1;yes=1;leave;end;
	end;
	if yes then leave;
   end;
 end;
 keep pid new_var;
 run;
wj2
Quartz | Level 8 wj2
Quartz | Level 8

@Ksharp, this is perfect! Thank you!

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 744 views
  • 1 like
  • 2 in conversation