BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wj2
Quartz | Level 8 wj2
Quartz | Level 8

Hello community, 

 

I am working with a large electronic health record data set (~1 million patients) and I need to determine the number of patients that have ANY diagnosis in a group of diagnoses (e.g., D, E, or F) within 365 days AFTER the date of ANY diagnosis within a different group of diagnoses (e.g., A, B, or C). 

 

In other words: diagnosis date of diagnosis D or E or F - diagnosis date of A or B or C = le 365 days. The condition can be met for any pair of diagnoses between the two groups (e.g., D-A, D-B, D-C, E-A, etc.). 

 

Below is an example data set with some different cases for when the criterion should and should not be met. patient id = pid; diagnosis code= diag_cd; diagnosis date=diag_dt.

data work.test;
length pid $1.;
length diag $1.;
length diag_dt 8.; format diag_dt yymmdd10.;
infile datalines ;
input pid $ diag $ diag_dt date11.;
datalines;
1 A 10JAN2013
1 D 10JAN2013 /*condition met because diagnosis D occurs on same date as A*/
1 G 10JAN2013
2 F 02JUN2017 
2 J 02JUN2017
2 B 01JUL2017 /*condition NOT met because diagnosis F occurs before B*/
2 J 01JUL2017
3 D 23OCT2014
3 C 17DEC2014
3 X 11JAN2015
3 D 25FEB2015 /*condition met because D occurs within 365 days after C*/
4 A 12MAR2015
4 Z 12MAR2015
4 A 20FEB2016 
4 E 07JAN2016 /*condition met because E occurs within 365 days after A on 12MAR2015*/
5 C 16JAN2017
5 A 28JAN2017
5 F 01FEB2018 /*condition NOT met because F occurs after 365 days from C and A*/
6 B 13NOV2016
6 B 10DEC2016
6 D 10DEC2016 /*condition met because D occurs within 365 days of B on 13NOV2016 or 10DEC2016*/
6 B 12DEC2017
7 A 01JAN2013
7 A 08JAN2014
7 D 07MAY2014 /*condition met because D occurs within 365 days of A on 08JAN2014 (but not A on 01JAN2013*/
7 A 15JUN2015
;
run;

Ultimately, I would like the output data set to have a single row per each patient with a new variable (e.g., newvar) indicating whether the condition is met (diagnosis date of D or E or F - diagnosis date of A or B or C = le 365 days). Below is an example based on the example data set above. 

 

pid

newvar

1

1

2

0

3

1

4

1

5

1

6

0

7

1

 

Any help with this problem would be much appreciated. I'm assuming it would require some combination of proc sql and a data step with first/last variables; however, my programming skills are not yet advanced enough. Please let me know if I can provide any additional clarification or information. Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Should be

5

0

6

1

?

 

How about this one ?

 

data work.test;
length pid $1.;
length diag $1.;
length diag_dt 8.; format diag_dt yymmdd10.;
infile datalines ;
input pid $ diag $ diag_dt date11.;
format diag_dt date11.;
datalines;
1 A 10JAN2013
1 D 10JAN2013 /*condition met because diagnosis D occurs on same date as A*/
1 G 10JAN2013
2 F 02JUN2017 
2 J 02JUN2017
2 B 01JUL2017 /*condition NOT met because diagnosis F occurs before B*/
2 J 01JUL2017
3 D 23OCT2014
3 C 17DEC2014
3 X 11JAN2015
3 D 25FEB2015 /*condition met because D occurs within 365 days after C*/
4 A 12MAR2015
4 Z 12MAR2015
4 A 20FEB2016 
4 E 07JAN2016 /*condition met because E occurs within 365 days after A on 12MAR2015*/
5 C 16JAN2017
5 A 28JAN2017
5 F 01FEB2018 /*condition NOT met because F occurs after 365 days from C and A*/
6 B 13NOV2016
6 B 10DEC2016
6 D 10DEC2016 /*condition met because D occurs within 365 days of B on 13NOV2016 or 10DEC2016*/
6 B 12DEC2017
7 A 01JAN2013
7 A 08JAN2014
7 D 07MAY2014 /*condition met because D occurs within 365 days of A on 08JAN2014 (but not A on 01JAN2013*/
7 A 15JUN2015
;
run;
proc sort data=test out=have;
by pid diag_dt;
run;
data want;
 array _c{99} $ 32 _temporary_;
 array _n{99} _temporary_;
 do i=1 by 1 until(last.pid);
  set have;
  by pid;
  _c{i}=diag;
  _n{i}=diag_dt;
 end;
new_var=0;
 do m=1 to i-1;
   do n=m+1 to i;
    if _c{n} in ('D' 'E' 'F') and _c{m} in ('A' 'B' 'C') then do;
     dif_day=_n{n}-_n{m};
	 if dif_day le 365  then do;new_var=1;yes=1;leave;end;
	end;
	if yes then leave;
   end;
 end;
 keep pid new_var;
 run;

View solution in original post

2 REPLIES 2
Ksharp
Super User

Should be

5

0

6

1

?

 

How about this one ?

 

data work.test;
length pid $1.;
length diag $1.;
length diag_dt 8.; format diag_dt yymmdd10.;
infile datalines ;
input pid $ diag $ diag_dt date11.;
format diag_dt date11.;
datalines;
1 A 10JAN2013
1 D 10JAN2013 /*condition met because diagnosis D occurs on same date as A*/
1 G 10JAN2013
2 F 02JUN2017 
2 J 02JUN2017
2 B 01JUL2017 /*condition NOT met because diagnosis F occurs before B*/
2 J 01JUL2017
3 D 23OCT2014
3 C 17DEC2014
3 X 11JAN2015
3 D 25FEB2015 /*condition met because D occurs within 365 days after C*/
4 A 12MAR2015
4 Z 12MAR2015
4 A 20FEB2016 
4 E 07JAN2016 /*condition met because E occurs within 365 days after A on 12MAR2015*/
5 C 16JAN2017
5 A 28JAN2017
5 F 01FEB2018 /*condition NOT met because F occurs after 365 days from C and A*/
6 B 13NOV2016
6 B 10DEC2016
6 D 10DEC2016 /*condition met because D occurs within 365 days of B on 13NOV2016 or 10DEC2016*/
6 B 12DEC2017
7 A 01JAN2013
7 A 08JAN2014
7 D 07MAY2014 /*condition met because D occurs within 365 days of A on 08JAN2014 (but not A on 01JAN2013*/
7 A 15JUN2015
;
run;
proc sort data=test out=have;
by pid diag_dt;
run;
data want;
 array _c{99} $ 32 _temporary_;
 array _n{99} _temporary_;
 do i=1 by 1 until(last.pid);
  set have;
  by pid;
  _c{i}=diag;
  _n{i}=diag_dt;
 end;
new_var=0;
 do m=1 to i-1;
   do n=m+1 to i;
    if _c{n} in ('D' 'E' 'F') and _c{m} in ('A' 'B' 'C') then do;
     dif_day=_n{n}-_n{m};
	 if dif_day le 365  then do;new_var=1;yes=1;leave;end;
	end;
	if yes then leave;
   end;
 end;
 keep pid new_var;
 run;
wj2
Quartz | Level 8 wj2
Quartz | Level 8

@Ksharp, this is perfect! Thank you!

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 540 views
  • 1 like
  • 2 in conversation