Solved: Re: How to figure out if two dates are within 365 days of each other

wj2 · Posted 12-13-2021 01:16 PM

Hello Community,

I am working on an analysis where I would like to create a binary variable (eg., new_var) indicating whether each patient in my dataset had a prescription date for a certain medication (rx_date) that was within 365 days of any date of a certain diagnosis (diagnosis_date). I have provided an example of what my input dataset looks like as well as my desired output dataset. Please note that currently my input dataset is set up where there is a missing value when a patient did not any prescription of the medication. Any help with this would be greatly appreciated!!

Have:

Patient_ID	Rx_date	Diagnosis_date
001	.	28MAR2013
001	.	03JUL2013
001	.	03OCT2013
002	31JAN2017	10FEB2017
002	31JAN2017	29MAR2018
002	31JAN2017	12JUN2018
003	26DEC2019	25JUN2019
003	10JAN2020	26DEC2019
003	03MAR2020	11JAN2020
003	18MAY2020	30JAN2020
004	18OCT2015	01DEC2016
005	01MAY2020	01JAN2018
005	01APR2020	01FEB2018
005	15APR2020	01MAR2018
006	14DEC2015	04NOV2018
006	04OCT2018	06DEC2018
007	23AUG2017	10AUG2020
007	13SEP2017	04NOV2020
007	23MAY2019	11NOV2020
007	31DEC2019	24NOV2020
007	31DEC2019	01DEC2020
007	31DEC2019	17FEB2021

Want (One row per patient_ID):

Patient_ID	New_var
001	0
002	1
003	1
004	0
005	0
006	1
007	1

PaigeMiller · Posted 12-13-2021 02:35 PM

data work.test;
input Patient_ID $ Rx_date :date9. Diag_date :date9.;
infile datalines;
datalines;
001	.   28MAR2013
001	.   03JUL2013
001	.	03OCT2013
002	31JAN2017	10FEB2017
002	31JAN2017	29MAR2018
002	31JAN2017	12JUN2018
003	26DEC2019	25JUN2019
003	10JAN2020	26DEC2019
003	03MAR2020	11JAN2020
003	18MAY2020	30JAN2020
004	18OCT2015	01DEC2016
005	01MAY2020	01JAN2018
005	01APR2020	01FEB2018
005	15APR2020	01MAR2018
006	14DEC2015	04NOV2018
006	04OCT2018	06DEC2018
007	23AUG2017	10AUG2020
007	13SEP2017	04NOV2020
007	23MAY2019	11NOV2020
007	31DEC2019	24NOV2020
007	31DEC2019	01DEC2020
007	31DEC2019	17FEB2021
;

proc transpose data=test out=test_t1 prefix=rx_date;
    by patient_id;
    var rx_date;
run;
proc transpose data=test out=test_t2 prefix=diag_date;
    by patient_id;
    var diag_date;
run;
data want;
    merge test_t1 test_t2;
    by patient_id;
    array rx rx_date:;
    array di diag_date:;
    flag=0;
    do i=1 to dim(rx);
        do j=1 to dim(di);
            if not missing(rx(i)) and not missing(di(j)) and abs(rx(i)-di(j))<=365 
                then do; flag=1; leave; end;
        end;
    end;
    drop j i _name_;
run;

Comment: you didn't really specify if you want Rx_date before or after diagnosis_date, this looks at both.

Comment2: normally I advise against turning long data sets into wide, this is an exception. You could also program this without turning long data sets into wide using DOW loops or hash objects.

--
Paige Miller

View solution in original post

PaigeMiller · Posted 12-13-2021 01:21 PM

Question for clarification: do you want the Rx_date to be within 365 days before the diagnosis_date? Or do you want the Rx_date to be within 365 days after the diagnosis date? Or both?

Do a subtraction, this tells you how many days between Rx_date and diagnosis_date. If it is within 365 days of each other, set a flag to be equal to 1. Then use PROC SUMMARY to find the maximum value of the flag variable for each patient_ID.

/* UNTESTED CODE */
data intermediate;
    set have;
    flag=(rx_date-diagnosis_date)<=365;
run;
proc summary data=intermediate nway;
    class patient_id;
    var flag;
    output out=want max=new_var;
run;

If you want tested code, you must provide data as SAS data step code (instructions). In fact, you should always provide data via this method, instead of screen captures as you have done here.

--
Paige Miller

wj2 · Posted 12-13-2021 01:34 PM

@PaigeMiller , thank you very much for the suggestion. However, this approach would only calculate the difference in dates within the same row, right? What if there is a case like shown below where the last diagnosis_date (05JUN2019) is not within 365 days of the rx_date on the same row (01JAN2021), but is 365 days within the rx_date on previous rows? I guess what I am wondering more specifically is how to use an array approach, or something similar, to create such a variable. Also, I would like to calculate whether the dates are 365 days before OR after each other. Any help with this would be much appreciated.

Patient_ID	Rx_date	Diagnosis_date
008	01JAN2020	01JAN2017
008	15JAN2020	01JAN2018
008	01JAN2021	05JUN2019

PaigeMiller · Posted 12-13-2021 01:39 PM

Thank you for the clarification. Before I try to answer the question, I would need you to provide some of your data in the format requested above.

--
Paige Miller

wj2 · Posted 12-13-2021 02:04 PM

@PaigeMiller , sure thing. Please see below. FYI, the data is not showing up exactly how it should for some reason when I run the code. Not sure if this is because of the 3 missing values for the first 3 rows of the second column and something that needs to be specified for the format statement? Nevertheless, the raw data to be used are provided below. Please let me know if I can provide any additional info. Thank you very much for your help.

data work.test;
input Patient_ID $ Rx_date date9. Diag_date date9.;
infile datalines;
datalines;
001	.	28MAR2013
001	.	03JUL2013
001	.	03OCT2013
002	31JAN2017	10FEB2017
002	31JAN2017	29MAR2018
002	31JAN2017	12JUN2018
003	26DEC2019	25JUN2019
003	10JAN2020	26DEC2019
003	03MAR2020	11JAN2020
003	18MAY2020	30JAN2020
004	18OCT2015	01DEC2016
005	01MAY2020	01JAN2018
005	01APR2020	01FEB2018
005	15APR2020	01MAR2018
006	14DEC2015	04NOV2018
006	04OCT2018	06DEC2018
007	23AUG2017	10AUG2020
007	13SEP2017	04NOV2020
007	23MAY2019	11NOV2020
007	31DEC2019	24NOV2020
007	31DEC2019	01DEC2020
007	31DEC2019	17FEB2021
;
run;

PaigeMiller · Posted 12-13-2021 02:35 PM

data work.test;
input Patient_ID $ Rx_date :date9. Diag_date :date9.;
infile datalines;
datalines;
001	.   28MAR2013
001	.   03JUL2013
001	.	03OCT2013
002	31JAN2017	10FEB2017
002	31JAN2017	29MAR2018
002	31JAN2017	12JUN2018
003	26DEC2019	25JUN2019
003	10JAN2020	26DEC2019
003	03MAR2020	11JAN2020
003	18MAY2020	30JAN2020
004	18OCT2015	01DEC2016
005	01MAY2020	01JAN2018
005	01APR2020	01FEB2018
005	15APR2020	01MAR2018
006	14DEC2015	04NOV2018
006	04OCT2018	06DEC2018
007	23AUG2017	10AUG2020
007	13SEP2017	04NOV2020
007	23MAY2019	11NOV2020
007	31DEC2019	24NOV2020
007	31DEC2019	01DEC2020
007	31DEC2019	17FEB2021
;

proc transpose data=test out=test_t1 prefix=rx_date;
    by patient_id;
    var rx_date;
run;
proc transpose data=test out=test_t2 prefix=diag_date;
    by patient_id;
    var diag_date;
run;
data want;
    merge test_t1 test_t2;
    by patient_id;
    array rx rx_date:;
    array di diag_date:;
    flag=0;
    do i=1 to dim(rx);
        do j=1 to dim(di);
            if not missing(rx(i)) and not missing(di(j)) and abs(rx(i)-di(j))<=365 
                then do; flag=1; leave; end;
        end;
    end;
    drop j i _name_;
run;

Comment: you didn't really specify if you want Rx_date before or after diagnosis_date, this looks at both.

Comment2: normally I advise against turning long data sets into wide, this is an exception. You could also program this without turning long data sets into wide using DOW loops or hash objects.

--
Paige Miller

wj2 · Posted 12-13-2021 03:01 PM

@PaigeMiller thank you! This appears to have worked perfectly!

Out of curiosity, would you be able to show how this might also be done with the DOW loop or hash object approach that you mentioned? I'm a new SAS learner so seeing other approaches of doing the same thing is always very helpful

PaigeMiller · Posted 12-13-2021 03:32 PM

@wj2 wrote:

Out of curiosity, would you be able to show how this might also be done with the DOW loop or hash object approach that you mentioned?

No, I can't. Maybe others can.

--
Paige Miller

Tom · Posted 12-13-2021 03:41 PM

It is still not clear to me what you are trying to test.

Do you want a different flag for every observation? Or just one at the PATIENT_ID level?

What is the issue you are trying to test?

Are you checking if people begin treatment within a year of initial diagnosis? Some measure of progression of disease perhaps?

Are you checking if people are diagnosed within a year of beginning treatment? Perhaps to detect a side effect of the treatment?

What is the actual output you want from that input?

wj2 · Posted 12-13-2021 03:53 PM

@Tom for my desired output, I would like one flag at the patient_ID level (i.e., NOT a different flag for every observation). I would like it to look like what I have shown in my first post.

The issue I am trying to test is the prevalence of patients who began treatment (i.e., received a prescription) within one year of receiving a diagnosis of the disease. A patient would be counted as receiving treatment if any date of the prescription is within 365 days of any date of the diagnosis.

As I mentioned above, the approach offered by @PaigeMiller using the array statements worked. I was just curious to see another approach for this problem using DOW loops or hash objects as a learning experience.

Tom · Posted 12-13-2021 04:12 PM

I still don't understand what you are trying to test.

Here is what I would do.

proc sql;
  create table have as 
    select patient_id
         , min(diag_date) as first_dx format=date9.
         , min(rx_date) as first_rx format=date9.
         , calculated first_rx - calculated first_dx as offset
    from test
    group by patient_id
  ;
quit;

Results

       Patient_
Obs       ID        first_dx     first_rx    offset

 1       001       28MAR2013            .         .
 2       002       10FEB2017    31JAN2017       -10
 3       003       25JUN2019    26DEC2019       184
 4       004       01DEC2016    18OCT2015      -410
 5       005       01JAN2018    01APR2020       821
 6       006       04NOV2018    14DEC2015     -1056
 7       007       10AUG2020    23AUG2017     -1083

So it looks like most of them started treatment before they received a DX. One started treatment about 6 months after DX and the other started treatment after more than 2 years.

Ksharp · Posted 12-14-2021 07:22 AM

data work.test;
input Patient_ID $ Rx_date :date9. Diag_date :date9.;
infile datalines;
datalines;
001 .   28MAR2013
001 .   03JUL2013
001 . 03OCT2013
002 31JAN2017 10FEB2017
002 31JAN2017 29MAR2018
002 31JAN2017 12JUN2018
003 26DEC2019 25JUN2019
003 10JAN2020 26DEC2019
003 03MAR2020 11JAN2020
003 18MAY2020 30JAN2020
004 18OCT2015 01DEC2016
005 01MAY2020 01JAN2018
005 01APR2020 01FEB2018
005 15APR2020 01MAR2018
006 14DEC2015 04NOV2018
006 04OCT2018 06DEC2018
007 23AUG2017 10AUG2020
007 13SEP2017 04NOV2020
007 23MAY2019 11NOV2020
007 31DEC2019 24NOV2020
007 31DEC2019 01DEC2020
007 31DEC2019 17FEB2021
;


proc sql;
create table want as
select a.Patient_ID,max(.<abs(a.Rx_date-b.Diag_date)<=365) as flag
 from (select Patient_ID,Rx_date from test) as a ,
      (select Patient_ID,Diag_date from test) as b
  where a.Patient_ID=b.Patient_ID
   group by a.Patient_ID
;
quit;

PaigeMiller · Posted 12-14-2021 07:33 AM

@Ksharp wrote:

data work.test;
input Patient_ID $ Rx_date :date9. Diag_date :date9.;
infile datalines;
datalines;
001 .   28MAR2013
001 .   03JUL2013
001 . 03OCT2013
002 31JAN2017 10FEB2017
002 31JAN2017 29MAR2018
002 31JAN2017 12JUN2018
003 26DEC2019 25JUN2019
003 10JAN2020 26DEC2019
003 03MAR2020 11JAN2020
003 18MAY2020 30JAN2020
004 18OCT2015 01DEC2016
005 01MAY2020 01JAN2018
005 01APR2020 01FEB2018
005 15APR2020 01MAR2018
006 14DEC2015 04NOV2018
006 04OCT2018 06DEC2018
007 23AUG2017 10AUG2020
007 13SEP2017 04NOV2020
007 23MAY2019 11NOV2020
007 31DEC2019 24NOV2020
007 31DEC2019 01DEC2020
007 31DEC2019 17FEB2021
;


proc sql;
create table want as
select a.Patient_ID,max(.<abs(a.Rx_date-b.Diag_date)<=365) as flag
 from (select Patient_ID,Rx_date from test) as a ,
      (select Patient_ID,Diag_date from test) as b
  where a.Patient_ID=b.Patient_ID
   group by a.Patient_ID
;
quit;

Brilliant!

--
Paige Miller

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!