BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wj2
Quartz | Level 8 wj2
Quartz | Level 8

Hello Community,

 

I am working on an analysis where I would like to create a binary variable (eg., new_var) indicating whether each patient in my dataset had a prescription date for a certain medication (rx_date) that was within 365 days of any date of a certain diagnosis (diagnosis_date). I have provided an example of what my input dataset looks like as well as my desired output dataset. Please note that currently my input dataset is set up where there is a missing value when a patient did not any prescription of the medication. Any help with this would be greatly appreciated!!

 

Have:

Patient_ID

Rx_date

Diagnosis_date

001

.

28MAR2013

001

.

03JUL2013

001

.

03OCT2013

002

31JAN2017

10FEB2017

002

31JAN2017

29MAR2018

002

31JAN2017

12JUN2018

003

26DEC2019

25JUN2019

003

10JAN2020

26DEC2019

003

03MAR2020

11JAN2020

003

18MAY2020

30JAN2020

004

18OCT2015

01DEC2016

005

01MAY2020

01JAN2018

005

01APR2020

01FEB2018

005

15APR2020

01MAR2018

006

14DEC2015

04NOV2018

006

04OCT2018

06DEC2018

007

23AUG2017

10AUG2020

007

13SEP2017

04NOV2020

007

23MAY2019

11NOV2020

007

31DEC2019

24NOV2020

007

31DEC2019

01DEC2020

007

31DEC2019

17FEB2021

 

Want (One row per patient_ID):

Patient_ID

New_var

001

0

002

1

003

1

004

0

005

0

006

1

007

1

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26
data work.test;
input Patient_ID $ Rx_date :date9. Diag_date :date9.;
infile datalines;
datalines;
001	.   28MAR2013
001	.   03JUL2013
001	.	03OCT2013
002	31JAN2017	10FEB2017
002	31JAN2017	29MAR2018
002	31JAN2017	12JUN2018
003	26DEC2019	25JUN2019
003	10JAN2020	26DEC2019
003	03MAR2020	11JAN2020
003	18MAY2020	30JAN2020
004	18OCT2015	01DEC2016
005	01MAY2020	01JAN2018
005	01APR2020	01FEB2018
005	15APR2020	01MAR2018
006	14DEC2015	04NOV2018
006	04OCT2018	06DEC2018
007	23AUG2017	10AUG2020
007	13SEP2017	04NOV2020
007	23MAY2019	11NOV2020
007	31DEC2019	24NOV2020
007	31DEC2019	01DEC2020
007	31DEC2019	17FEB2021
;

proc transpose data=test out=test_t1 prefix=rx_date;
    by patient_id;
    var rx_date;
run;
proc transpose data=test out=test_t2 prefix=diag_date;
    by patient_id;
    var diag_date;
run;
data want;
    merge test_t1 test_t2;
    by patient_id;
    array rx rx_date:;
    array di diag_date:;
    flag=0;
    do i=1 to dim(rx);
        do j=1 to dim(di);
            if not missing(rx(i)) and not missing(di(j)) and abs(rx(i)-di(j))<=365 
                then do; flag=1; leave; end;
        end;
    end;
    drop j i _name_;
run; 

Comment: you didn't really specify if you want Rx_date before or after diagnosis_date, this looks at both. 

 

Comment2: normally I advise against turning long data sets into wide, this is an exception. You could also program this without turning long data sets into wide using DOW loops or hash objects. 

--
Paige Miller

View solution in original post

12 REPLIES 12
PaigeMiller
Diamond | Level 26

Question for clarification: do you want the Rx_date to be within 365 days before the diagnosis_date? Or do you want the Rx_date to be within 365 days after the diagnosis date? Or both?

 

Do a subtraction, this tells you how many days between Rx_date and diagnosis_date. If it is within 365 days of each other, set a flag to be equal to 1. Then use PROC SUMMARY to find the maximum value of the flag variable for each patient_ID.

 

/* UNTESTED CODE */
data intermediate;
    set have;
    flag=(rx_date-diagnosis_date)<=365;
run;
proc summary data=intermediate nway;
    class patient_id;
    var flag;
    output out=want max=new_var;
run;

 

If you want tested code, you must provide data as SAS data step code (instructions). In fact, you should always provide data via this method, instead of screen captures as you have done here.

--
Paige Miller
wj2
Quartz | Level 8 wj2
Quartz | Level 8

@PaigeMiller , thank you very much for the suggestion. However, this approach would only calculate the difference in dates within the same row, right? What if there is a case like shown below where the last diagnosis_date (05JUN2019) is not within 365 days of the rx_date on the same row (01JAN2021), but is 365 days within the rx_date on previous rows? I guess what I am wondering more specifically is how to use an array approach, or something similar, to create such a variable. Also, I would like to calculate whether the dates are 365 days before OR after each other. Any help with this would be much appreciated. 

 

Patient_ID

Rx_date

Diagnosis_date

008

01JAN2020

01JAN2017

008

15JAN2020

01JAN2018

008

01JAN2021

05JUN2019

PaigeMiller
Diamond | Level 26

Thank you for the clarification. Before I try to answer the question, I would need you to provide some of your data in the format requested above.

--
Paige Miller
wj2
Quartz | Level 8 wj2
Quartz | Level 8

@PaigeMiller , sure thing. Please see below. FYI, the data is not showing up exactly how it should for some reason when I run the code. Not sure if this is because of the 3 missing values for the first 3 rows of the second column and something that needs to be specified for the format statement? Nevertheless, the raw data to be used are provided below. Please let me know if I can provide any additional info. Thank you very much for your help. 

data work.test;
input Patient_ID $ Rx_date date9. Diag_date date9.;
infile datalines;
datalines;
001	.	28MAR2013
001	.	03JUL2013
001	.	03OCT2013
002	31JAN2017	10FEB2017
002	31JAN2017	29MAR2018
002	31JAN2017	12JUN2018
003	26DEC2019	25JUN2019
003	10JAN2020	26DEC2019
003	03MAR2020	11JAN2020
003	18MAY2020	30JAN2020
004	18OCT2015	01DEC2016
005	01MAY2020	01JAN2018
005	01APR2020	01FEB2018
005	15APR2020	01MAR2018
006	14DEC2015	04NOV2018
006	04OCT2018	06DEC2018
007	23AUG2017	10AUG2020
007	13SEP2017	04NOV2020
007	23MAY2019	11NOV2020
007	31DEC2019	24NOV2020
007	31DEC2019	01DEC2020
007	31DEC2019	17FEB2021
;
run;
PaigeMiller
Diamond | Level 26
data work.test;
input Patient_ID $ Rx_date :date9. Diag_date :date9.;
infile datalines;
datalines;
001	.   28MAR2013
001	.   03JUL2013
001	.	03OCT2013
002	31JAN2017	10FEB2017
002	31JAN2017	29MAR2018
002	31JAN2017	12JUN2018
003	26DEC2019	25JUN2019
003	10JAN2020	26DEC2019
003	03MAR2020	11JAN2020
003	18MAY2020	30JAN2020
004	18OCT2015	01DEC2016
005	01MAY2020	01JAN2018
005	01APR2020	01FEB2018
005	15APR2020	01MAR2018
006	14DEC2015	04NOV2018
006	04OCT2018	06DEC2018
007	23AUG2017	10AUG2020
007	13SEP2017	04NOV2020
007	23MAY2019	11NOV2020
007	31DEC2019	24NOV2020
007	31DEC2019	01DEC2020
007	31DEC2019	17FEB2021
;

proc transpose data=test out=test_t1 prefix=rx_date;
    by patient_id;
    var rx_date;
run;
proc transpose data=test out=test_t2 prefix=diag_date;
    by patient_id;
    var diag_date;
run;
data want;
    merge test_t1 test_t2;
    by patient_id;
    array rx rx_date:;
    array di diag_date:;
    flag=0;
    do i=1 to dim(rx);
        do j=1 to dim(di);
            if not missing(rx(i)) and not missing(di(j)) and abs(rx(i)-di(j))<=365 
                then do; flag=1; leave; end;
        end;
    end;
    drop j i _name_;
run; 

Comment: you didn't really specify if you want Rx_date before or after diagnosis_date, this looks at both. 

 

Comment2: normally I advise against turning long data sets into wide, this is an exception. You could also program this without turning long data sets into wide using DOW loops or hash objects. 

--
Paige Miller
wj2
Quartz | Level 8 wj2
Quartz | Level 8

@PaigeMiller thank you! This appears to have worked perfectly! 

 

Out of curiosity, would you be able to show how this might also be done with the DOW loop or hash object approach that you mentioned? I'm a new SAS learner so seeing other approaches of doing the same thing is always very helpful

PaigeMiller
Diamond | Level 26

@wj2 wrote:

 

Out of curiosity, would you be able to show how this might also be done with the DOW loop or hash object approach that you mentioned?


No, I can't. Maybe others can.

--
Paige Miller
Tom
Super User Tom
Super User

It is still not clear to me what you are trying to test.  

Do you want a different flag for every observation? Or just one at the PATIENT_ID level?

 

What is the issue you are trying to test?

Are you checking if people begin treatment within a year of initial diagnosis?  Some measure of progression of disease perhaps?

 

Are you checking if people are diagnosed within a year of beginning treatment?  Perhaps to detect a side effect of the treatment?

 

What is the actual output you want from that input?

 

wj2
Quartz | Level 8 wj2
Quartz | Level 8

@Tom for my desired output, I would like one flag at the patient_ID level (i.e., NOT a different flag for every observation). I would like it to look like what I have shown in my first post. 

 

The issue I am trying to test is the prevalence of patients who began treatment (i.e., received a prescription) within one year of receiving a diagnosis of the disease. A patient would be counted as receiving treatment if any date of the prescription is within 365 days of any date of the diagnosis. 

 

As I mentioned above, the approach offered by @PaigeMiller using the array statements worked. I was just curious to see another approach for this problem using DOW loops or hash objects as a learning experience. 

Tom
Super User Tom
Super User

I still don't understand what you are trying to test.

 

Here is what I would do.

proc sql;
  create table have as 
    select patient_id
         , min(diag_date) as first_dx format=date9.
         , min(rx_date) as first_rx format=date9.
         , calculated first_rx - calculated first_dx as offset
    from test
    group by patient_id
  ;
quit;

Results

       Patient_
Obs       ID        first_dx     first_rx    offset

 1       001       28MAR2013            .         .
 2       002       10FEB2017    31JAN2017       -10
 3       003       25JUN2019    26DEC2019       184
 4       004       01DEC2016    18OCT2015      -410
 5       005       01JAN2018    01APR2020       821
 6       006       04NOV2018    14DEC2015     -1056
 7       007       10AUG2020    23AUG2017     -1083

So it looks like most of them started treatment before they received a DX.  One started treatment about 6 months after DX and the other started treatment after more than 2 years.

Ksharp
Super User
data work.test;
input Patient_ID $ Rx_date :date9. Diag_date :date9.;
infile datalines;
datalines;
001 .   28MAR2013
001 .   03JUL2013
001 . 03OCT2013
002 31JAN2017 10FEB2017
002 31JAN2017 29MAR2018
002 31JAN2017 12JUN2018
003 26DEC2019 25JUN2019
003 10JAN2020 26DEC2019
003 03MAR2020 11JAN2020
003 18MAY2020 30JAN2020
004 18OCT2015 01DEC2016
005 01MAY2020 01JAN2018
005 01APR2020 01FEB2018
005 15APR2020 01MAR2018
006 14DEC2015 04NOV2018
006 04OCT2018 06DEC2018
007 23AUG2017 10AUG2020
007 13SEP2017 04NOV2020
007 23MAY2019 11NOV2020
007 31DEC2019 24NOV2020
007 31DEC2019 01DEC2020
007 31DEC2019 17FEB2021
;


proc sql;
create table want as
select a.Patient_ID,max(.<abs(a.Rx_date-b.Diag_date)<=365) as flag
 from (select Patient_ID,Rx_date from test) as a ,
      (select Patient_ID,Diag_date from test) as b
  where a.Patient_ID=b.Patient_ID
   group by a.Patient_ID
;
quit;
PaigeMiller
Diamond | Level 26

@Ksharp wrote:
data work.test;
input Patient_ID $ Rx_date :date9. Diag_date :date9.;
infile datalines;
datalines;
001 .   28MAR2013
001 .   03JUL2013
001 . 03OCT2013
002 31JAN2017 10FEB2017
002 31JAN2017 29MAR2018
002 31JAN2017 12JUN2018
003 26DEC2019 25JUN2019
003 10JAN2020 26DEC2019
003 03MAR2020 11JAN2020
003 18MAY2020 30JAN2020
004 18OCT2015 01DEC2016
005 01MAY2020 01JAN2018
005 01APR2020 01FEB2018
005 15APR2020 01MAR2018
006 14DEC2015 04NOV2018
006 04OCT2018 06DEC2018
007 23AUG2017 10AUG2020
007 13SEP2017 04NOV2020
007 23MAY2019 11NOV2020
007 31DEC2019 24NOV2020
007 31DEC2019 01DEC2020
007 31DEC2019 17FEB2021
;


proc sql;
create table want as
select a.Patient_ID,max(.<abs(a.Rx_date-b.Diag_date)<=365) as flag
 from (select Patient_ID,Rx_date from test) as a ,
      (select Patient_ID,Diag_date from test) as b
  where a.Patient_ID=b.Patient_ID
   group by a.Patient_ID
;
quit;

Brilliant!

--
Paige Miller

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1552 views
  • 3 likes
  • 4 in conversation