Hello Community,
I am working on an analysis where I would like to create a binary variable (eg., new_var) indicating whether each patient in my dataset had a prescription date for a certain medication (rx_date) that was within 365 days of any date of a certain diagnosis (diagnosis_date). I have provided an example of what my input dataset looks like as well as my desired output dataset. Please note that currently my input dataset is set up where there is a missing value when a patient did not any prescription of the medication. Any help with this would be greatly appreciated!!
Have:
Patient_ID | Rx_date | Diagnosis_date |
001 | . | 28MAR2013 |
001 | . | 03JUL2013 |
001 | . | 03OCT2013 |
002 | 31JAN2017 | 10FEB2017 |
002 | 31JAN2017 | 29MAR2018 |
002 | 31JAN2017 | 12JUN2018 |
003 | 26DEC2019 | 25JUN2019 |
003 | 10JAN2020 | 26DEC2019 |
003 | 03MAR2020 | 11JAN2020 |
003 | 18MAY2020 | 30JAN2020 |
004 | 18OCT2015 | 01DEC2016 |
005 | 01MAY2020 | 01JAN2018 |
005 | 01APR2020 | 01FEB2018 |
005 | 15APR2020 | 01MAR2018 |
006 | 14DEC2015 | 04NOV2018 |
006 | 04OCT2018 | 06DEC2018 |
007 | 23AUG2017 | 10AUG2020 |
007 | 13SEP2017 | 04NOV2020 |
007 | 23MAY2019 | 11NOV2020 |
007 | 31DEC2019 | 24NOV2020 |
007 | 31DEC2019 | 01DEC2020 |
007 | 31DEC2019 | 17FEB2021 |
Want (One row per patient_ID):
Patient_ID | New_var |
001 | 0 |
002 | 1 |
003 | 1 |
004 | 0 |
005 | 0 |
006 | 1 |
007 | 1 |
data work.test;
input Patient_ID $ Rx_date :date9. Diag_date :date9.;
infile datalines;
datalines;
001 . 28MAR2013
001 . 03JUL2013
001 . 03OCT2013
002 31JAN2017 10FEB2017
002 31JAN2017 29MAR2018
002 31JAN2017 12JUN2018
003 26DEC2019 25JUN2019
003 10JAN2020 26DEC2019
003 03MAR2020 11JAN2020
003 18MAY2020 30JAN2020
004 18OCT2015 01DEC2016
005 01MAY2020 01JAN2018
005 01APR2020 01FEB2018
005 15APR2020 01MAR2018
006 14DEC2015 04NOV2018
006 04OCT2018 06DEC2018
007 23AUG2017 10AUG2020
007 13SEP2017 04NOV2020
007 23MAY2019 11NOV2020
007 31DEC2019 24NOV2020
007 31DEC2019 01DEC2020
007 31DEC2019 17FEB2021
;
proc transpose data=test out=test_t1 prefix=rx_date;
by patient_id;
var rx_date;
run;
proc transpose data=test out=test_t2 prefix=diag_date;
by patient_id;
var diag_date;
run;
data want;
merge test_t1 test_t2;
by patient_id;
array rx rx_date:;
array di diag_date:;
flag=0;
do i=1 to dim(rx);
do j=1 to dim(di);
if not missing(rx(i)) and not missing(di(j)) and abs(rx(i)-di(j))<=365
then do; flag=1; leave; end;
end;
end;
drop j i _name_;
run;
Comment: you didn't really specify if you want Rx_date before or after diagnosis_date, this looks at both.
Comment2: normally I advise against turning long data sets into wide, this is an exception. You could also program this without turning long data sets into wide using DOW loops or hash objects.
Question for clarification: do you want the Rx_date to be within 365 days before the diagnosis_date? Or do you want the Rx_date to be within 365 days after the diagnosis date? Or both?
Do a subtraction, this tells you how many days between Rx_date and diagnosis_date. If it is within 365 days of each other, set a flag to be equal to 1. Then use PROC SUMMARY to find the maximum value of the flag variable for each patient_ID.
/* UNTESTED CODE */
data intermediate;
set have;
flag=(rx_date-diagnosis_date)<=365;
run;
proc summary data=intermediate nway;
class patient_id;
var flag;
output out=want max=new_var;
run;
If you want tested code, you must provide data as SAS data step code (instructions). In fact, you should always provide data via this method, instead of screen captures as you have done here.
@PaigeMiller , thank you very much for the suggestion. However, this approach would only calculate the difference in dates within the same row, right? What if there is a case like shown below where the last diagnosis_date (05JUN2019) is not within 365 days of the rx_date on the same row (01JAN2021), but is 365 days within the rx_date on previous rows? I guess what I am wondering more specifically is how to use an array approach, or something similar, to create such a variable. Also, I would like to calculate whether the dates are 365 days before OR after each other. Any help with this would be much appreciated.
Patient_ID | Rx_date | Diagnosis_date |
008 | 01JAN2020 | 01JAN2017 |
008 | 15JAN2020 | 01JAN2018 |
008 | 01JAN2021 | 05JUN2019 |
Thank you for the clarification. Before I try to answer the question, I would need you to provide some of your data in the format requested above.
@PaigeMiller , sure thing. Please see below. FYI, the data is not showing up exactly how it should for some reason when I run the code. Not sure if this is because of the 3 missing values for the first 3 rows of the second column and something that needs to be specified for the format statement? Nevertheless, the raw data to be used are provided below. Please let me know if I can provide any additional info. Thank you very much for your help.
data work.test;
input Patient_ID $ Rx_date date9. Diag_date date9.;
infile datalines;
datalines;
001 . 28MAR2013
001 . 03JUL2013
001 . 03OCT2013
002 31JAN2017 10FEB2017
002 31JAN2017 29MAR2018
002 31JAN2017 12JUN2018
003 26DEC2019 25JUN2019
003 10JAN2020 26DEC2019
003 03MAR2020 11JAN2020
003 18MAY2020 30JAN2020
004 18OCT2015 01DEC2016
005 01MAY2020 01JAN2018
005 01APR2020 01FEB2018
005 15APR2020 01MAR2018
006 14DEC2015 04NOV2018
006 04OCT2018 06DEC2018
007 23AUG2017 10AUG2020
007 13SEP2017 04NOV2020
007 23MAY2019 11NOV2020
007 31DEC2019 24NOV2020
007 31DEC2019 01DEC2020
007 31DEC2019 17FEB2021
;
run;
data work.test;
input Patient_ID $ Rx_date :date9. Diag_date :date9.;
infile datalines;
datalines;
001 . 28MAR2013
001 . 03JUL2013
001 . 03OCT2013
002 31JAN2017 10FEB2017
002 31JAN2017 29MAR2018
002 31JAN2017 12JUN2018
003 26DEC2019 25JUN2019
003 10JAN2020 26DEC2019
003 03MAR2020 11JAN2020
003 18MAY2020 30JAN2020
004 18OCT2015 01DEC2016
005 01MAY2020 01JAN2018
005 01APR2020 01FEB2018
005 15APR2020 01MAR2018
006 14DEC2015 04NOV2018
006 04OCT2018 06DEC2018
007 23AUG2017 10AUG2020
007 13SEP2017 04NOV2020
007 23MAY2019 11NOV2020
007 31DEC2019 24NOV2020
007 31DEC2019 01DEC2020
007 31DEC2019 17FEB2021
;
proc transpose data=test out=test_t1 prefix=rx_date;
by patient_id;
var rx_date;
run;
proc transpose data=test out=test_t2 prefix=diag_date;
by patient_id;
var diag_date;
run;
data want;
merge test_t1 test_t2;
by patient_id;
array rx rx_date:;
array di diag_date:;
flag=0;
do i=1 to dim(rx);
do j=1 to dim(di);
if not missing(rx(i)) and not missing(di(j)) and abs(rx(i)-di(j))<=365
then do; flag=1; leave; end;
end;
end;
drop j i _name_;
run;
Comment: you didn't really specify if you want Rx_date before or after diagnosis_date, this looks at both.
Comment2: normally I advise against turning long data sets into wide, this is an exception. You could also program this without turning long data sets into wide using DOW loops or hash objects.
@PaigeMiller thank you! This appears to have worked perfectly!
Out of curiosity, would you be able to show how this might also be done with the DOW loop or hash object approach that you mentioned? I'm a new SAS learner so seeing other approaches of doing the same thing is always very helpful
@wj2 wrote:
Out of curiosity, would you be able to show how this might also be done with the DOW loop or hash object approach that you mentioned?
No, I can't. Maybe others can.
It is still not clear to me what you are trying to test.
Do you want a different flag for every observation? Or just one at the PATIENT_ID level?
What is the issue you are trying to test?
Are you checking if people begin treatment within a year of initial diagnosis? Some measure of progression of disease perhaps?
Are you checking if people are diagnosed within a year of beginning treatment? Perhaps to detect a side effect of the treatment?
What is the actual output you want from that input?
@Tom for my desired output, I would like one flag at the patient_ID level (i.e., NOT a different flag for every observation). I would like it to look like what I have shown in my first post.
The issue I am trying to test is the prevalence of patients who began treatment (i.e., received a prescription) within one year of receiving a diagnosis of the disease. A patient would be counted as receiving treatment if any date of the prescription is within 365 days of any date of the diagnosis.
As I mentioned above, the approach offered by @PaigeMiller using the array statements worked. I was just curious to see another approach for this problem using DOW loops or hash objects as a learning experience.
I still don't understand what you are trying to test.
Here is what I would do.
proc sql;
create table have as
select patient_id
, min(diag_date) as first_dx format=date9.
, min(rx_date) as first_rx format=date9.
, calculated first_rx - calculated first_dx as offset
from test
group by patient_id
;
quit;
Results
Patient_ Obs ID first_dx first_rx offset 1 001 28MAR2013 . . 2 002 10FEB2017 31JAN2017 -10 3 003 25JUN2019 26DEC2019 184 4 004 01DEC2016 18OCT2015 -410 5 005 01JAN2018 01APR2020 821 6 006 04NOV2018 14DEC2015 -1056 7 007 10AUG2020 23AUG2017 -1083
So it looks like most of them started treatment before they received a DX. One started treatment about 6 months after DX and the other started treatment after more than 2 years.
data work.test;
input Patient_ID $ Rx_date :date9. Diag_date :date9.;
infile datalines;
datalines;
001 . 28MAR2013
001 . 03JUL2013
001 . 03OCT2013
002 31JAN2017 10FEB2017
002 31JAN2017 29MAR2018
002 31JAN2017 12JUN2018
003 26DEC2019 25JUN2019
003 10JAN2020 26DEC2019
003 03MAR2020 11JAN2020
003 18MAY2020 30JAN2020
004 18OCT2015 01DEC2016
005 01MAY2020 01JAN2018
005 01APR2020 01FEB2018
005 15APR2020 01MAR2018
006 14DEC2015 04NOV2018
006 04OCT2018 06DEC2018
007 23AUG2017 10AUG2020
007 13SEP2017 04NOV2020
007 23MAY2019 11NOV2020
007 31DEC2019 24NOV2020
007 31DEC2019 01DEC2020
007 31DEC2019 17FEB2021
;
proc sql;
create table want as
select a.Patient_ID,max(.<abs(a.Rx_date-b.Diag_date)<=365) as flag
from (select Patient_ID,Rx_date from test) as a ,
(select Patient_ID,Diag_date from test) as b
where a.Patient_ID=b.Patient_ID
group by a.Patient_ID
;
quit;
@Ksharp wrote:
data work.test; input Patient_ID $ Rx_date :date9. Diag_date :date9.; infile datalines; datalines; 001 . 28MAR2013 001 . 03JUL2013 001 . 03OCT2013 002 31JAN2017 10FEB2017 002 31JAN2017 29MAR2018 002 31JAN2017 12JUN2018 003 26DEC2019 25JUN2019 003 10JAN2020 26DEC2019 003 03MAR2020 11JAN2020 003 18MAY2020 30JAN2020 004 18OCT2015 01DEC2016 005 01MAY2020 01JAN2018 005 01APR2020 01FEB2018 005 15APR2020 01MAR2018 006 14DEC2015 04NOV2018 006 04OCT2018 06DEC2018 007 23AUG2017 10AUG2020 007 13SEP2017 04NOV2020 007 23MAY2019 11NOV2020 007 31DEC2019 24NOV2020 007 31DEC2019 01DEC2020 007 31DEC2019 17FEB2021 ; proc sql; create table want as select a.Patient_ID,max(.<abs(a.Rx_date-b.Diag_date)<=365) as flag from (select Patient_ID,Rx_date from test) as a , (select Patient_ID,Diag_date from test) as b where a.Patient_ID=b.Patient_ID group by a.Patient_ID ; quit;
Brilliant!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.