Hello,
I have a large dataset but i have provided a sample of the data, my coding and expected output. My code is not giving me the expected result so I need guidance.
I am working on dates variable, I want to generate the date of mri and date of biopsy.Here is a sample:
data have;
input pt_id $ date:mmddyy10. HCPCS month_diag year_diag BIOP MRI MRI_BIOP ;
format date mmddyy10.;
datalines;
A 09/01/2016 12345 04 2017 0 1 0
A 01/01/2017 12345 04 2017 0 1 0
A 03/01/2017 55700 04 2017 1 0 0
A 03/05/2017 55700 04 2017 1 0 0
;
run;
1. When biop=1, I want to generate the date_biops closest to but pre/in diagnosis month/year i.e the date of biosp will be the last date closest to the diagnosis month/year
2. When mri=1, I want to generate the date of mri closest to but pre biopsy date.
This is how I want my output to look like.
pt_id | Month_diag | year_diag | dt_biop | dt_mri |
A | April | 2017 | 3/5/2017 | 1/1/2017 |
This is my attempted code:
Data want;
set have;
Date_code = catx('/',year_diag,Month_diag);
by pt_id;
first_biop=first.biop;
last_biop=last.biop;
retain DT_BIOP;
if first.biop then call missing (DT_BIOP);
if date <= Date_code and not missing(BIOP) then DT_BIOP=date;
else if missing(DT_BIOP) then biop_2 =0;
format DT_BIOP yymmdd10.;
run;
Thank you in advance
Hi,
Thanks for taking the time to post a clear question, with a description, sample data, and the code you have tried.
This code doesn't give you exactly the output you want, but I think the logic should work for you.
The logic is:
data want (keep=pt_id DiagDate dt_biop dt_mri);
set have ;
by pt_id date ;
*Make a diagnosis date that is the last day of the month of diagnosis ;
*Its helpful to have a date variable ;
DiagDate=intnx('month',mdy(month_diag,1,year_diag),0,'e') ;
retain dt_biop dt_mri ;
if first.pt_id then do ;
call missing(dt_biop,dt_mri) ;
end ;
if date <= DiagDate then do ;
if biop=1 then dt_biop=date ;
if mri=1 then dt_mri=date ;
end ;
if last.pt_id ;
format DiagDate dt_biop dt_mri date9. ;
run ;
Below code gives you the output as required -
*sort data by patient id and date;
proc sort data= have; by pt_id date; run;
*create temporary table to format date_code & assign first and last biop indicators;
data _temp_want1_;
set have;
by pt_id date;
format date date9.;
if Month_diag <= 9 then
do;
_temp_Date_code = catx('/',('0'||strip(substr((put(Month_diag,$2.)),1,2))),'01',year_diag);
end;
else if Month_diag > 9 then
do;
_temp_Date_code = catx('/',(strip(substr((put(Month_diag,$2.)),1,2))),'01',year_diag);
end;
_temp_Date_code2 = input(_temp_Date_code, MMDDYY10.);
Date_code = _temp_Date_code2;
format Date_code date9.;
drop _temp_Date_code _temp_Date_code2;
If first.pt_id then
do;
first_biop = biop;
end;
If last.pt_id then
do;
last_biop = biop;
end;
run;
*inner table: calculate latest date pre diagnosis date to use in dt_biop assignment;
*outer table t2: join inner table to first temporary table to assign temporary dt_biop;
proc sql;
create table _temp_want_biop_dte_join as
select t1.*
, t2. temp_dt_biop
from _temp_want1_ t1
left join (select pt_id
, date
, case
when max(date) = date and date < date_code then date
else . end as temp_dt_biop format=date9.
from _temp_want1_
group by pt_id
having biop = 1 and max(date) = date) t2
on t1.pt_id = t2.pt_id
and t1.date = t2.date;
quit;
*assign temporary dt_biopsy across all rows and rename to the final dt_biopsy field;
proc sql;
create table _temp_want_biop_dte_join_max as
select *
, max(temp_dt_biop) as dt_biop format=date9.
from _temp_want_biop_dte_join
group by pt_id;
quit;
*inner table t1: select all data where mri = 1
*inner table t1: from table t1, group by patient id to select the latest date where mri = 1 and date is less than biopsy date;
*outer table: select only patient id, date and temporary mri date to join on to full table on the next steps;
proc sql;
create table _temp_want_mri_dte as
select t2.pt_id
, t2.date
, t2.temp_dt_mri
from (select t1.*
, case
when max(t1.date) = date and date < dt_biop then date
else . end as temp_dt_mri format=date9.
from (select *
from _temp_want_biop_dte_join_max
where mri = 1) t1
group by t1.pt_id) t2
where t2.temp_dt_mri = t2.date;
quit;
*join temprary mri date onto full data for the same date as where mri = 1 & date is less than biopsy date;
proc sql;
create table _temp_want_mri_dte_join as
select t1.*
, t2.temp_dt_mri
from _temp_want_biop_dte_join_max t1
left join _temp_want_mri_dte t2
on t1.pt_id = t2.pt_id
and t1.date = t2.date;
quit;
*fill in temporary mri date across all rows in full data;
proc sql;
create table _Temp_want_max_mri_dte as
select *
, max(temp_dt_mri) as dt_mri format=date9.
from _temp_want_mri_dte_join
group by pt_id;
quit;
*select single row per patient id with required fields & formats;
proc sql;
create table want as
select distinct pt_id
, put(date_code, monname3.) as month_diag
, year(date_code) as year_diag
, dt_biop format=MMDDYY10.
, dt_mri format=MMDDYY10.
from _Temp_want_max_mri_dte;
quit;
/*delete temporary tables*/
proc sql;
drop table _temp_want1_;
drop table _temp_want_biop_dte_join;
drop table _temp_want_biop_dte_join_max;
drop table _temp_want_mri_dte;
drop table _temp_want_mri_dte_join;
drop table _Temp_want_max_mri_dte;
quit;
Do you mean you have cases where the variable DATE is missing? For my code, you could change:
if date <= DiagDate then do ;
if biop=1 then dt_biop=date ;
if mri=1 then dt_mri=date ;
end ;
to:
if .Z < date <= DiagDate then do ;
if biop=1 then dt_biop=date ;
if mri=1 then dt_mri=date ;
end ;
That will ignore events where the DATE is missing.
Unrelated, I just noticed that your specification is:
1. When biop=1, I want to generate the date_biops closest to but pre/in diagnosis month/year
2. When mri=1, I want to generate the date of mri closest to but pre/in biopsy date.
I didn't do #2 like you want. I generated date_mri as the date closest to but pre/in the diagnosis month.
I recreated the file with missing data but when I ran your new code the out was still missing.
data have;
input pt_id $ date:mmddyy10. HCPCS month_diag year_diag BIOP MRI MRI_BIOP ;
format date mmddyy10.;
datalines;
A 09/01/2016 12345 04 2017 0 1 0
A 01/01/2017 12345 04 2017 0 1 0
A . 55700 04 2017 1 0 0
A . 55700 04 2017 1 0 0
B 09/02/2007 99999 08 2007 0 1 0
B 06/13/2007 99999 08 2007 0 1 0
B 07/28/2007 23456 08 2007 0 1 0
;
run;
proc sort data = have; by pt_ID date ;proc print; run;
data want (keep=pt_id DiagDate dt_biop dt_mri);
set have ;
by pt_id date ;
*Make a diagnosis date that is the last day of the month of diagnosis ;
*Its helpful to have a date variable ;
DiagDate= mdy(month_diag,15,year_diag) ;
retain dt_biop dt_mri ;
if first.pt_id then do ;
call missing(dt_biop,dt_mri) ;
end ;
if .Z < date <= DiagDate then do ;
if biop=1 then dt_biop=date ;
if mri=1 then dt_mri=date ;
end ;
if last.pt_id ;
format DiagDate dt_biop dt_mri date9. ;
run ;
Hi,
dt_biop is missing for patient A because even though the patient had two biopsies, you don't know the date of the biopsy.
dt_biop is missing for patient B because they didn't have a biopsy.
(If I'm understanding your data correctly).
Did you want to add a new rule for how to handle these possibilities?
-Q.
What rule would you want when the event date is missing, or there are no events? I don't think I'm understanding your goal.
What output would you want for this sample data you shared?:
data have; input pt_id $ date:mmddyy10. HCPCS month_diag year_diag BIOP MRI MRI_BIOP ; format date mmddyy10.; datalines; A 09/01/2016 12345 04 2017 0 1 0 A 01/01/2017 12345 04 2017 0 1 0 A . 55700 04 2017 1 0 0 A . 55700 04 2017 1 0 0 B 09/02/2007 99999 08 2007 0 1 0 B 06/13/2007 99999 08 2007 0 1 0 B 07/28/2007 23456 08 2007 0 1 0 ; run;
1. When the event date is missing and I ran your code, the date of biopsy were all missing(.) No date was generated so I want to fix that by addressing the missing dates.
2. From the task the DT_MRI = date of MRI closest to but pre biopsy date NOT diagnosis date. How can I address this issue? Thanks
Hi,
Let's focus on DT_BIOP first. We agree that the sample code I shared will result in DT_BIOP being missing when the biopsy dates are all missing, or if there is no biopsy.
data have;
input pt_id $ date:mmddyy10. HCPCS month_diag year_diag BIOP MRI MRI_BIOP ;
format date mmddyy10.;
datalines;
A 09/01/2016 12345 04 2017 0 1 0
A 01/01/2017 12345 04 2017 0 1 0
A . 55700 04 2017 1 0 0
A . 55700 04 2017 1 0 0
B 09/02/2007 99999 08 2007 0 1 0
B 06/13/2007 99999 08 2007 0 1 0
B 07/28/2007 23456 08 2007 0 1 0
;
run;
proc sort data=have ;
by pt_id date ;
run ;
data want (keep=pt_id DiagDate dt_biop dt_mri);
set have ;
by pt_id date ;
*Make a diagnosis date that is the last day of the month of diagnosis ;
*Its helpful to have a date variable ;
DiagDate=intnx('month',mdy(month_diag,1,year_diag),0,'e') ;
retain dt_biop dt_mri ;
if first.pt_id then do ;
call missing(dt_biop,dt_mri) ;
end ;
if .Z < date <= DiagDate then do ;
if biop=1 then dt_biop=date ;
if mri=1 then dt_mri=date ;
end ;
if last.pt_id ;
format DiagDate dt_biop dt_mri date9. ;
run ;
proc print data=want ;
var pt_id DiagDate dt_biop ;
run ;
Results in:
Obs pt_id DiagDate dt_biop 1 A 30APR2017 . 2 B 31AUG2007 .
It sounds like you don't want dt_biop to be missing for these two patients.
What value would you want dt_biop to have for pt_Id A?
What value would you want dt_biop to have for pt_Id B?
@Quentin Yes, you are right.
What value would you want dt_biop to have for pt_Id A?
What value would you want dt_biop to have for pt_Id B?
For pt_Id A & B, I want the dt_biop to be DT_BIOP = date of biopsy closest to but <= DiagDate.
@CathyVI sorry, I don' think we're communicating well. You said:
@Quentin Yes, you are right.What value would you want dt_biop to have for pt_Id A?
What value would you want dt_biop to have for pt_Id B?
For pt_Id A & B, I want the dt_biop to be DT_BIOP = date of biopsy closest to but <= DiagDate.
You created the example data:
data have;
input pt_id $ date:mmddyy10. HCPCS month_diag year_diag BIOP MRI MRI_BIOP ;
format date mmddyy10.;
datalines;
A 09/01/2016 12345 04 2017 0 1 0
A 01/01/2017 12345 04 2017 0 1 0
A . 55700 04 2017 1 0 0
A . 55700 04 2017 1 0 0
B 09/02/2007 99999 08 2007 0 1 0
B 06/13/2007 99999 08 2007 0 1 0
B 07/28/2007 23456 08 2007 0 1 0
;
run;
For that specific example data, what value would you want dt_biop to have for pt_Id A? what value would you want dt_biop to have for pt_Id B?
I think missing is the correct answer. But you seem to want a different answer. What specific value (date) would you want for pt_ID A and pt_ID B?
From the data it is true that missing date is there but for pt_Id A I want 01/01/2017 because this is the nearest date to the DiagDate.
For pt_Id B, I want 07/28/2007 because this is the closest/nearest date to or on the DiagDate.
Pt_ID A did not have a biopsy on 01/01/2017, they had an MRI on that date. You want to set Dt_Biop to the date of the MRI?
That logic needs to be clarified.
Are you saying that if a patient did not have any biopsies prior to the diagnosis date, then Dt_Biop should be set to the date of latest MRI prior to the diagnosis date? That seems confusing.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.