@Quentin Am sorry to confuse you. You are right initially.
Please this is the right decision.
When biops =1, that is if a patient have any biopsies prior to the diagnosis date, then Dt_Biop should be set to the latest visit date prior to the diagnosis date.
@CathyVI wrote:
@Quentin Am sorry to confuse you. You are right initially.
Please this is the right decision.
When biops =1, that is if a patient have any biopsies prior to the diagnosis date, then Dt_Biop should be set to the latest visit date prior to the diagnosis date.
Okay, so PT_ID B has no biopsies. The patient has no records with Biops=1. What value would you want for Dt_Biop?
PT_ID A has two biopsies, but you don't know the dates of the biopsies. What value would you want for Dt_Biop?
In this scenario, PT_ID A will be missing because no records with Biops=1. However, when I ran it in the larger data, I would want value for Dt_Biop when Biops =1.
Ok, then is the code I posted giving you the correct result you want for Dt_Biop?
@Quentin No its not. The entire dt_Biop is missing even for Dt_Biop B
Sorry, I'm still confused. You've provided sample data for three patients. I included all that data here, and the code is working as I would expect. Patient A and Patient B have dt_biop missing. Patient C has dt_biop set to 05Mar2017, as per your original post in this thread.
data have;
input pt_id $ date:mmddyy10. HCPCS month_diag year_diag BIOP MRI MRI_BIOP ;
format date mmddyy10.;
datalines;
A 09/01/2016 12345 04 2017 0 1 0
A 01/01/2017 12345 04 2017 0 1 0
A . 55700 04 2017 1 0 0
A . 55700 04 2017 1 0 0
B 09/02/2007 99999 08 2007 0 1 0
B 06/13/2007 99999 08 2007 0 1 0
B 07/28/2007 23456 08 2007 0 1 0
C 09/01/2016 12345 04 2017 0 1 0
C 01/01/2017 12345 04 2017 0 1 0
C 03/01/2017 55700 04 2017 1 0 0
C 03/05/2017 55700 04 2017 1 0 0
;
run;
proc sort data=have ;
by pt_id date ;
run ;
data want (keep=pt_id DiagDate dt_biop);
set have ;
by pt_id date ;
*Make a diagnosis date that is the last day of the month of diagnosis ;
*Its helpful to have a date variable ;
DiagDate=intnx('month',mdy(month_diag,1,year_diag),0,'e') ;
retain dt_biop ;
if first.pt_id then do ;
call missing(dt_biop) ;
end ;
if .Z < date <= DiagDate then do ;
if biop=1 then dt_biop=date ;
end ;
if last.pt_id ;
format DiagDate dt_biop date9. ;
run ;
proc print data=want ;
var pt_id DiagDate dt_biop ;
run ;
Returns:
Obs pt_id DiagDate dt_biop 1 A 30APR2017 . 2 B 31AUG2007 . 3 C 30APR2017 05MAR2017
I think we agree that the above results are correct for these three patients.
If you have examples of data where this code doesn't work to calculate dt_biop, please add the examples to the sample data above, and reply to his with the full code (including the example data) and result, just as I have in this post.
When you post the code, please be sure to click the "Insert SAS Code" button, so that code will be formatted appropriately.
@Quentin Am very grateful for your help. Let me start over with this new posting so to be more clear & detailed. Please let me know if this helps.
Hi,
Sorry this is a long text, am just trying to be well detailed.
I am trying to determine two dates from a dataset.
1. BIOPSY date which is define as the closest date to but pre DIAGNOSIS date
2. MRI date which is define as the closest to but pre BIOPSY date.
I was able to generate the BIOPSY date but I am struggling to get the MRI date. I have variable BIOP, MRI & MRI_BIOP. These are variables I have generated when BIOP=1 or MRI=1 or MRI_BIOP=1.
data have;
input pt_id $ Pdate:mmddyy10. month_diag year_diag BIOP MRI MRI_BIOP ;
format date mmddyy10.;
datalines;
A 09/01/2016 04 2017 0 1 0
A 01/01/2017 04 2017 0 1 0
A . 04 2017 1 0 0
A 03/25/2017 04 2017 1 0 0
B 09/02/2007 08 2007 1 0 0
B 06/15/2007 08 2007 0 1 0
B 06/13/2007 08 2007 0 1 0
B 07/28/2007 08 2007 1 0 0
;
run;
proc sort data = have; by pt_ID Pdate ;proc print; run;
data want ;
set have ;
by pt_id date ;
/*Make a diagnosis date that is the 15thday of the month of diagnosis */
DiagDate = mdy(month_diag,15,year_diag) ;
retain dt_biop dt_mri ;
if first.pt_id then do ;
call missing(dt_biop,dt_mri) ;
end ;
if Pdate = . then dt_biop=. ;
else if Pdate <= DiagDate & biop=1 then dt_biop=Pdate ;
if mri=1 & Pdate <= dt_biop then dt_mri=Pdate ;
if last.pt_id ;
format DiagDate dt_biop dt_mri date9. ;
run ;
OUTPUT I WANTED
For example in Pt_id A, 03/25/2017(procedure date) is the closet date/pre to 08/15/2007(diagnosis date) when BIOP=1 so this is the BIOPSY date. The next date closet & pre to 03/25/2017 is 01/01/2017 when MRI=1, so this is suppose to be the MRI date.
In Pt_id B, 07/28/2007 is the closet & pre to 08/15/2007when BIOP=1 so this should be BIOPSY date then 06/15/2007 is the closet &pre 07/28/2007 when MRI=1 so this should be the MRI date.
Sounds like you're happy with your adaption of the code I provided to calculate dt_biop.
I would suggest you approach this in multiple steps:
1. Calculate dt_biop for each patient (using the code you have). This dataset has one record per patient.
2. Merge dt_biop back onto your HAVE data. This dataset has one record per procedure, and now has DT_BIOP.
3. Calculate dt_mri. The logic to calculate dt_MRI is the same as the logic for dt_biop. The data step to calculate dt_biop compared each biopsy date to the diagnosis date, to find the latest biopsy that was prior to the diagnosis date. The data step to calculate dt_MRI would compare each MRI date to DT_BIOP, to find the latest MRI that was prior to DT_BIOP.
While it would be possible to calculate both DT_Biop and DT_MRI in one step, as you are trying to do, the code would be more complex.
Here is an example of a one data step approach. As I said, it's more complex method known as the 'DOW loop'. It uses a single DATA step to read the data twice. I think you'll be better off by using separate steps to calculate DT_BIOP and DT_MRI.
data have;
input pt_id $ date:mmddyy10. month_diag year_diag BIOP MRI MRI_BIOP ;
format date mmddyy10.;
datalines;
A 09/01/2016 04 2017 0 1 0
A 01/01/2017 04 2017 0 1 0
A . 04 2017 1 0 0
A . 04 2017 1 0 0
B 09/02/2007 08 2007 0 1 0
B 06/13/2007 08 2007 0 1 0
B 07/28/2007 08 2007 0 1 0
C 09/01/2016 04 2017 0 1 0
C 01/01/2017 04 2017 0 1 0
C 03/01/2017 04 2017 1 0 0
C 03/05/2017 04 2017 1 0 0
D 09/01/2016 04 2017 0 1 0
D 01/01/2017 04 2017 0 1 0
D . 04 2017 1 0 0
D 03/25/2017 04 2017 1 0 0
E 09/02/2007 08 2007 1 0 0
E 06/15/2007 08 2007 0 1 0
E 06/13/2007 08 2007 0 1 0
E 07/28/2007 08 2007 1 0 0
;
run;
proc sort data=have ;
by pt_id date ;
run ;
data want (keep=pt_id dt_biop dt_mri) ;;
do until (last.pt_id) ;
set have ;
by pt_id date ;
DiagDate = mdy(month_diag,15,year_diag) ;
if biop=1 and .Z < date <= DiagDate then dt_biop=date ;
end ;
do until (last.pt_id) ;
set have ;
by pt_id date ;
if MRI=1 and .Z < date <= dt_biop then dt_mri=date ;
end ;
output ;
format DiagDate dt_biop dt_mri date9. ;
run ;
proc print data=want ;
run ;
While walking the dog, I realized that a simpler one-step approach would be to sort the data for each patient in descending chronological order. This this logic might be easier to understand than the DOW loop approach:
data have;
input pt_id $ date:mmddyy10. month_diag year_diag BIOP MRI MRI_BIOP ;
format date mmddyy10.;
datalines;
A 09/01/2016 04 2017 0 1 0
A 01/01/2017 04 2017 0 1 0
A . 04 2017 1 0 0
A . 04 2017 1 0 0
B 09/02/2007 08 2007 0 1 0
B 06/13/2007 08 2007 0 1 0
B 07/28/2007 08 2007 0 1 0
C 09/01/2016 04 2017 0 1 0
C 01/01/2017 04 2017 0 1 0
C 03/01/2017 04 2017 1 0 0
C 03/05/2017 04 2017 1 0 0
D 09/01/2016 04 2017 0 1 0
D 01/01/2017 04 2017 0 1 0
D . 04 2017 1 0 0
D 03/25/2017 04 2017 1 0 0
E 09/02/2007 08 2007 1 0 0
E 06/15/2007 08 2007 0 1 0
E 06/13/2007 08 2007 0 1 0
E 07/28/2007 08 2007 1 0 0
;
run;
proc sort data=have ;
by pt_id descending date ;
run ;
data want (keep=pt_id DiagDate dt_biop dt_mri);
set have ;
by pt_id descending date ;
DiagDate = mdy(month_diag,15,year_diag) ;
retain dt_biop dt_mri _foundbiop _foundmri;
if first.pt_id then do ;
call missing(dt_biop,dt_mri,_foundbiop,_foundmri) ;
end ;
if .Z < date <= DiagDate then do ;
if biop=1 and not (_foundbiop) then do ;
dt_biop=date ;
_foundbiop=1 ;
end ;
if mri=1 and _foundbiop=1 and not (_foundmri) then do ;
dt_mri=date ;
_foundmri=1 ;
end ;
end ;
if last.pt_id ;
drop _: ;
format DiagDate dt_biop dt_mri date9. ;
run ;
proc print data=want ;
var pt_id DiagDate dt_biop dt_mri;
run ;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.