@Quentin Am sorry to confuse you. You are right initially.
Please this is the right decision.
When biops =1, that is if a patient have any biopsies prior to the diagnosis date, then Dt_Biop should be set to the latest visit date prior to the diagnosis date.
@CathyVI wrote:
@Quentin Am sorry to confuse you. You are right initially.
Please this is the right decision.
When biops =1, that is if a patient have any biopsies prior to the diagnosis date, then Dt_Biop should be set to the latest visit date prior to the diagnosis date.
Okay, so PT_ID B has no biopsies. The patient has no records with Biops=1. What value would you want for Dt_Biop?
PT_ID A has two biopsies, but you don't know the dates of the biopsies. What value would you want for Dt_Biop?
In this scenario, PT_ID A will be missing because no records with Biops=1. However, when I ran it in the larger data, I would want value for Dt_Biop when Biops =1.
Ok, then is the code I posted giving you the correct result you want for Dt_Biop?
@Quentin No its not. The entire dt_Biop is missing even for Dt_Biop B
Sorry, I'm still confused. You've provided sample data for three patients. I included all that data here, and the code is working as I would expect. Patient A and Patient B have dt_biop missing. Patient C has dt_biop set to 05Mar2017, as per your original post in this thread.
data have;
input pt_id $ date:mmddyy10. HCPCS month_diag year_diag BIOP MRI MRI_BIOP ;
format date mmddyy10.;
datalines;
A 09/01/2016 12345 04 2017 0 1 0
A 01/01/2017 12345 04 2017 0 1 0
A . 55700 04 2017 1 0 0
A . 55700 04 2017 1 0 0
B 09/02/2007 99999 08 2007 0 1 0
B 06/13/2007 99999 08 2007 0 1 0
B 07/28/2007 23456 08 2007 0 1 0
C 09/01/2016 12345 04 2017 0 1 0
C 01/01/2017 12345 04 2017 0 1 0
C 03/01/2017 55700 04 2017 1 0 0
C 03/05/2017 55700 04 2017 1 0 0
;
run;
proc sort data=have ;
by pt_id date ;
run ;
data want (keep=pt_id DiagDate dt_biop);
set have ;
by pt_id date ;
*Make a diagnosis date that is the last day of the month of diagnosis ;
*Its helpful to have a date variable ;
DiagDate=intnx('month',mdy(month_diag,1,year_diag),0,'e') ;
retain dt_biop ;
if first.pt_id then do ;
call missing(dt_biop) ;
end ;
if .Z < date <= DiagDate then do ;
if biop=1 then dt_biop=date ;
end ;
if last.pt_id ;
format DiagDate dt_biop date9. ;
run ;
proc print data=want ;
var pt_id DiagDate dt_biop ;
run ;
Returns:
Obs pt_id DiagDate dt_biop 1 A 30APR2017 . 2 B 31AUG2007 . 3 C 30APR2017 05MAR2017
I think we agree that the above results are correct for these three patients.
If you have examples of data where this code doesn't work to calculate dt_biop, please add the examples to the sample data above, and reply to his with the full code (including the example data) and result, just as I have in this post.
When you post the code, please be sure to click the "Insert SAS Code" button, so that code will be formatted appropriately.
@Quentin Am very grateful for your help. Let me start over with this new posting so to be more clear & detailed. Please let me know if this helps.
Hi,
Sorry this is a long text, am just trying to be well detailed.
I am trying to determine two dates from a dataset.
1. BIOPSY date which is define as the closest date to but pre DIAGNOSIS date
2. MRI date which is define as the closest to but pre BIOPSY date.
I was able to generate the BIOPSY date but I am struggling to get the MRI date. I have variable BIOP, MRI & MRI_BIOP. These are variables I have generated when BIOP=1 or MRI=1 or MRI_BIOP=1.
data have;
input pt_id $ Pdate:mmddyy10. month_diag year_diag BIOP MRI MRI_BIOP ;
format date mmddyy10.;
datalines;
A 09/01/2016 04 2017 0 1 0
A 01/01/2017 04 2017 0 1 0
A . 04 2017 1 0 0
A 03/25/2017 04 2017 1 0 0
B 09/02/2007 08 2007 1 0 0
B 06/15/2007 08 2007 0 1 0
B 06/13/2007 08 2007 0 1 0
B 07/28/2007 08 2007 1 0 0
;
run;
proc sort data = have; by pt_ID Pdate ;proc print; run;
data want ;
set have ;
by pt_id date ;
/*Make a diagnosis date that is the 15thday of the month of diagnosis */
DiagDate = mdy(month_diag,15,year_diag) ;
retain dt_biop dt_mri ;
if first.pt_id then do ;
call missing(dt_biop,dt_mri) ;
end ;
if Pdate = . then dt_biop=. ;
else if Pdate <= DiagDate & biop=1 then dt_biop=Pdate ;
if mri=1 & Pdate <= dt_biop then dt_mri=Pdate ;
if last.pt_id ;
format DiagDate dt_biop dt_mri date9. ;
run ;
OUTPUT I WANTED
For example in Pt_id A, 03/25/2017(procedure date) is the closet date/pre to 08/15/2007(diagnosis date) when BIOP=1 so this is the BIOPSY date. The next date closet & pre to 03/25/2017 is 01/01/2017 when MRI=1, so this is suppose to be the MRI date.
In Pt_id B, 07/28/2007 is the closet & pre to 08/15/2007when BIOP=1 so this should be BIOPSY date then 06/15/2007 is the closet &pre 07/28/2007 when MRI=1 so this should be the MRI date.
Sounds like you're happy with your adaption of the code I provided to calculate dt_biop.
I would suggest you approach this in multiple steps:
1. Calculate dt_biop for each patient (using the code you have). This dataset has one record per patient.
2. Merge dt_biop back onto your HAVE data. This dataset has one record per procedure, and now has DT_BIOP.
3. Calculate dt_mri. The logic to calculate dt_MRI is the same as the logic for dt_biop. The data step to calculate dt_biop compared each biopsy date to the diagnosis date, to find the latest biopsy that was prior to the diagnosis date. The data step to calculate dt_MRI would compare each MRI date to DT_BIOP, to find the latest MRI that was prior to DT_BIOP.
While it would be possible to calculate both DT_Biop and DT_MRI in one step, as you are trying to do, the code would be more complex.
Here is an example of a one data step approach. As I said, it's more complex method known as the 'DOW loop'. It uses a single DATA step to read the data twice. I think you'll be better off by using separate steps to calculate DT_BIOP and DT_MRI.
data have;
input pt_id $ date:mmddyy10. month_diag year_diag BIOP MRI MRI_BIOP ;
format date mmddyy10.;
datalines;
A 09/01/2016 04 2017 0 1 0
A 01/01/2017 04 2017 0 1 0
A . 04 2017 1 0 0
A . 04 2017 1 0 0
B 09/02/2007 08 2007 0 1 0
B 06/13/2007 08 2007 0 1 0
B 07/28/2007 08 2007 0 1 0
C 09/01/2016 04 2017 0 1 0
C 01/01/2017 04 2017 0 1 0
C 03/01/2017 04 2017 1 0 0
C 03/05/2017 04 2017 1 0 0
D 09/01/2016 04 2017 0 1 0
D 01/01/2017 04 2017 0 1 0
D . 04 2017 1 0 0
D 03/25/2017 04 2017 1 0 0
E 09/02/2007 08 2007 1 0 0
E 06/15/2007 08 2007 0 1 0
E 06/13/2007 08 2007 0 1 0
E 07/28/2007 08 2007 1 0 0
;
run;
proc sort data=have ;
by pt_id date ;
run ;
data want (keep=pt_id dt_biop dt_mri) ;;
do until (last.pt_id) ;
set have ;
by pt_id date ;
DiagDate = mdy(month_diag,15,year_diag) ;
if biop=1 and .Z < date <= DiagDate then dt_biop=date ;
end ;
do until (last.pt_id) ;
set have ;
by pt_id date ;
if MRI=1 and .Z < date <= dt_biop then dt_mri=date ;
end ;
output ;
format DiagDate dt_biop dt_mri date9. ;
run ;
proc print data=want ;
run ;
While walking the dog, I realized that a simpler one-step approach would be to sort the data for each patient in descending chronological order. This this logic might be easier to understand than the DOW loop approach:
data have;
input pt_id $ date:mmddyy10. month_diag year_diag BIOP MRI MRI_BIOP ;
format date mmddyy10.;
datalines;
A 09/01/2016 04 2017 0 1 0
A 01/01/2017 04 2017 0 1 0
A . 04 2017 1 0 0
A . 04 2017 1 0 0
B 09/02/2007 08 2007 0 1 0
B 06/13/2007 08 2007 0 1 0
B 07/28/2007 08 2007 0 1 0
C 09/01/2016 04 2017 0 1 0
C 01/01/2017 04 2017 0 1 0
C 03/01/2017 04 2017 1 0 0
C 03/05/2017 04 2017 1 0 0
D 09/01/2016 04 2017 0 1 0
D 01/01/2017 04 2017 0 1 0
D . 04 2017 1 0 0
D 03/25/2017 04 2017 1 0 0
E 09/02/2007 08 2007 1 0 0
E 06/15/2007 08 2007 0 1 0
E 06/13/2007 08 2007 0 1 0
E 07/28/2007 08 2007 1 0 0
;
run;
proc sort data=have ;
by pt_id descending date ;
run ;
data want (keep=pt_id DiagDate dt_biop dt_mri);
set have ;
by pt_id descending date ;
DiagDate = mdy(month_diag,15,year_diag) ;
retain dt_biop dt_mri _foundbiop _foundmri;
if first.pt_id then do ;
call missing(dt_biop,dt_mri,_foundbiop,_foundmri) ;
end ;
if .Z < date <= DiagDate then do ;
if biop=1 and not (_foundbiop) then do ;
dt_biop=date ;
_foundbiop=1 ;
end ;
if mri=1 and _foundbiop=1 and not (_foundmri) then do ;
dt_mri=date ;
_foundmri=1 ;
end ;
end ;
if last.pt_id ;
drop _: ;
format DiagDate dt_biop dt_mri date9. ;
run ;
proc print data=want ;
var pt_id DiagDate dt_biop dt_mri;
run ;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.