BookmarkSubscribeRSS Feed
CathyVI
Pyrite | Level 9

Hello,

I have a large dataset but i have provided a sample of the data, my coding and expected output. My code is not giving me the expected result so I need guidance.

I am working on dates variable, I want to generate the date of mri and date of biopsy.Here is a sample:

 

data have;
input pt_id $ date:mmddyy10. HCPCS month_diag year_diag BIOP MRI MRI_BIOP ;
format date mmddyy10.;
datalines;
A 09/01/2016 12345 04 2017 0 1 0
A 01/01/2017 12345 04 2017 0 1 0
A 03/01/2017 55700 04 2017 1 0 0

A 03/05/2017 55700 04 2017 1 0 0
;
run;

 

1. When biop=1, I want to generate the date_biops closest to but pre/in diagnosis month/year i.e the date of biosp will be the last date closest to the diagnosis month/year

2. When mri=1, I want to generate the date of mri closest to but pre biopsy date. 

 

This is how I want my output to look like.

pt_idMonth_diagyear_diagdt_biopdt_mri
AApril20173/5/20171/1/2017

 

This is my attempted code:

Data want;
set have;
Date_code = catx('/',year_diag,Month_diag);
by pt_id;
first_biop=first.biop;
last_biop=last.biop;

retain DT_BIOP;
if first.biop then call missing (DT_BIOP);

if date <= Date_code and not missing(BIOP) then DT_BIOP=date;
else if missing(DT_BIOP) then biop_2 =0;

format DT_BIOP yymmdd10.;

run;

Thank you in advance

25 REPLIES 25
Quentin
Super User

Hi,

 

Thanks for taking the time to post a clear question, with a description, sample data, and the code you have tried.

 

This code doesn't give you exactly the output you want, but I think the logic should work for you.

 

The logic is:

  • Data are sorted by patient and encounter date
  • Create a diagnosis date (last day of the diagnosis month).
  • For each patient, read through the encounters (in chronological order).  If an encounter date is before the diagnosis date, then set biopsy date or mri date to the encounter date.
  • Output one record per patient.
data want (keep=pt_id DiagDate dt_biop dt_mri);
  set have ;
  by pt_id date ;

  *Make a diagnosis date that is the last day of the month of diagnosis ;
  *Its helpful to have a date variable ;

  DiagDate=intnx('month',mdy(month_diag,1,year_diag),0,'e') ;

  retain dt_biop dt_mri ;

  if first.pt_id then do ;
    call missing(dt_biop,dt_mri) ;
  end ;

  if date <= DiagDate then do ;
    if biop=1 then dt_biop=date ;
    if mri=1  then dt_mri=date ;
  end ;

  if last.pt_id ;

  format DiagDate dt_biop dt_mri date9. ;
run ;
The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.
Modeller
Fluorite | Level 6

Below code gives you the output as required -

 

*sort data by patient id and date;
proc sort data= have; by pt_id date; run;

*create temporary table to format date_code & assign first and last biop indicators;
data _temp_want1_;
set have;
by pt_id date;
format date date9.;

if Month_diag <= 9 then
do;
_temp_Date_code = catx('/',('0'||strip(substr((put(Month_diag,$2.)),1,2))),'01',year_diag);
end;
else if Month_diag > 9 then
do;
_temp_Date_code = catx('/',(strip(substr((put(Month_diag,$2.)),1,2))),'01',year_diag);
end;

_temp_Date_code2 = input(_temp_Date_code, MMDDYY10.);
Date_code = _temp_Date_code2;
format Date_code date9.;
drop _temp_Date_code _temp_Date_code2;

If first.pt_id then
do;
first_biop = biop;
end;

If last.pt_id then
do;
last_biop = biop;
end;
run;

*inner table: calculate latest date pre diagnosis date to use in dt_biop assignment;
*outer table t2: join inner table to first temporary table to assign temporary dt_biop;
proc sql;
create table _temp_want_biop_dte_join as
select t1.*
, t2. temp_dt_biop
from _temp_want1_ t1
left join (select pt_id
, date
, case
when max(date) = date and date < date_code then date
else . end as temp_dt_biop format=date9.
from _temp_want1_
group by pt_id
having biop = 1 and max(date) = date) t2
on t1.pt_id = t2.pt_id
and t1.date = t2.date;
quit;

*assign temporary dt_biopsy across all rows and rename to the final dt_biopsy field;
proc sql;
create table _temp_want_biop_dte_join_max as
select *
, max(temp_dt_biop) as dt_biop format=date9.
from _temp_want_biop_dte_join
group by pt_id;
quit;

*inner table t1: select all data where mri = 1
*inner table t1: from table t1, group by patient id to select the latest date where mri = 1 and date is less than biopsy date;
*outer table: select only patient id, date and temporary mri date to join on to full table on the next steps;
proc sql;
create table _temp_want_mri_dte as
select t2.pt_id
, t2.date
, t2.temp_dt_mri
from (select t1.*
, case
when max(t1.date) = date and date < dt_biop then date
else . end as temp_dt_mri format=date9.
from (select *
from _temp_want_biop_dte_join_max
where mri = 1) t1
group by t1.pt_id) t2
where t2.temp_dt_mri = t2.date;
quit;

*join temprary mri date onto full data for the same date as where mri = 1 & date is less than biopsy date;
proc sql;
create table _temp_want_mri_dte_join as
select t1.*
, t2.temp_dt_mri
from _temp_want_biop_dte_join_max t1
left join _temp_want_mri_dte t2
on t1.pt_id = t2.pt_id
and t1.date = t2.date;
quit;

*fill in temporary mri date across all rows in full data;
proc sql;
create table _Temp_want_max_mri_dte as
select *
, max(temp_dt_mri) as dt_mri format=date9.
from _temp_want_mri_dte_join
group by pt_id;
quit;

*select single row per patient id with required fields & formats;
proc sql;
create table want as
select distinct pt_id
, put(date_code, monname3.) as month_diag
, year(date_code) as year_diag
, dt_biop format=MMDDYY10.
, dt_mri format=MMDDYY10.
from _Temp_want_max_mri_dte;
quit;

/*delete temporary tables*/
proc sql;
drop table _temp_want1_;
drop table _temp_want_biop_dte_join;
drop table _temp_want_biop_dte_join_max;
drop table _temp_want_mri_dte;
drop table _temp_want_mri_dte_join;
drop table _Temp_want_max_mri_dte;
quit;

CathyVI
Pyrite | Level 9

@Quentin @Modeller 

One thing I forgot to mention is that I have missing observations within the date variables. When I ran the code the new variables dt_biop and dt_mri was missing. How do I resolved the issue?

Quentin
Super User

Do you mean you have cases where the variable DATE is missing?  For my code, you could change:

 

  if date <= DiagDate then do ;
    if biop=1 then dt_biop=date ;
    if mri=1  then dt_mri=date ;
  end ;

to:

 

  if .Z < date <= DiagDate then do ;
    if biop=1 then dt_biop=date ;
    if mri=1  then dt_mri=date ;
  end ;

That will ignore events where the DATE is missing.

 

Unrelated, I just noticed that your specification is:

1. When biop=1, I want to generate the date_biops closest to but pre/in diagnosis month/year

2. When mri=1, I want to generate the date of mri closest to but pre/in biopsy date.

 

I didn't do #2 like you want.  I generated date_mri as the date closest to but pre/in the diagnosis month. 

The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.
CathyVI
Pyrite | Level 9

@Quentin @Modeller 

I recreated the file with missing data but when I ran your new code the out was still missing.

data have;
input pt_id $ date:mmddyy10. HCPCS month_diag year_diag BIOP MRI MRI_BIOP ;
format date mmddyy10.;
datalines;
A 09/01/2016 12345 04 2017 0 1 0
A 01/01/2017 12345 04 2017 0 1 0
A . 55700 04 2017 1 0 0
A . 55700 04 2017 1 0 0
B 09/02/2007 99999 08 2007 0 1 0
B 06/13/2007 99999 08 2007 0 1 0
B 07/28/2007 23456 08 2007 0 1 0
;
run;

 

proc sort data = have; by pt_ID date ;proc print; run;
data want (keep=pt_id DiagDate dt_biop dt_mri);
set have ;
by pt_id date ;

*Make a diagnosis date that is the last day of the month of diagnosis ;
*Its helpful to have a date variable ;

DiagDate= mdy(month_diag,15,year_diag) ;

retain dt_biop dt_mri ;

if first.pt_id then do ;
call missing(dt_biop,dt_mri) ;
end ;

if .Z < date <= DiagDate then do ;
if biop=1 then dt_biop=date ;
if mri=1 then dt_mri=date ;
end ;

if last.pt_id ;

format DiagDate dt_biop dt_mri date9. ;
run ;

 

Quentin
Super User

Hi, 

dt_biop is missing for patient A because even though the patient had two biopsies, you don't know the date of the biopsy.

dt_biop is missing for patient B because they didn't have a biopsy.

 

(If I'm understanding your data correctly).

 

Did you want to add a new rule for how to handle these possibilities?

 

-Q.

 

The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.
CathyVI
Pyrite | Level 9

@Quentin 

Yes,  I want a rule for how to handle the missingness. I dont want to delete them

 

Quentin
Super User

What rule would you want when the event date is missing, or there are no events?  I don't think I'm understanding your goal.

 

What output would you want for this sample data you shared?:

data have;
input pt_id $ date:mmddyy10. HCPCS month_diag year_diag BIOP MRI MRI_BIOP ;
format date mmddyy10.;
datalines;
A 09/01/2016 12345 04 2017 0 1 0
A 01/01/2017 12345 04 2017 0 1 0
A . 55700 04 2017 1 0 0
A . 55700 04 2017 1 0 0
B 09/02/2007 99999 08 2007 0 1 0
B 06/13/2007 99999 08 2007 0 1 0
B 07/28/2007 23456 08 2007 0 1 0
;
run;
The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.
CathyVI
Pyrite | Level 9

@Quentin 

1. When the event date is missing and I ran your code, the date of biopsy were all missing(.) No date was generated so I want to fix that by addressing the missing dates.

2. From the task the DT_MRI = date of MRI closest to but pre biopsy date NOT diagnosis date. How can I address this issue? Thanks 

Quentin
Super User

Hi,

 

Let's focus on DT_BIOP first.  We agree that the sample code I shared will result in DT_BIOP being missing when the biopsy dates are all missing, or if there is no biopsy.

 

data have;
input pt_id $ date:mmddyy10. HCPCS month_diag year_diag BIOP MRI MRI_BIOP ;
format date mmddyy10.;
datalines;
A 09/01/2016 12345 04 2017 0 1 0
A 01/01/2017 12345 04 2017 0 1 0
A . 55700 04 2017 1 0 0
A . 55700 04 2017 1 0 0
B 09/02/2007 99999 08 2007 0 1 0
B 06/13/2007 99999 08 2007 0 1 0
B 07/28/2007 23456 08 2007 0 1 0
;
run;

proc sort data=have ;
  by pt_id date ;
run ;

data want (keep=pt_id DiagDate dt_biop dt_mri);
  set have ;
  by pt_id date ;

  *Make a diagnosis date that is the last day of the month of diagnosis ;
  *Its helpful to have a date variable ;

  DiagDate=intnx('month',mdy(month_diag,1,year_diag),0,'e') ;

  retain dt_biop dt_mri ;

  if first.pt_id then do ;
    call missing(dt_biop,dt_mri) ;
  end ;

  if .Z < date <= DiagDate then do ;
    if biop=1 then dt_biop=date ;
    if mri=1  then dt_mri=date ;
  end ;

  if last.pt_id ;

  format DiagDate dt_biop dt_mri date9. ;
run ;

proc print data=want ;
  var pt_id DiagDate dt_biop ;
run ;

Results in:

Obs    pt_id     DiagDate      dt_biop

 1       A      30APR2017            .
 2       B      31AUG2007            .

It sounds like you don't want dt_biop to be missing for these two patients.

 

What value would you want dt_biop to have for pt_Id A?

What value would you want dt_biop to have for pt_Id B?

 

The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.
CathyVI
Pyrite | Level 9

@Quentin Yes, you are right.

What value would you want dt_biop to have for pt_Id A?

What value would you want dt_biop to have for pt_Id B?

For pt_Id A & B, I want the dt_biop to be  DT_BIOP = date of biopsy closest to but <= DiagDate.

 

Quentin
Super User

@CathyVI sorry, I don' think we're communicating well.  You said:


@Quentin Yes, you are right.

What value would you want dt_biop to have for pt_Id A?

What value would you want dt_biop to have for pt_Id B?

For pt_Id A & B, I want the dt_biop to be  DT_BIOP = date of biopsy closest to but <= DiagDate.

 

You created the example data:

 

data have;
input pt_id $ date:mmddyy10. HCPCS month_diag year_diag BIOP MRI MRI_BIOP ;
format date mmddyy10.;
datalines;
A 09/01/2016 12345 04 2017 0 1 0
A 01/01/2017 12345 04 2017 0 1 0
A . 55700 04 2017 1 0 0
A . 55700 04 2017 1 0 0
B 09/02/2007 99999 08 2007 0 1 0
B 06/13/2007 99999 08 2007 0 1 0
B 07/28/2007 23456 08 2007 0 1 0
;
run;

For that specific example data, what value would you want dt_biop to have for pt_Id A?  what value would you want dt_biop to have for pt_Id B?

 

I think missing is the correct answer.  But you seem to want a different answer.  What specific value (date) would you want for pt_ID A and pt_ID B?


 

The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.
CathyVI
Pyrite | Level 9

@Quentin 

From the data it is true that missing date is there but for pt_Id A  I want 01/01/2017 because this is the nearest date to the DiagDate.

For pt_Id B, I want 07/28/2007 because this is the closest/nearest date to or on the DiagDate.

 

 

Quentin
Super User

Pt_ID A did not have a biopsy on 01/01/2017, they had an MRI on that date.  You want to set Dt_Biop to the date of the MRI?

 

That logic needs to be clarified.

 

Are you saying that if a patient did not have any biopsies prior to the diagnosis date, then Dt_Biop should be set to the date of latest MRI prior to the diagnosis date?  That seems confusing.

The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 25 replies
  • 2357 views
  • 0 likes
  • 3 in conversation