Dissertator
Fluorite | Level 6

Hello everyone, 

 

I am new to working with longitudinal data and I have a question.

I am interested in the development of CVD. Follow-up began on the day the mother delivered her baby (infant_dob). Participants were either determined to have developed CVD (CVD_Overal) or censored at the time of death (DEATH_DATE_DC) or at the end of follow-up (1/1/2023), whichever occurred first.

 

I want to create two variables: censor and time. For censoring, I want a variable "censor" (1=yes, 0=no). For time, I want a variable "Time" that measures, in months, the interval from the baby's date of birth (infant_dob) until development of CVD (min_diag_dt) or censoring (death [DEATH_DATE_DC] or end of follow-up [1/1/2023]).

 

 

I would greatly appreciate it if you could help me create these variables. I have attached the data below.

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You already have a censor variable.  If CVD_OVERAL=1 then the disease was diagnosed on MIN_DIAG_DT, i.e. the participant was not censored prior to diagnosis.

 

If CVD_OVERAL=0 then the individual was censored, either at DEATH_DATE_DC or end-of-study (01jan2023).

 

If the proportional hazards procedure needs the censor variable to be 1 for censored, 0 otherwise, just subtract cvd_overal from 1.
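As a rough sketch of how the censoring flag is typically passed to PROC PHREG: the value in parentheses after the status variable tells the procedure which value marks a censored observation, so either coding can be used directly. This assumes the WANT data set and TIME_IN_MONTHS variable from the data step in this reply; maternal_age is a hypothetical covariate that does not come from the original post.

```sas
/* maternal_age is a hypothetical covariate, used only for illustration */
proc phreg data=want;
  /* censor(1): observations with censor=1 are treated as censored */
  model time_in_months*censor(1) = maternal_age;
run;

/* Same model using the original flag, where cvd_overal=0 means censored */
proc phreg data=want;
  model time_in_months*cvd_overal(0) = maternal_age;
run;
```

Both calls should describe the same events and censored observations; which one to use is mostly a matter of which coding you find easier to read.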

 

As to time, you want the number of months from infant_dob to the minimum of three date variables: min_diag_dt, date of death, or 1/1/2023.

 

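data want;
  set censor;

  /* Flip the event flag: censor=1 if censored, 0 if CVD was diagnosed */
  censor=1-cvd_overal;

  /* Assumes death_date_dc is character text such as 2021-03-15; convert it to a SAS date */
  death_date=input(death_date_dc,yymmdd10.);
  format death_date yymmdd10.;

  /* MIN ignores missing values, so follow-up ends at the earliest non-missing date */
  time_in_months=intck('month',infant_dob,min(death_date,min_diag_dt,"01jan2023"d));
run;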

The time_in_months variable is just the number of calendar-month boundaries crossed, so a value of 1 could represent anything from a single day (e.g. 31jan to 01feb) to nearly two full months (e.g. 01jul to 31aug).  That is the default behavior of the INTCK function.  And of course, you should use 01jan2023 as end-of-study only if it was possible for a diagnosis, or death, to be recorded on that date.  If you really could not get a diagnosis later than 31dec2022, I suspect you should use that date as end-of-study.
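If boundary counting is too coarse, here is a minimal sketch of two alternatives, assuming the WANT data set created above. The optional fourth argument of INTCK, 'continuous', counts months elapsed from the start date rather than calendar boundaries crossed, and dividing the day count by 30.4375 (365.25/12) gives fractional months. The names want_alt, end_date, months_complete and months_frac are made up for this illustration.

```sas
data want_alt;
  set want;

  /* Earliest of death, diagnosis, or end of study; MIN ignores missing values */
  end_date=min(death_date,min_diag_dt,"01jan2023"d);
  format end_date yymmdd10.;

  /* Whole months completed since infant_dob (anniversary-based counting) */
  months_complete=intck('month',infant_dob,end_date,'continuous');

  /* Fractional months, using the average month length of 365.25/12 days */
  months_frac=(end_date-infant_dob)/30.4375;
run;
```

With the continuous method, a span from 31jan to 01feb no longer counts as a full month. Whether whole or fractional months suit the analysis better depends on how finely you need to distinguish event times.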

 

 

 



2 REPLIES 2
Dissertator
Fluorite | Level 6

Thank you for your prompt response. It works.
