Hello everyone,
I am new to working with longitudinal data and I have a question.
I am interested in the development of CVD. Follow-up began on the day a mother delivered her baby (infant_dob). Participants were either determined to have developed CVD (CVD_Overal) or were censored at the time of death (DEATH_DATE_DC) or at the end of follow-up (1/1/2023), whichever occurred first.
I want to create two variables: censor and time. For censoring, I want to create a variable "censor": 1=yes, 0=no. For time, I want to create a variable "Time" that runs from the baby's date of birth (infant_dob) until the development of CVD (min_diag_dt) or censoring (death [DEATH_DATE_DC] or end of follow-up [1/1/2023]), measured in MONTHS.
I would greatly appreciate it if you could help me create these variables. I have attached the data below.
You already have a censor variable. If CVD_OVERAL=1 then the disease was diagnosed on MIN_DIAG_DT, i.e. the participant was not censored prior to diagnosis.
If CVD_OVERAL=0 then the individual was censored, either at DEATH_DATE_DC or end-of-study (01jan2023).
If the proportional hazards procedure needs the censor variable to be 1 for censored, 0 otherwise, just subtract cvd_overal from 1.
As to time, you want the number of months from infant_dob to the minimum of three date variables: min_diag_dt, date of death, or 1/1/2023.
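data want;
  set censor;
  /* Flip the event flag: 1 = censored, 0 = CVD diagnosed */
  censor = 1 - cvd_overal;
  /* Convert the character death date to a SAS date (assumes yyyy-mm-dd text) */
  death_date = input(death_date_dc, yymmdd10.);
  format death_date yymmdd10.;
  /* MIN ignores missing values, so the earliest available date is used */
  time_in_months = intck('month', infant_dob, min(death_date, min_diag_dt, "01jan2023"d));
run;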
The time_in_months variable is just the number of calendar-month boundaries crossed, so a value of 1 could represent as little as a single day (e.g. 31jan to 01feb) or nearly two full months (e.g. 01jan to 28feb). That is the default behavior of the INTCK function. And of course, you should use 01jan2023 as end-of-study only if it was possible for a diagnosis or death to be recorded on that date. If no diagnosis could be recorded later than 31dec2022, I suspect you should use that date as end-of-study instead.
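If boundary counting is too coarse for your analysis, here is a minimal sketch of two alternatives. It is only an illustration: the dates and the variable names (start, stop, discrete, continuous, approx) are made up, and 30.4375 is simply the average month length (365.25/12), an assumed convention rather than anything your study requires.

```sas
data _null_;
  /* Hypothetical dates, chosen only to show the counting rule */
  start = '31jan2022'd;
  stop  = '01feb2022'd;

  /* Default (discrete) method: counts month boundaries crossed -> 1 */
  discrete = intck('month', start, stop);

  /* Continuous method: counts whole months elapsed since start -> 0 */
  continuous = intck('month', start, stop, 'continuous');

  /* Fractional months: days elapsed divided by an average month length */
  approx = (stop - start) / 30.4375;

  put discrete= continuous= approx=;
run;
```

In the DATA step above, the same idea would mean adding 'continuous' as the fourth INTCK argument, or replacing the INTCK call with the day difference divided by 30.4375. Which one fits best depends on how precisely you need time measured.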
Thank you for your prompt response. It works.