TL93
Obsidian | Level 7

Hi SAS Community! Can someone help me create a time variable for survival analysis? My event/failure is incidence of cancer (i.e. the total population is at risk [in the sample] and individuals will drop out when they are first diagnosed with cancer [experience the event]).

 

I am using a merged dataset and the date of diagnosis comes from two different datasets. The first variable for date of diagnosis (VARD1) covers years 1991-2001. The second variable for date of diagnosis (VARD2) covers years 2002-2010. The responses are coded as the date of diagnosis, “YYYYMMDD.”

 

The data look like:

Record ID   VARD1      VARD2
1           20000514
2                      20081204
3                      20030128
4           19920416   20051118
5           19980212

*respondent 4 had a cancer recurrence

 

How can I create a continuous time variable from these two date variables in order to conduct survival analysis? Also, I want time to be in years, so how can I make it so that the new variable is based on the year portion of the response (i.e. first four digits – YYYY)?

 

I would really appreciate any help I can get! Thank you so much!

 

Side note: After this, I will need to create an "event" variable that is =1 if they have been diagnosed with cancer (date present), and =0 if not (no date present). I am hoping that will be pretty straightforward once I create the time variable. If this info helps you help me then great!!

4 REPLIES
ballardw
Super User

First, are your values actually SAS date values with a yymmddn8. format applied, character values, or simple numeric values? The approach varies a bit among those three cases.

In the case of the recurrence, which date do you want to use?

 

TL93
Obsidian | Level 7

Hi ballardw,

 

Thank you for your response!

 

I'm not entirely sure if this is a SAS date value. The responses are coded as "YYYYMMDD," format is $8., and it is a character value.

 

In the case of recurrence, I would like to use the first date (VARD1). (Unless there is a way I can use the first date to drop them out, but they re-enter the sample IF they have a second date (VARD2) and drop out again. This would be for another research question I have, which is modelling the probability that someone has a recurrence after the initial diagnosis. Now that I am thinking aloud, this latter part may not be feasible and I probably will have to censor my sample to those with an initial diagnosis to test for recurrence.)

ballardw
Super User

The format indicates your current value is character. So your values should be turned into SAS date values for the analysis you want.

Here is one way. Note the data step to provide example data. The COALESCEC function returns the first non-missing value from a list of character values, so it will select vard1 when populated and fall back to vard2 otherwise. The INPUT function then reads the returned character value as a date value using the yymmdd8. informat. I applied a different date format so that you can see the conversion worked. To maintain your existing appearance you could use the format yymmddn8. The n in the format says not to add any / or - or similar separator that the yymmdd format would add by default, and the 8 maintains the displayed length.

 

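/* Example data: both diagnosis dates stored as character YYYYMMDD */
data have;
   infile datalines dlm=',' missover dsd;
   informat RecordID $5. VARD1 VARD2 $8.;
   input RecordID VARD1 VARD2 ;
datalines;
1,20000514,
2,,20081204
3,,20030128
4,19920416,20051118
5,19980212,
;
run;

data want;
   set have;
   /* first non-missing of vard1, vard2, read as a SAS date value */
   vard = input(coalescec(vard1,vard2),yymmdd8.);
   /* display format only; the stored value is numeric */
   format vard mmddyy10.;
run;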
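
Building on the want data set above, here is a rough sketch of how the event flag and a time-in-years variable might be derived. The thread never states when follow-up starts or ends, so study_start and study_end are assumptions chosen only to match the 1991-2010 coverage mentioned in the question, and the names surv, event, time_years and diag_year are made up for illustration. Adjust all of these to your study design.

```sas
/* Sketch only: study_start and study_end are ASSUMED dates, not from the thread */
data surv;
   set want;                               /* want holds the numeric SAS date vard */
   study_start = '01JAN1991'd;             /* assumption: start of follow-up */
   study_end   = '31DEC2010'd;             /* assumption: administrative censoring date */
   event = not missing(vard);              /* 1 if a diagnosis date exists, else 0 */
   if event then do;
      time_years = yrdif(study_start, vard, 'ACT/ACT');  /* years until first diagnosis */
      diag_year  = year(vard);             /* calendar year (YYYY) of first diagnosis */
   end;
   else time_years = yrdif(study_start, study_end, 'ACT/ACT'); /* censored at study end */
   format study_start study_end yymmddn8.;
run;
```

Computing time from the full date rather than from the four year digits alone keeps the variable continuous; diag_year is included only in case the calendar year is wanted as well.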

 

The reason I asked about your existing variable types is twofold: 1) to choose the correct function, COALESCEC (character) or COALESCE (numeric), and 2) with numeric values the INPUT call gets more complicated, because INPUT wants a character value, so you need a PUT inside the INPUT to convert the number to character first. Not complex, but it gets a tad ugly.
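
For comparison, the numeric case mentioned above might look roughly like this. The data set have_num and the variables vard1n and vard2n are hypothetical stand-ins for dates stored as plain numbers such as 20000514; nothing in this thread says the real data look like this.

```sas
/* Hypothetical example data: dates stored as plain numbers, not SAS dates */
data have_num;
   infile datalines dlm=',' missover dsd;
   input RecordID $ vard1n vard2n;
datalines;
1,20000514,
2,,20081204
4,19920416,20051118
;
run;

data want_num;
   set have_num;
   /* COALESCE picks the first non-missing number, PUT turns it into   */
   /* 8 characters, and INPUT reads those characters as a date. The ?? */
   /* modifier suppresses invalid-data notes when both are missing.    */
   vard = input(put(coalesce(vard1n, vard2n), 8.), ?? yymmdd8.);
   format vard yymmddn8.;
run;
```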

TL93
Obsidian | Level 7

Thank you very much for this syntax and a thorough explanation! I will try this when I am in my lab next week to see if it works. After the conversion, how would I be able to assess if the new variable is indeed a continuous time variable (before proceeding to my main analysis)? 

 

I am planning to maintain the existing appearance using this line for format, as you've advised

 

format vard yymmddn8.;

So surely my data would not look any different.

 

Additionally, my existing date variable is coded as "YYYYMMDD," with the full 4 digits for year. Using yymmdd8. in the INPUT function would not give me any problems, right?

 

Best,

Very appreciative and slightly less confused SAS user
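
On the two checks asked about above, a quick sketch that might help, assuming the converted data set is named want and the new variable vard as in the earlier reply. The first step confirms that an eight-character YYYYMMDD string is read correctly by the yymmdd8. informat; the other two confirm that vard is stored as a numeric variable and show its range as dates.

```sas
/* 1) Does yymmdd8. read a full four-digit-year string? */
data _null_;
   d = input('20000514', yymmdd8.);
   put d= date9.;                 /* log shows d=14MAY2000 */
run;

/* 2) vard should be listed with Type Num (a SAS date is a number of days) */
proc contents data=want;
run;

/* 3) Count, missing count and range of the converted dates */
proc sql;
   select count(vard) as n_dates,
          nmiss(vard) as n_missing,
          min(vard)   as first_dx format=yymmddn8.,
          max(vard)   as last_dx  format=yymmddn8.
   from want;
quit;
```

A format such as yymmddn8. only changes how the value is displayed; the stored value remains a number, so date arithmetic works regardless of which date format is attached.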
