BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
RAGC
Obsidian | Level 7

Hello everyone,

 

I am working on a discrete-time survival analysis, and I have quite a bit of missing information on the outcome at various time points. To resolve this, I would like to create a variable that corresponds to the number of missing time points (i.e. the different intervals) to include as a covariate in the analysis.  My apologies if my explanation is not that clear, but I have attached two tables that provide a visual example of what I mean. Table 1 is an example of how the dataset currently looks, and table 2 represents an example of how I would like it to look once it has been transformed appropriately. 

 

I can get the basic array that would restructure the data from long to wide:

 

DATA long;
SET wide;
DO Time=1 TO duration;
IF Time=duration AND event=1 THEN death=1;
ELSE death=0;
OUTPUT;
END;
RUN; 

 

It is just adding in that variable that corresponds to the exact number that were missing in between, and then flagging it as dying or not dying at that final time period where there is data. 

 

Any suggestions would be much appreciated!

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

For each incoming record you want to output as many records as there are 0's or 1's in the DEATH1-DEATH6 variables.  So if there are three such values, you want to output a record for timeptn=1, timeptn=2, and timeptn=3.

 

And for each output record the interval_length is just the count of columns between the current death=0/1 and the preceding death=0/1.  Or for timeptn=1 it's just the position of the 0/1 value in the list of death variables.  The sum of interval_length values should equal duration:

 

data have;
  infile datalines missover;
  input ID Duration Event Death1 Death2 Death3 Death4 Death5 Death6;
datalines;
1 3 1 . 0 1  
2 5 0 0 0 . . 0   
3 3 0 . . 0       
4 1 1 1           
5 6 0 . . . . . 0
run;
 
data want (drop=t);
  set have;
  array dth {6} death1-death6;
  timeptn=1;
  do t=1 to duration;
    if dth{t}=0 or dth{t}=1 then do;
      interval_length=ifn(timeptn=1,t,dif(t));
      death=dth{t};
      output;
      timeptn=timeptn+1;
    end;
  end;
run;

 

The interesting part is within the "if dth{t}=0 or dth{t}=1 then do" group.

 

  1. It calculates interval_length as either the value for t (if timeptn=1) or otherwise the difference between the current t and the most recent t for a death=0 or 1.  The DIF function is defined as   dif(x)=x-lag(x).
  2. The death dummy assignment is obvious.
  3. Then the record is output and timeptn is incremented
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
Reeza
Super User

Is interval length what you want to calculate? If so, how is that being calculated. To me, for ID 1 they should be the same?

 

PS Please post tables and such directly into the forum, attachments are a pain to work with.

 


@RAGC wrote:

Hello everyone,

 

I am working on a discrete-time survival analysis, and I have quite a bit of missing information on the outcome at various time points. To resolve this, I would like to create a variable that corresponds to the number of missing time points (i.e. the different intervals) to include as a covariate in the analysis.  My apologies if my explanation is not that clear, but I have attached two tables that provide a visual example of what I mean. Table 1 is an example of how the dataset currently looks, and table 2 represents an example of how I would like it to look once it has been transformed appropriately. 

 

I can get the basic array that would restructure the data from long to wide:

 

DATA long;
SET wide;
DO Time=1 TO duration;
IF Time=duration AND event=1 THEN death=1;
ELSE death=0;
OUTPUT;
END;
RUN; 

 

It is just adding in that variable that corresponds to the exact number that were missing in between, and then flagging it as dying or not dying at that final time period where there is data. 

 

Any suggestions would be much appreciated!

 


 

RAGC
Obsidian | Level 7

Hi @Reeza

 

I have included the tables below (So sorry for not doing so the first time!!), and made the new variable names I am looking to create in blue. When I have previously done this there was no missing values so each time point (i.e. Death1 to Deathn) corresponded to 1 time point each (i.e. an interval of 1) so the array statement would create the timeptn variable from the i variable in the do statement, the new death variable would be created for each person, and there was no need to create this interval variable to account for the greater time periods. 

 

The interval variable refers to how many sequential time points (called Death1 to Deathn) that data point corresponds to. For example, ID 1 does not have an observation at Death1 (i.e. time point 1 which corresponds to 1 interval). However, they do have a value at Death2 (i.e. time point 2, or the next time interval). As such, I want to create this interval variable to let me know that this first time point (that corresponds to this row for that patient) actually represents two possible time points (i.e. 2 intervals of time) -  Death1 and Death2.  Hopefully that clears things up, and thank you so much for taking a look!

 

Table 1: Wide data without flags

ID

Duration

Event

Death1

Death2

Death3

Death4

Death5

Death6

1

3

1

.

0

1

 

 

 

2

5

0

0

0

.

.

0

 

3

3

0

.

.

0

 

 

 

4

1

1

1

 

 

 

 

 

5

6

0

.

.

.

.

.

0

 

 

Table 2: Long Data Transformed Correctly

ID

Duration

Event

Timeptn

Death1

Death2

Death3

Death4

Death5

Death6

Interval_Length

Death

1

3

1

1

.

0

1

 

 

 

2

0

1

3

1

2

.

0

1

 

 

 

1

1

2

5

0

1

0

0

.

.

0

 

1

0

2

5

0

2

0

0

.

.

0

 

1

0

2

5

0

3

0

0

.

.

0

 

3

0

3

3

0

1

.

.

0

 

 

 

3

0

 

mkeintz
PROC Star

For each incoming record you want to output as many records as there are 0's or 1's in the DEATH1-DEATH6 variables.  So if there are three such values, you want to output a record for timeptn=1, timeptn=2, and timeptn=3.

 

And for each output record the interval_length is just the count of columns between the current death=0/1 and the preceding death=0/1.  Or for timeptn=1 it's just the position of the 0/1 value in the list of death variables.  The sum of interval_length values should equal duration:

 

data have;
  infile datalines missover;
  input ID Duration Event Death1 Death2 Death3 Death4 Death5 Death6;
datalines;
1 3 1 . 0 1  
2 5 0 0 0 . . 0   
3 3 0 . . 0       
4 1 1 1           
5 6 0 . . . . . 0
run;
 
data want (drop=t);
  set have;
  array dth {6} death1-death6;
  timeptn=1;
  do t=1 to duration;
    if dth{t}=0 or dth{t}=1 then do;
      interval_length=ifn(timeptn=1,t,dif(t));
      death=dth{t};
      output;
      timeptn=timeptn+1;
    end;
  end;
run;

 

The interesting part is within the "if dth{t}=0 or dth{t}=1 then do" group.

 

  1. It calculates interval_length as either the value for t (if timeptn=1) or otherwise the difference between the current t and the most recent t for a death=0 or 1.  The DIF function is defined as   dif(x)=x-lag(x).
  2. The death dummy assignment is obvious.
  3. Then the record is output and timeptn is incremented
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
RAGC
Obsidian | Level 7

@mkeintz thank you so much for your help!! It works perfectly 🙂 Your explanation was also really helpful in following along the code. Thank you!

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1795 views
  • 2 likes
  • 3 in conversation