Hello everyone,
I am working on a discrete-time survival analysis, and I have quite a bit of missing information on the outcome at various time points. To resolve this, I would like to create a variable that corresponds to the number of missing time points (i.e. the different intervals) to include as a covariate in the analysis. My apologies if my explanation is not that clear, but I have attached two tables that provide a visual example of what I mean. Table 1 is an example of how the dataset currently looks, and table 2 represents an example of how I would like it to look once it has been transformed appropriately.
I can get the basic array that would restructure the data from long to wide:
DATA long;
SET wide;
DO Time=1 TO duration;
IF Time=duration AND event=1 THEN death=1;
ELSE death=0;
OUTPUT;
END;
RUN;
It is just adding in that variable that corresponds to the exact number that were missing in between, and then flagging it as dying or not dying at that final time period where there is data.
Any suggestions would be much appreciated!
For each incoming record you want to output as many records as there are 0's or 1's in the DEATH1-DEATH6 variables. So if there are three such values, you want to output a record for timeptn=1, timeptn=2, and timeptn=3.
And for each output record the interval_length is just the count of columns between the current death=0/1 and the preceding death=0/1. Or for timeptn=1 it's just the position of the 0/1 value in the list of death variables. The sum of interval_length values should equal duration:
data have;
infile datalines missover;
input ID Duration Event Death1 Death2 Death3 Death4 Death5 Death6;
datalines;
1 3 1 . 0 1
2 5 0 0 0 . . 0
3 3 0 . . 0
4 1 1 1
5 6 0 . . . . . 0
run;
data want (drop=t);
set have;
array dth {6} death1-death6;
timeptn=1;
do t=1 to duration;
if dth{t}=0 or dth{t}=1 then do;
interval_length=ifn(timeptn=1,t,dif(t));
death=dth{t};
output;
timeptn=timeptn+1;
end;
end;
run;
The interesting part is within the "if dth{t}=0 or dth{t}=1 then do" group.
Is interval length what you want to calculate? If so, how is that being calculated. To me, for ID 1 they should be the same?
PS Please post tables and such directly into the forum, attachments are a pain to work with.
@RAGC wrote:
Hello everyone,
I am working on a discrete-time survival analysis, and I have quite a bit of missing information on the outcome at various time points. To resolve this, I would like to create a variable that corresponds to the number of missing time points (i.e. the different intervals) to include as a covariate in the analysis. My apologies if my explanation is not that clear, but I have attached two tables that provide a visual example of what I mean. Table 1 is an example of how the dataset currently looks, and table 2 represents an example of how I would like it to look once it has been transformed appropriately.
I can get the basic array that would restructure the data from long to wide:
DATA long;
SET wide;
DO Time=1 TO duration;
IF Time=duration AND event=1 THEN death=1;
ELSE death=0;
OUTPUT;
END;
RUN;
It is just adding in that variable that corresponds to the exact number that were missing in between, and then flagging it as dying or not dying at that final time period where there is data.
Any suggestions would be much appreciated!
Hi @Reeza,
I have included the tables below (So sorry for not doing so the first time!!), and made the new variable names I am looking to create in blue. When I have previously done this there was no missing values so each time point (i.e. Death1 to Deathn) corresponded to 1 time point each (i.e. an interval of 1) so the array statement would create the timeptn variable from the i variable in the do statement, the new death variable would be created for each person, and there was no need to create this interval variable to account for the greater time periods.
The interval variable refers to how many sequential time points (called Death1 to Deathn) that data point corresponds to. For example, ID 1 does not have an observation at Death1 (i.e. time point 1 which corresponds to 1 interval). However, they do have a value at Death2 (i.e. time point 2, or the next time interval). As such, I want to create this interval variable to let me know that this first time point (that corresponds to this row for that patient) actually represents two possible time points (i.e. 2 intervals of time) - Death1 and Death2. Hopefully that clears things up, and thank you so much for taking a look!
Table 1: Wide data without flags
ID | Duration | Event | Death1 | Death2 | Death3 | Death4 | Death5 | Death6 |
1 | 3 | 1 | . | 0 | 1 |
|
|
|
2 | 5 | 0 | 0 | 0 | . | . | 0 |
|
3 | 3 | 0 | . | . | 0 |
|
|
|
4 | 1 | 1 | 1 |
|
|
|
|
|
5 | 6 | 0 | . | . | . | . | . | 0 |
Table 2: Long Data Transformed Correctly
ID | Duration | Event | Timeptn | Death1 | Death2 | Death3 | Death4 | Death5 | Death6 | Interval_Length | Death |
1 | 3 | 1 | 1 | . | 0 | 1 |
|
|
| 2 | 0 |
1 | 3 | 1 | 2 | . | 0 | 1 |
|
|
| 1 | 1 |
2 | 5 | 0 | 1 | 0 | 0 | . | . | 0 |
| 1 | 0 |
2 | 5 | 0 | 2 | 0 | 0 | . | . | 0 |
| 1 | 0 |
2 | 5 | 0 | 3 | 0 | 0 | . | . | 0 |
| 3 | 0 |
3 | 3 | 0 | 1 | . | . | 0 |
|
|
| 3 | 0 |
For each incoming record you want to output as many records as there are 0's or 1's in the DEATH1-DEATH6 variables. So if there are three such values, you want to output a record for timeptn=1, timeptn=2, and timeptn=3.
And for each output record the interval_length is just the count of columns between the current death=0/1 and the preceding death=0/1. Or for timeptn=1 it's just the position of the 0/1 value in the list of death variables. The sum of interval_length values should equal duration:
data have;
infile datalines missover;
input ID Duration Event Death1 Death2 Death3 Death4 Death5 Death6;
datalines;
1 3 1 . 0 1
2 5 0 0 0 . . 0
3 3 0 . . 0
4 1 1 1
5 6 0 . . . . . 0
run;
data want (drop=t);
set have;
array dth {6} death1-death6;
timeptn=1;
do t=1 to duration;
if dth{t}=0 or dth{t}=1 then do;
interval_length=ifn(timeptn=1,t,dif(t));
death=dth{t};
output;
timeptn=timeptn+1;
end;
end;
run;
The interesting part is within the "if dth{t}=0 or dth{t}=1 then do" group.
@mkeintz thank you so much for your help!! It works perfectly 🙂 Your explanation was also really helpful in following along the code. Thank you!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.