Hi,
tsmk_1-tsmk_5 hold my time-dependent covariate (smoking status), deathstatus_inhance is the censoring indicator, and stime_inhance is the survival time. The "Change step" worked well, but the "Count step" only worked for four observations and then stopped.
Please help me correct my code.
DATA analysis.change;
  SET analysis.smk_stime2;
  ARRAY tsmk_(*) tsmk_1-tsmk_5; *call in the time-varying smoking variables;
  ARRAY chng(5);                *the new indicator variables;
  t = 1;                        *initialize the position variable for the indicator variables;
  DO i = 2 TO 5;
    IF tsmk_(i) NE tsmk_(i-1) THEN DO; *detects whether there is a change in smoking status;
      chng(t) = i-1;            *assigns the last year the status remained constant;
      t = t + 1;
    END;
  END;
RUN;
DATA analysis.count;
  SET analysis.change;
  ARRAY tsmk_(*) tsmk_1-tsmk_5; /* call in the time-varying smoking variables */
  ARRAY chng(*) chng1-chng5;    /* call in the indicator variables */
  start = 0;   /* initialize the beginning time for the study */
  censor2 = 0; /* initialize the new censor variable */
  t = 1;       /* initialize the position variable for the indicator variables (chng1-chng5) */
  DO i=1 TO stime_inhance; /* makes sure we only output the records in which smoking status remains constant */
    IF (chng(t) > . and chng(t) < stime_inhance) or i = stime_inhance THEN do;
      /* assign the value of smoking status */
      IF chng(t) > . THEN smoking_status = tsmk_(chng(t));
      ELSE smoking_status = tsmk_(stime_inhance);
      /* assign the end time */
      stop = min(chng(t), stime_inhance);
      /* assign the value of the censor variable */
      IF i = stime_inhance THEN censor2 = deathstatus_inhance;
      /* assign the new start time */
      IF t > 1 THEN start = chng(t-1);
      /* move the position variable */
      t = t + 1;
      OUTPUT; /* output the record to the new dataset */
    end;
  END;
RUN;
Please provide sample data for analysis.smk_stime2.
In addition to providing the data as requested by @SASJedi, please show us the ENTIRE log for the data step that creates analysis.count. Please copy the log as text and then paste it into the window that appears when you click on the </> icon.
Here is my data set:
idnum  deathstatus_inhance  stime_inhance  tsmk_1  tsmk_2  tsmk_3  tsmk_4  tsmk_5
1580   0                    85.8809        1       1       1       0       0
1581   1                    38.7023        0       1       1       0       0
1582   1                    1.347          1       1       1       1       1
1585   0                    85.7166        0       0       0       0       0
1586   0                    85.7166        1       1       1       1       1
1587   1                    13.0103        1       1       1       1       1
1588   1                    16.6571        0       1       1       1       1
1589   1                    2.037          0       0       0       0       0
1596   1                    0.9199         1       1       1       1       1
1601   0                    85.4209        0       0       0       0       0
1603   0                    85.3881        0       0       0       0       0
1604   1                    44.8789        1       1       1       1       1
1608   0                    85.0267        0       1       0       0       0
1612   0                    84.9281        1       1       0       0       0
1613   0                    0.1314         0       0       0       0       0
1614   1                    25.3306        1       1       1       1       1
1616   0                    84.7967        0       0       0       0       0
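To make the sample reproducible for others, the rows above can be read in with a DATA step and DATALINES, so anyone can paste and run it. This is a sketch; the name WORK.SMK_STIME2 is an assumption standing in for ANALYSIS.SMK_STIME2, and the values are copied unchanged from the table.

```sas
/* Sample data from the post, as a runnable DATA step.                 */
/* WORK.SMK_STIME2 is an assumed name; the original is in ANALYSIS.    */
data work.smk_stime2;
  input idnum deathstatus_inhance stime_inhance tsmk_1-tsmk_5;
  datalines;
1580 0 85.8809 1 1 1 0 0
1581 1 38.7023 0 1 1 0 0
1582 1 1.347 1 1 1 1 1
1585 0 85.7166 0 0 0 0 0
1586 0 85.7166 1 1 1 1 1
1587 1 13.0103 1 1 1 1 1
1588 1 16.6571 0 1 1 1 1
1589 1 2.037 0 0 0 0 0
1596 1 0.9199 1 1 1 1 1
1601 0 85.4209 0 0 0 0 0
1603 0 85.3881 0 0 0 0 0
1604 1 44.8789 1 1 1 1 1
1608 0 85.0267 0 1 0 0 0
1612 0 84.9281 1 1 0 0 0
1613 0 0.1314 0 0 0 0 0
1614 1 25.3306 1 1 1 1 1
1616 0 84.7967 0 0 0 0 0
;
run;
```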
After running the "Change step" from my first post (DATA analysis.change), the data looked like this:
idnum  deathstatus_inhance  stime_inhance  tsmk_1  tsmk_2  tsmk_3  tsmk_4  tsmk_5  chng1  chng2  chng3  chng4  chng5  t  i
1580   0                    85.8809        1       1       1       0       0       3      .      .      .      .      2  6
1581   1                    38.7023        0       1       1       0       0       1      3      .      .      .      3  6
1582   1                    1.347          1       1       1       1       1       .      .      .      .      .      1  6
1585   0                    85.7166        0       0       0       0       0       .      .      .      .      .      1  6
1586   0                    85.7166        1       1       1       1       1       .      .      .      .      .      1  6
1587   1                    13.0103        1       1       1       1       1       .      .      .      .      .      1  6
1588   1                    16.6571        0       1       1       1       1       1      .      .      .      .      2  6
1589   1                    2.037          0       0       0       0       0       .      .      .      .      .      1  6
1596   1                    0.9199         1       1       1       1       1       .      .      .      .      .      1  6
1601   0                    85.4209        0       0       0       0       0       .      .      .      .      .      1  6
1603   0                    85.3881        0       0       0       0       0       .      .      .      .      .      1  6
1604   1                    44.8789        1       1       1       1       1       .      .      .      .      .      1  6
1608   0                    85.0267        0       1       0       0       0       1      2      .      .      .      3  6
1612   0                    84.9281        1       1       0       0       0       2      .      .      .      .      2  6
1613   0                    0.1314         0       0       0       0       0       .      .      .      .      .      1  6
1614   1                    25.3306        1       1       1       1       1       .      .      .      .      .      1  6
1616   0                    84.7967        0       0       0       0       0       .      .      .      .      .      1  6
Then I tried to run the "Count step" from my first post (DATA analysis.count), which was unsuccessful.
Another question: with more than one record per individual, can I use the "PROGRAMMING STATEMENT" approach?
proc phreg data=analysis.smk_stime;
  model stime_inhance*deathstatus_inhance(0) = smoking / ties=efron rl;
  array tsmk_(*) tsmk_1-tsmk_5;
  smoking = tsmk_[stime_inhance];
  format tsmk_1-tsmk_5 tsmk.;
run;
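For comparison, here is a rough sketch of the programming-statement approach, in which PROC PHREG computes SMOKING for each subject in the risk set at each event time, so no extra records per individual are created. As I understand the PROC PHREG documentation on time-dependent covariates, inside the programming statements STIME_INHANCE takes the value of the event time being evaluated, so using it directly as a subscript of the five-element array would presumably hit the same 1-5 range problem as the DATA step. The boundaries 10, 20, 30 and 40 are hypothetical placeholders, because the post does not say which time periods TSMK_1-TSMK_5 cover; the FORMAT statement is left out because the user-defined TSMK. format is not available here.

```sas
/* Sample data from the thread */
data work.smk_stime2;
  input idnum deathstatus_inhance stime_inhance tsmk_1-tsmk_5;
  datalines;
1580 0 85.8809 1 1 1 0 0
1581 1 38.7023 0 1 1 0 0
1582 1 1.347 1 1 1 1 1
1585 0 85.7166 0 0 0 0 0
1586 0 85.7166 1 1 1 1 1
1587 1 13.0103 1 1 1 1 1
1588 1 16.6571 0 1 1 1 1
1589 1 2.037 0 0 0 0 0
1596 1 0.9199 1 1 1 1 1
1601 0 85.4209 0 0 0 0 0
1603 0 85.3881 0 0 0 0 0
1604 1 44.8789 1 1 1 1 1
1608 0 85.0267 0 1 0 0 0
1612 0 84.9281 1 1 0 0 0
1613 0 0.1314 0 0 0 0 0
1614 1 25.3306 1 1 1 1 1
1616 0 84.7967 0 0 0 0 0
;
run;

/* Sketch of the programming-statement approach.                        */
/* The cut points 10, 20, 30, 40 are HYPOTHETICAL placeholders for the  */
/* ends of the periods described by TSMK_1-TSMK_4.                      */
proc phreg data=work.smk_stime2;
  model stime_inhance*deathstatus_inhance(0) = smoking / ties=efron rl;
  /* Here STIME_INHANCE holds the event time being evaluated, so        */
  /* SMOKING is the status in the (hypothetical) period containing it.  */
  if stime_inhance <= 10 then smoking = tsmk_1;
  else if stime_inhance <= 20 then smoking = tsmk_2;
  else if stime_inhance <= 30 then smoking = tsmk_3;
  else if stime_inhance <= 40 then smoking = tsmk_4;
  else smoking = tsmk_5;
run;
```

An IF/ELSE chain is used instead of an array lookup so that every possible event time maps to exactly one of the five variables, whatever the real boundaries turn out to be.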
Thanks
Are there errors in the log? Are there warnings in the log?
Repeating:
please show us the ENTIRE log for the data step that creates analysis.count. Please copy the log as text and then paste it into the window that appears when you click on the </> icon.
Here is the log:
50774
50775
50776  DATA analysis.count;
50777  SET analysis.change;
50778  ARRAY tsmk_(*) tsmk_1-tsmk_5; /* call in the time-varying smoking variables */
50779  ARRAY chng(*) chng1-chng5; /* call in the indicator variables */
50780  start = 0; /* initialize the beginning time for the study */
50781  censor2 = 0; /* initialize the new censor variable */
50782  t = 1; /* initialize the position variable for the indicator
50782! variables (chng1-chng5) */
50783
50784  DO i=1 TO stime_inhance; /* makes sure we only output the records that smoking
50784! status remains constant */
50785  IF (chng(t) > . and chng(t) < stime_inhance) or i = stime_inhance THEN do;
50786  /* assign the value of smoking status */
50787  IF chng(t) > . THEN smoking_status = tsmk_(chng(t));
50788  ELSE smoking_status = tsmk_(stime_inhance); /* assign the end time */
50789  stop = min(chng(t), stime_inhance); /* assign the value of the censor
50789! variable */
50790  IF i = stime_inhance THEN censor2 = deathstatus_inhance; /* assign the new
50790! start time */
50791  IF t > 1 THEN start = chng(t-1); /* move the position variable */
50792  t = t + 1;
50793  OUTPUT; /* output the record to the new dataset */
50794  end;
50795  END;
50796  RUN;

ERROR: Array subscript out of range at line 50788 column 35.
idnum=2289 deathstatus_inhance=0 stime_inhance=48 tsmk_1=0 tsmk_2=0 tsmk_3=0 tsmk_4=0 tsmk_5=0 chng1=. chng2=. chng3=. chng4=. chng5=. t=1 i=48 start=0 censor2=0 smoking_status=. stop=. _ERROR_=1 _N_=293
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 293 observations read from the data set ANALYSIS.CHANGE.
WARNING: The data set ANALYSIS.COUNT may be incomplete. When this step was stopped there were 159 observations and 19 variables.
WARNING: Data set ANALYSIS.COUNT was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.01 seconds
ERROR: Array subscript out of range at line 50788 column 35.
idnum=2289 deathstatus_inhance=0 stime_inhance=48 tsmk_1=0 tsmk_2=0 tsmk_3=0 tsmk_4=0 tsmk_5=0 chng1=. chng2=. chng3=. chng4=. chng5=. t=1 i=48 start=0 censor2=0 smoking_status=. stop=. _ERROR_=1 _N_=293
So line 50788 is:
ELSE smoking_status = tsmk_(stime_inhance); /* assign the end time */
The array subscript is the value of the variable STIME_INHANCE, which for this row of the data has the value 48. How do I know that? Because it is printed in the log (stime_inhance=48 in the line after the ERROR message). The array has five elements, because that's how you defined it, so the only valid subscripts are 1 through 5. Array element 48 doesn't exist, and trying to use it causes the error.
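One way to keep such a step from stopping is to compare the subscript with DIM() before using it. The minimal sketch below uses a hypothetical data set WORK.SUBSCRIPT_CHECK with two made-up rows, one with a valid subscript (3) and one with the value 48 from the log. It only avoids the error by leaving the result missing and writing a message; it does not decide which smoking variable a time of 48 should map to.

```sas
/* Hypothetical two-row example: a valid subscript (3) and the value */
/* from the log (48), which is outside the array's range of 1 to 5.  */
data work.subscript_check;
  array tsmk_(*) tsmk_1-tsmk_5;
  input stime_inhance tsmk_1-tsmk_5;
  /* DIM() returns the number of elements in the array, here 5 */
  if 1 <= stime_inhance and stime_inhance <= dim(tsmk_) then
    smoking_status = tsmk_(stime_inhance);
  else do;
    /* Out of range: leave the result missing and note it in the log */
    smoking_status = .;
    put 'Subscript out of range for TSMK_: ' stime_inhance=;
  end;
  datalines;
3 0 1 1 0 0
48 0 0 0 0 0
;
run;
```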
I have not attempted to figure out what you are trying to do with this data step, so I cannot suggest an improvement. It is always helpful to explain what you are trying to do in sufficient detail that we can help write the code without errors.
In the future, when you get errors in the log, please show us the full log for the step with the errors in your first post on the subject. We can't diagnose the problem without the log.
Hi @em1535,
As far as I can see, your DATA step creating data set ANALYSIS.COUNT works perfectly on survival data with a discrete time scale, such as years 1, 2, 3, 4, 5, with corresponding variables TSMK_1, ..., TSMK_5 describing the smoking status in each of those time periods. However, you apply this program to data with time measured on a continuous scale, with values ranging from (at least) 0.1314 to 85.8809.
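The point about a discrete time scale can be checked with one hypothetical subject whose time is measured in whole years 1 to 5. The sketch below runs the logic of the "Change step" and the "Count step" from the original post on that single record; the WORK data set names and the subject's values are assumptions made up for illustration.

```sas
/* One HYPOTHETICAL subject on a discrete 1-5 scale: smoking status 0  */
/* in year 1, 1 in years 2-3, 0 in years 4-5, died at end of year 5.   */
data work.discrete;
  input idnum deathstatus_inhance stime_inhance tsmk_1-tsmk_5;
  datalines;
1 1 5 0 1 1 0 0
;
run;

/* "Change step" logic from the original post */
data work.change;
  set work.discrete;
  array tsmk_(*) tsmk_1-tsmk_5;
  array chng(5);
  t = 1;
  do i = 2 to 5;
    if tsmk_(i) ne tsmk_(i-1) then do;
      chng(t) = i - 1;   /* last period before the status changes */
      t = t + 1;
    end;
  end;
run;

/* "Count step" logic from the original post */
data work.count;
  set work.change;
  array tsmk_(*) tsmk_1-tsmk_5;
  array chng(*) chng1-chng5;
  start = 0;
  censor2 = 0;
  t = 1;
  do i = 1 to stime_inhance;
    if (chng(t) > . and chng(t) < stime_inhance) or i = stime_inhance then do;
      if chng(t) > . then smoking_status = tsmk_(chng(t));
      else smoking_status = tsmk_(stime_inhance);  /* fine while time <= 5 */
      stop = min(chng(t), stime_inhance);          /* MIN ignores missing */
      if i = stime_inhance then censor2 = deathstatus_inhance;
      if t > 1 then start = chng(t-1);
      t = t + 1;
      output;
    end;
  end;
run;

proc print data=work.count;
  var idnum start stop smoking_status censor2;
run;
```

For this subject the step writes three counting-process records: (0,1] with status 0, (1,3] with status 1, and (3,5] with status 0 and censor2=1, which matches the smoking history. With a continuous time such as 48, the same ELSE branch asks for element 48 of a five-element array.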
The failed attempt to retrieve the smoking status at time 48 from the five-element array TSMK_ therefore raises the question: what are the five time intervals on your continuous scale that the variables TSMK_1, ..., TSMK_5 correspond to?
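Once the real period boundaries are known, one possible way to build counting-process records on the continuous scale is sketched below. It assumes, hypothetically, that TSMK_k describes the smoking status up to a boundary BOUND(k), with TSMK_5 covering everything after BOUND(4); the values 10, 20, 30 and 40 are placeholders only, and the two input rows are copied from the posted sample.

```sas
/* Two rows copied from the posted sample */
data work.sample;
  input idnum deathstatus_inhance stime_inhance tsmk_1-tsmk_5;
  datalines;
1581 1 38.7023 0 1 1 0 0
1613 0 0.1314 0 0 0 0 0
;
run;

/* Counting-process records on a continuous time scale.                */
/* ASSUMPTION: TSMK_k is the smoking status up to BOUND(k), and TSMK_5   */
/* applies after BOUND(4). The values 10 20 30 40 are placeholders only. */
data work.count_cont;
  set work.sample;
  array tsmk_(*) tsmk_1-tsmk_5;
  array bound(4) _temporary_ (10 20 30 40);
  start = 0;
  /* Step through the periods until the end of follow-up is reached */
  do k = 1 to dim(tsmk_) while (start < stime_inhance);
    /* End of period k, or the end of follow-up if that comes first */
    if k < dim(tsmk_) then stop = min(bound(k), stime_inhance);
    else stop = stime_inhance;
    smoking_status = tsmk_(k);
    /* The death/censoring indicator applies to the last interval only */
    censor2 = (stop = stime_inhance) * deathstatus_inhance;
    output;
    start = stop;
  end;
  keep idnum start stop smoking_status censor2;
run;
```

Unlike the original Count step, this writes one record per period even when the smoking status does not change. Splitting an interval without a covariate change leaves the risk sets, and therefore the Cox estimates, unchanged, so the records can be analysed with MODEL (start, stop)*censor2(0) = smoking_status in PROC PHREG; consecutive records with the same status could still be merged if fewer records are preferred.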