RyanSimmons
Pyrite | Level 9

I am running some accelerated failure time models using PROC LIFEREG. As part of this, I am using model fit statistics to decide which distribution is appropriate for my data. Specifically, I am looking at the Exponential, Weibull, and Generalized Gamma distributions.

However, I have noticed that my choice of model changes depending on whether or not the NOLOG option is specified in the MODEL statement. That is, if I run code like the following:

PROC LIFEREG data=example;
     model time*event(0) = x|y / dist=exponential;
     ods select FitStatistics;
run;

PROC LIFEREG data=example;
     model time*event(0) = x|y / dist=weibull;
     ods select FitStatistics;
run;

PROC LIFEREG data=example;
     model time*event(0) = x|y / dist=gamma;
     ods select FitStatistics;
run;

Then I find that the Weibull model fits the data best (lowest AIC, AICC, and BIC). However, if I add the NOLOG option and run the code as follows:

PROC LIFEREG data=example;
     model time*event(0) = x|y / dist=exponential nolog;
     ods select FitStatistics;
run;

PROC LIFEREG data=example;
     model time*event(0) = x|y / dist=weibull nolog;
     ods select FitStatistics;
run;

PROC LIFEREG data=example;
     model time*event(0) = x|y / dist=gamma nolog;
     ods select FitStatistics;
run;

Then I find that the Gamma distribution fits the data best by the same criteria. This is also not a case where the AICs are all within about 5 of each other; the differences are large (on the unlogged scale, the Gamma AIC is almost 60 lower than the Weibull AIC, while on the log scale the Gamma AIC is about 20 higher than the Weibull AIC).
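
For reference, the three criteria are all functions of the maximized log likelihood ℓ̂, the number of estimated parameters k, and the number of observations n. These are the standard definitions; the PROC LIFEREG documentation is the authority on exactly how SAS counts k and n.

```latex
\mathrm{AIC}  = -2\hat{\ell} + 2k
\qquad
\mathrm{AICC} = -2\hat{\ell} + \frac{2kn}{n-k-1}
\qquad
\mathrm{BIC}  = -2\hat{\ell} + k\log n
```

Because k and n are the same whether or not the response is logged, any change in which model has the smallest criterion must come from the log likelihoods themselves.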

Similarly, instead of relying on AIC, etc., I can perform likelihood ratio tests, since an exponential AFT model can be viewed as nested within a Weibull AFT model, and a Weibull AFT model can be viewed as nested within a Generalized Gamma AFT model (see, e.g., these course notes, p. 118). As above, my interpretation changes depending on whether I use the likelihood of the logged or the unlogged responses (same pattern: on the log scale, the numbers tell me to choose the Weibull, while on the unlogged scale they tell me to choose the Generalized Gamma).
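
Here is a sketch of those two tests. It assumes LIFEREG's parameterization, in which the exponential is the Weibull with the scale parameter σ fixed at 1, and the Weibull is the generalized gamma with the shape parameter fixed at 1. Each statistic compares maximized log likelihoods computed on the same response scale. Each is referred, approximately, to a chi-square distribution with one degree of freedom.

```latex
\mathrm{LR}_1 = 2\left(\hat{\ell}_{\text{Weibull}} - \hat{\ell}_{\text{Exponential}}\right)
  \;\dot\sim\; \chi^2_1 \qquad (H_0:\ \sigma = 1)

\mathrm{LR}_2 = 2\left(\hat{\ell}_{\text{Gen.\ gamma}} - \hat{\ell}_{\text{Weibull}}\right)
  \;\dot\sim\; \chi^2_1 \qquad (H_0:\ \text{shape} = 1)
```

At the 5% level, each statistic would be compared with 3.84, the 95th percentile of the χ² distribution with 1 degree of freedom.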

Unless I am missing something fundamental, I don't understand how these can give me radically different results. A log is a one-to-one transformation, so I don't see how this would have such dramatic impact on the RELATIVE likelihoods/AICs of the three models (that is, I understand it will give me different absolute fit statistic values, but I don't understand why it is changing the nature of the relationship between these fit statistics). Further, per the SAS documentation: "When comparing models, you should compare fit criteria based on the log likelihood that is computed by using the response on the same scale, either always based on the log of the response or always based on the response on the original scale." This seems to imply that it doesn't matter which one you use so long as you are consistent across all models in the comparison; but in my case it does matter, and quite markedly so.

Can anybody help explain why this might be the case? Or which scale I should use to pick a distribution appropriately? The SAS documentation recommends using NOLOG specifically when comparing distributions like the Weibull and the Normal, which compute the likelihood on different scales by default. Beyond that, it offers no clues about how to use the NOLOG option when comparing similar distributions such as the Weibull and the Exponential.

Accepted Solution
sbxkoenk
SAS Super FREQ

Hello,

If you specify NOLOG you are in fact investigating the appropriateness of a DIFFERENT distribution.

LNORMAL NOLOG is no longer referring to a LogNormal distribution but to a Normal distribution.

Five distributions, each WITH and WITHOUT NOLOG, generate ten(!) different parametric models.
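
To see why the ranking can flip, it helps to separate two things. For one fixed model, reporting the log likelihood on the T scale instead of the log(T) scale only adds a change-of-variables (Jacobian) term. The sketch below assumes right-censored data with event indicator δᵢ (1 = event, 0 = censored) and Y = log T.

```latex
% Uncensored observations pick up the Jacobian 1/t; censored ones do not
f_T(t) = \frac{f_Y(\log t)}{t}, \qquad S_T(t) = S_Y(\log t)

\ell_T(\theta) = \sum_{i=1}^{n} \Big[ \delta_i \log f_T(t_i) + (1-\delta_i)\log S_T(t_i) \Big]
              = \ell_Y(\theta) - \sum_{i=1}^{n} \delta_i \log t_i
```

The correction Σ δᵢ log tᵢ depends only on the data, not on θ or on the chosen distribution. Rescaling the same three models would therefore shift every −2 log likelihood, AIC, AICC, and BIC by the same amount, 2 Σ δᵢ log tᵢ, and could not change their ranking. A reversal like the one in the question indicates that NOLOG changed the models being fitted, not just the scale of the reported likelihood.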

See, for example, this NESUG 18 poster/paper:

Predictive Modeling Using Survival Analysis

Vadim Pliner, Verizon Wireless, Orangeburg, NY

On page 2 we read ...

PARAMETRIC REGRESSION MODELS

In survival analysis, the parametric regression models have this form:

Y = β₀ + Σⱼ βⱼxⱼ + σε,

where Y is either T (survival/failure time) or log(T), xⱼ are covariates, ε is a random error, and βⱼ and σ are parameters to be estimated. In SAS, the maximum likelihood estimators of the parameters can be calculated using PROC LIFEREG if one of the following classes of survival distribution functions of T is specified (option dist= or d= on the MODEL statement): exponential (d=EXPONENTIAL), Weibull (d=WEIBULL), log-logistic (d=LLOGISTIC), log-normal (d=LNORMAL), generalized gamma (d=GAMMA), logistic (d=LOGISTIC), and normal (d=NORMAL). By default, PROC LIFEREG models Y=log(T) when the first five models are specified, which leads to so called accelerated failure time models. One can suppress the log transformation with the NOLOG option. When the Exponential or Weibull options are specified, adding NOLOG results in the extreme value distribution with one and two parameters, respectively. d=gamma in combination with the NOLOG option means the log-gamma distribution of T. Specifying d=LNORMAL NOLOG is equivalent to just d=NORMAL (without NOLOG). Similarly, d=LLOGISTIC NOLOG leads to the same model as d=LOGISTIC (without NOLOG). And NOLOG has no effect on either d=NORMAL or d=LOGISTIC.

Overall, all combinations of values of the two options (d= with or without NOLOG) generate ten different parametric models. To select the best one, two approaches are described below. They are both based on the value of maximized log likelihood, which is computed by PROC LIFEREG.
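
To make the Weibull case concrete, write μᵢ = xᵢᵀβ and let ε have the standard (minimum) extreme value distribution, whose survival function is exp(−e^w). The sketch below uses the notation of the quoted model to contrast the default fit with the NOLOG fit.

```latex
% Default (dist=weibull): the model is for log T
\log T_i = \mu_i + \sigma\varepsilon_i
\;\Rightarrow\;
S_{T_i}(t) = \exp\!\left[-\exp\!\left(\frac{\log t - \mu_i}{\sigma}\right)\right],
\quad t > 0 \quad \text{(Weibull)}

% With NOLOG: same error distribution, but the model is for T itself
T_i = \mu_i + \sigma\varepsilon_i
\;\Rightarrow\;
S_{T_i}(t) = \exp\!\left[-\exp\!\left(\frac{t - \mu_i}{\sigma}\right)\right],
\quad -\infty < t < \infty \quad \text{(extreme value)}
```

The NOLOG version puts positive probability on negative survival times. It is therefore a different distribution for T, not the Weibull with its likelihood reported on another scale. Comparing AICs across the NOLOG fits compares a different set of candidate distributions, which is why the ranking can change.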

Kind regards,

Koen

sbxkoenk
SAS Super FREQ

Hello Ryan,

Thanks for marking my answer as the solution.

For completeness, here is the URL of the paper I cited:

NESUG 18 (North East SAS Users Group)

Predictive Modeling Using Survival Analysis

Vadim Pliner, Verizon Wireless, Orangeburg, NY

http://www.lexjansen.com/nesug/nesug05/pos/pos6.pdf

Cheers,

Koen
