Mehdi_R
Fluorite | Level 6

Hello everybody.

I have some survival data with a multilevel correlated structure (simulated from a multi-level frailty model with a Weibull baseline hazard and a normal random effect), so I am confident that fitting the same model to these data should give satisfactory results. To fit the multi-level frailty model I use PROC NLMIXED, but I do not get precise results; there is a huge bias in my estimates. I asked Dr. Rick Wicklin about this, and he pointed out that PROC NLMIXED uses ML estimators of the parameters, which are not always unbiased. I think he is absolutely right: the ML estimator of sigma^2, the variance component, is biased, and we should use the REML estimator for the variance components rather than the ML estimator.
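For concreteness, the data-generating step is along these lines (only a sketch; the parameter values, cluster sizes, and censoring mechanism here are illustrative, not my exact program). It uses the standard inverse-CDF result for the Weibull proportional-hazards model: if U ~ Uniform(0,1), then T = (-log(U)/exp(eta))^(1/rho) has survival function exp(-exp(eta)*t^rho).

/* Illustrative sketch only: simulate clustered Weibull survival times
   with a normal cluster-level random effect (shared frailty).        */
%let rho   = 2;     /* Weibull shape           */
%let b0    = 0;     /* intercept               */
%let b1    = 0.5;   /* slope for covariate1    */
%let b2    = -0.5;  /* slope for covariate2    */
%let sigma = 0.7;   /* SD of the random effect */

data Survival_MultiLevel;
   call streaminit(12345);
   do sample_id = 1 to 500;                            /* simulated samples    */
      do cluster_id = 1 to 50;                         /* clusters per sample  */
         W = rand('normal', 0, &sigma);                /* cluster-level frailty */
         do subj = 1 to 10;                            /* subjects per cluster */
            covariate1 = rand('normal');
            covariate2 = rand('bernoulli', 0.5);
            eta = &b0 + &b1*covariate1 + &b2*covariate2 + W;
            U   = rand('uniform');
            t   = (-log(U) / exp(eta)) ** (1/&rho);    /* inverse-CDF Weibull PH  */
            c   = rand('exponential') * 2;             /* censoring time          */
            survival_t  = min(t, c);
            censor_flag = (t <= c);                    /* 1 = event, 0 = censored */
            output;
         end;
      end;
   end;
run;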

However, I am getting biased results for the other parameters as well as for sigma^2. I think this is because the numerical optimization in PROC NLMIXED is of Newton-Raphson type, so at each iteration the error in estimating sigma^2 contaminates the estimates of the other parameters, and I end up with biased results for all parameters. So my question is: how can I fit frailty models to multi-level survival data in a way that gives REML estimates of the variance components and ML estimates of the other parameters?

Any help is so much appreciated.

17 REPLIES
Rick_SAS
SAS Super FREQ

I don't know the answer to your question, but most SAS procedures let you choose the estimation method only GLOBALLY. You seem to want REML estimates for some parameters and ML estimates for others, and I suspect that most SAS procedures do not provide that level of control.

Perhaps the more relevant question is: why does it matter? ML estimates are used thousands of times a day, and the fact that some are biased is just the price we pay for such a versatile and powerful technique. If this is for a simulation study, I'd simply present the results and say, "It is known that the XYZ estimator is biased, which is why the Monte Carlo estimate is systematically different from the population parameter."

SteveDenham
Jade | Level 19

To follow on from Rick's statement: ML estimates minimize the mean squared error of the estimate, which is the sum of the variance and the squared bias, so MLEs are more efficient than unbiased estimators in most instances. REML estimates are not necessarily unbiased either, since they may be subject to boundary conditions; that is why REML estimation in MIXED or GLIMMIX will occasionally return the error saying that minimum variance estimates cannot be computed when starting from the default initial parameter estimates.
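In symbols, for an estimator theta_hat of a parameter theta:

MSE(theta_hat) = E[(theta_hat - theta)^2] = Var(theta_hat) + [Bias(theta_hat)]^2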

Steve Denham

Mehdi_R
Fluorite | Level 6

Thanks to both of you. It was really helpful to have it confirmed that there is no way to get REML estimates of the variance components in parametric frailty modeling.

SteveDenham
Jade | Level 19

But there is. It is just that you cannot get ML estimates for some of the parameters and REML estimates for the rest: you use either maximum likelihood or restricted maximum likelihood as the estimation method. One other thing about REML estimates is that they can be shown to be equivalent to Bayesian estimators derived from appropriate (inverse gamma, if I remember correctly) priors, so if there are missing data (either missing at random or missing completely at random), all estimates are valid reflections of the data. That beats using ML and having to impute and model-average, at least for convenience.

Steve Denham

Mehdi_R
Fluorite | Level 6

My response is time-to-event, and my model is a multi-level frailty model with a Weibull baseline hazard and normal random effects. PROC MIXED and GLIMMIX cannot analyze this kind of data; I can only use PROC NLMIXED, which gives MLEs of all the parameters. If I were using PROC MIXED or GLIMMIX, I could specify METHOD=REML on the PROC statement to get REML estimates of the variance components, but there is no such option in PROC NLMIXED. If I am wrong, I would appreciate a correction.

I think this problem exists only for the parametric approach. For the semiparametric approach, we can use PROC PHREG with the METHOD=REML option in the RANDOM statement to get REML estimates of the variance components. Do you know of any procedure that does the same for the parametric approach?

SteveDenham
Jade | Level 19

I think you have covered the NLMIXED distinction from GLIMMIX/MIXED quite well.  In the case of PHREG, semiparametric is a good thing in that there is a parametric form for the effects of the explanatory variables, which are the ones you are after, so far as I understand.  It allows, but does not require, an unspecified form for the underlying survivor function.

In PHREG, REML is the default for estimation of the random effects under the default lognormal distribution.  A gamma frailty model shifts everything to ML.  I would go with PHREG as the method of choice.
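An untested sketch of what that can look like (the data set and variable names here are placeholders; see the RANDOM statement options in the PHREG documentation):

/* Sketch only: lognormal frailty in PHREG; for the lognormal frailty,
   REML is the default estimation method for the variance component.  */
proc phreg data=Survival_MultiLevel;
   class cluster_id;
   model survival_t*censor_flag(0) = covariate1 covariate2;
   random cluster_id / dist=lognormal method=reml;
run;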

Steve Denham

Mehdi_R
Fluorite | Level 6

Thanks, Steve. Exactly: the best choice is to use PHREG. But my problem is my graduate committee! I have generated survival data from a Weibull frailty model with normal random effects. When I use NLMIXED to analyze these data, I get biased results; when I use PHREG, I get unbiased results. So someone is bound to ask why the model from which I am sampling does not give unbiased results...

lvm
Rhodochrosite | Level 12

If you are getting huge biases in your parameter estimates with NLMIXED, you probably have errors in your code. ML does give biased estimates, but for your kind of problem the bias should be small.

Rick_SAS
SAS Super FREQ

I agree with lvm, and since this is a simulation study, it should be easy to test whether your code is correct. As your sample size increases, the bias should get smaller. Construct a plot of mean bias versus sample size and see whether the mean bias approaches zero. If I were on your committee, that's the graph I would want to see.
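A rough sketch of what I mean (assuming a hypothetical data set est with one row per simulated sample and columns Parameter, Estimate, TrueValue, and SampleSize):

/* Hypothetical sketch: Monte Carlo mean bias by sample size */
data bias;
   set est;
   bias = Estimate - TrueValue;
run;

proc means data=bias noprint nway;
   class SampleSize Parameter;
   var bias;
   output out=meanbias mean=MeanBias;
run;

proc sgplot data=meanbias;
   series x=SampleSize y=MeanBias / group=Parameter markers;
   refline 0 / axis=y;
   yaxis label="Mean bias";
run;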

Mehdi_R
Fluorite | Level 6

Thanks everybody for your helpful comments.

I get very precise estimates of the regression slope parameters but very biased estimates of b0, rho, and logsig. How can my code be wrong if I am getting good results for the regression parameters? On the other hand, you are absolutely right that the bias shrinks as the sample size grows!

Actually, I have been looking for an error in my code for a long time and I find nothing. Would you take a look at my code and see whether you spot any problem?

proc nlmixed data=Survival_MultiLevel qpoints=30 noad;
   by sample_id;
   bounds rho > 0;
   parms rho=2 b1=0 b2=0 logsig=.5 b0=0;

   /* Weibull proportional-hazards frailty model with normal random effect W */
   eta     = b0 + b1*covariate1 + b2*covariate2 + W;
   lambda  = exp(eta);
   log_S_t = -lambda * (survival_t**rho);                   /* log survival          */
   log_h   = (censor_flag = 1) *
             (eta + log(rho) + (rho - 1)*log(survival_t));  /* log hazard for events */
   ll      = log_h + log_S_t;                               /* log likelihood        */

   model survival_t ~ general(ll);
   random W ~ normal(0, exp(2*logsig)) subject=cluster_id;

   ods select none;                                         /* suppress printed output */
   ods output ParameterEstimates=Multilevel_shared_par;
run;

SteveDenham
Jade | Level 19

No guarantees that any of these are helpful, or even that you haven't already thought about them.

My first concern is the formula for eta. This is the linear part of the model, and you say that you are getting good values for b1 and b2 but not b0. My concern here is the term W: it is purely additive, will be confounded with b0, and may be the entire source of the 'bias' in b0. Is it necessary, and if not, what is the behavior if it is deleted from the equation?

Second, what is the range of survival_t? Because it is raised to a power greater than 1, it may be dominating everything else, so that all of the fitting of variability goes into the value of rho (yet another target for bias questions). Could this be a problem?

Third, what about the other part of the log survival function? With W floating around in the eta estimate and lambda being its exponentiated value, there are several places where that product can get inflated due to representation error. Is there an alternative parameterization that avoids these problems, at least until the final optimization?

In summary, I think the bias problem may be more related to inefficient code--not WRONG--just inefficient.

Have you searched the SAS listserv archives? Check the SAS-L archives for posts on NLMIXED by David Cassell, Dale McLerran, and several others. They may be from four or five years ago, but the principles remain intact.

Steve Denham

lvm
Rhodochrosite | Level 12

Actually, I don't think the b0 + .... + W term is a problem (since W has an expected value of 0); this is basically a random-intercept mixed model. I am not an expert in survival/lifetime analysis, so I will have to study your code later. The following articles might help you.

http://support.sas.com/resources/papers/proceedings12/216-2012.pdf

http://support.sas.com/resources/papers/proceedings09/237-2009.pdf

Maybe not quite what you want, but they should be helpful -- lots of PHREG and NLMIXED code.

Mehdi_R
Fluorite | Level 6

Thanks Steve.

Thank you "lvm" so much for these sources. Actually my codes are exactly the same as codes in these files... I don't know what to do now...

Rick_SAS
SAS Super FREQ

Research is hard and can be frustrating.  I guarantee you that Steve, lvm, and I have all been in your shoes.  What I do is pull back and simplify to a problem that I understand and make sure that the program works as expected.  If so, I increase the complexity a little. If not, I have an easier problem to troubleshoot.

In your case, you might want to eliminate the covariates and look at the intercept-only model. 
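For example, something along these lines (a sketch based on the code you posted, stripped down to intercept plus frailty; untested, and the output data set name is just a placeholder):

proc nlmixed data=Survival_MultiLevel qpoints=30 noad;
   by sample_id;
   bounds rho > 0;
   parms rho=2 logsig=.5 b0=0;
   eta     = b0 + W;                                        /* intercept + frailty only */
   lambda  = exp(eta);
   log_S_t = -lambda * (survival_t**rho);
   log_h   = (censor_flag = 1) *
             (eta + log(rho) + (rho - 1)*log(survival_t));
   ll      = log_h + log_S_t;
   model survival_t ~ general(ll);
   random W ~ normal(0, exp(2*logsig)) subject=cluster_id;
   ods output ParameterEstimates=Intercept_only_par;        /* hypothetical name */
run;

If b0, rho, and logsig are recovered well here, add covariate1 back in, refit, and so on.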
