REML estimator of variance components in frailty m...

06-19-2015 08:10 PM

Hello everybody.

I have survival data with a multilevel correlated structure, simulated from a multilevel frailty model with a Weibull baseline hazard and a normal random effect. Since I am fitting the same model that generated the data, I should get satisfactory results. To fit the multilevel frailty model I use PROC NLMIXED, but I do not get precise results: there is a huge bias in my estimates. I asked Dr. Rick Wicklin about this, and he pointed out that PROC NLMIXED uses ML estimators of the parameters, which are not always unbiased. I think he is absolutely right. Indeed, the ML estimator of sigma^2, the variance component, is biased, and we should use the REML estimator, not the ML estimator, for variance components.

However, I am getting biased results for the other parameters as well as for sigma^2. I think this is because the numerical method in PROC NLMIXED is Newton-Raphson, so at each iteration the error in estimating sigma^2 propagates into the other parameters' estimates, and I end up with biased results for all parameters. Now I have a question: how can I fit frailty models to multilevel survival data so that I get REML estimates of the variance components and ML estimates of the other parameters?
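As a side note, the bias being described can be reproduced in the simplest possible setting. The sketch below (in Python rather than SAS, and for a plain normal variance rather than the frailty model) contrasts the ML variance estimator, which divides by n, with the REML-type estimator, which divides by n - 1 and is unbiased:

```python
import random

# Toy illustration: ML vs. REML-type estimation of a normal variance.
# With the mean also estimated, the ML estimator E[SS/n] = (n-1)/n * sigma^2
# is biased downward; dividing by n - 1 removes the bias.
random.seed(1)
sigma2, n, reps = 4.0, 10, 20000
ml_sum = reml_sum = 0.0
for _ in range(reps):
    x = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    m = sum(x) / n
    ss = sum((xi - m) ** 2 for xi in x)
    ml_sum += ss / n          # ML: biased by the factor (n - 1) / n
    reml_sum += ss / (n - 1)  # REML-type: unbiased
ml_mean, reml_mean = ml_sum / reps, reml_sum / reps
print(ml_mean, reml_mean)  # roughly 3.6 vs. 4.0 for n = 10
```

In a mixed model the same downward bias appears in the variance components, which is exactly what REML corrects.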

Any help is much appreciated.

Accepted Solutions

Solution

06-23-2015 07:02 AM

I don't know the answer to your question, but most SAS procedures enable you to choose the optimization method GLOBALLY. You seem to want to use REML estimates for some parameters but ML estimates for others. I suspect that most SAS procedures do not provide an option for that level of control.

Perhaps the more relevant question is why does it matter? ML estimates are used thousands of times a day and the fact that some are biased is just the price we pay for such a versatile and powerful technique. If this is for a simulation study, I'd just present the results and say "It is known that the XYZ estimator is biased, which is why the Monte Carlo estimate is systematically different from the population parameter."

All Replies


06-23-2015 11:33 AM

To go on from Rick's statement: ML estimates will minimize the mean squared error of the estimate, which is the sum of the variance and the squared bias, so MLEs are more efficient than unbiased estimators in most instances. REML estimates are not necessarily unbiased either, as they may be subject to boundary conditions; that is why REML estimation in MIXED or GLIMMIX will occasionally return the error that minimum variance estimates cannot be computed when starting from the default initial parameter estimates.

Steve Denham
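The decomposition above, MSE = variance + squared bias, can be checked numerically. The sketch below (Python, toy normal-variance setting rather than the frailty model) also illustrates the efficiency point: the biased ML estimator here has lower MSE than the unbiased one:

```python
import random

# Monte Carlo check that MSE = variance + bias^2, and that the biased ML
# variance estimator can have lower MSE than the unbiased estimator.
random.seed(2)
sigma2, n, reps = 1.0, 10, 40000
ml, ub = [], []
for _ in range(reps):
    x = [random.gauss(0.0, 1.0) for _ in range(n)]
    m = sum(x) / n
    ss = sum((xi - m) ** 2 for xi in x)
    ml.append(ss / n)        # ML estimate
    ub.append(ss / (n - 1))  # unbiased estimate

def mse(est):
    return sum((e - sigma2) ** 2 for e in est) / len(est)

def var_plus_bias2(est):
    mean = sum(est) / len(est)
    var = sum((e - mean) ** 2 for e in est) / len(est)
    return var + (mean - sigma2) ** 2

print(mse(ml), var_plus_bias2(ml))  # identical up to rounding
print(mse(ml) < mse(ub))            # the biased estimator wins on MSE here
```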


06-23-2015 10:12 PM

Thanks to both of you. It was really helpful to have it confirmed that there is no way to get REML estimates of the variance components in parametric frailty modeling.


06-24-2015 08:20 AM

But there is. It is just that you cannot get ML estimates for some of the parameters and REML estimates for the rest: you use either maximum likelihood or restricted maximum likelihood as the estimation method. One other thing about REML estimates is that they can be shown to be equivalent to Bayesian estimators derived from appropriate (igamma, if I remember correctly) priors, and thus, if there are missing data (either missing at random or missing completely at random), all estimates are valid reflections of the data. That beats using ML and having to impute and model-average, at least for convenience.

Steve Denham


06-24-2015 08:55 AM

My response is time-to-event and my model is a multilevel frailty model with a Weibull baseline hazard and normal random effects. PROC MIXED and PROC GLIMMIX cannot fit this kind of data, so I can only use PROC NLMIXED, which gives MLEs of all the parameters. If I were using PROC MIXED or GLIMMIX, I could specify METHOD=REML to get REML estimates of the variance components, but there is no such option in PROC NLMIXED. If I am wrong, I would appreciate a correction.

I think this problem exists only for the parametric approach. For the semiparametric approach, we can use PROC PHREG with METHOD=REML on the RANDOM statement to get REML estimates of the variance components. Do you know of any procedure that does the same for the parametric approach?


06-26-2015 09:34 AM

I think you have covered the NLMIXED distinction from GLIMMIX/MIXED quite well. In the case of PHREG, semiparametric is a good thing in that there is a parametric form for the effects of the explanatory variables, which are the ones you are after, so far as I understand. It allows, but does not require, an unspecified form for the underlying survivor function.

In PHREG, REML is the default for estimation of the random effects under the default lognormal distribution. A gamma frailty model shifts everything to ML. I would go with PHREG as the method of choice.

Steve Denham


06-26-2015 11:17 AM

Thanks Steve. Exactly, the best choice is to use PHREG. But my problem is my graduate committee! I have generated survival data from a Weibull frailty model with normal random effects. When I use NLMIXED to analyse these data, I get biased results; when I use PHREG, I get unbiased results. So someone will ask why the very model from which I am sampling does not give unbiased results...
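For what it's worth, this is the kind of generator the thread implies, sketched in Python rather than a SAS DATA step and with hypothetical parameter values: under S(t | W) = exp(-exp(eta) * t^rho) with eta = b0 + b1*x1 + b2*x2 + W, inverse-transform sampling gives T = (-log(U) / exp(eta))^(1/rho).

```python
import math
import random

# Simulate from a Weibull proportional-hazards model with a shared,
# normally distributed cluster frailty W (hypothetical parameter values).
random.seed(3)
b0, b1, b2, rho, sigma = -1.0, 0.5, -0.5, 2.0, 0.7

def simulate_cluster(cluster_size):
    w = random.gauss(0.0, sigma)  # frailty shared by the whole cluster
    times = []
    for _ in range(cluster_size):
        x1, x2 = random.gauss(0.0, 1.0), random.random()
        eta = b0 + b1 * x1 + b2 * x2 + w
        u = random.random()
        # Invert S(t | W) = exp(-exp(eta) * t**rho) at u:
        times.append((-math.log(u) / math.exp(eta)) ** (1.0 / rho))
    return times

# Sanity check with eta = 0 (no covariates, no frailty): then T**rho is
# a standard exponential variable, so E[T**rho] should be about 1.
draws = [(-math.log(random.random())) ** (1.0 / rho) for _ in range(50000)]
mean_trho = sum(t ** rho for t in draws) / len(draws)
print(mean_trho)  # close to 1
```

If the data were generated some other way, any mismatch between the generator and the NLMIXED likelihood would itself look like bias, so it is worth checking the two against each other term by term.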


06-26-2015 12:44 PM

If you are getting *huge* biases in your parameter estimates with NLMIXED, you probably have errors in your code. ML does give biased estimates, but for your kind of problem the bias should be small.


06-26-2015 01:40 PM

I agree, and since this is a simulation study, it should be easy to test whether your code is correct. As the sample size increases, the bias should get smaller. Construct a plot of mean bias vs. sample size and see whether the mean bias approaches zero. If I were on your committee, that's the graph I would want to see.
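The shrinking-bias check is easy to sketch. The toy version below (Python, using the ML normal-variance estimator as a stand-in for the frailty model, since the principle is the same) shows the mean bias decaying toward zero as n grows:

```python
import random

# Mean bias of the ML variance estimator at increasing sample sizes:
# theory says the bias is -sigma^2 / n, so it should shrink toward zero.
random.seed(4)
sigma2, reps = 4.0, 5000
biases = []
for n in (10, 40, 160):
    total = 0.0
    for _ in range(reps):
        x = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
        m = sum(x) / n
        total += sum((xi - m) ** 2 for xi in x) / n  # ML estimate
    biases.append(total / reps - sigma2)  # Monte Carlo mean bias
print(biases)  # roughly -0.4, -0.1, -0.025
```

The analogous plot for the frailty simulation, mean bias of each parameter against the number of clusters, is what would go in front of the committee.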


06-26-2015 01:53 PM

Thanks everybody for your helpful comments.

I get very precise estimates for the regression slope parameters but very biased estimates for b0, rho, and logsig. How can my code be wrong if I am getting perfect results for the regression parameters? On the other hand, you are absolutely right that the bias shrinks for large sample sizes.

I have been looking for an error in my code for a long time and have found nothing. Would you take a look at my code and see whether you spot any problem?

proc nlmixed data=Survival_MultiLevel qpoints=30 noad;
   by sample_id;
   parms rho=2 b1=0 b2=0 logsig=.5 b0=0;
   bounds rho > 0;
   /* Linear predictor with shared cluster-level frailty W */
   eta = b0 + b1*covariate1 + b2*covariate2 + W;
   lambda = exp(eta);
   /* Weibull PH log likelihood: log S(t), plus log h(t) for events only */
   log_S_t = -lambda * (survival_t ** rho);
   log_h = (censor_flag = 1) * (eta + log(rho) + (rho - 1)*log(survival_t));
   ll = log_h + log_S_t;
   model survival_t ~ general(ll);
   random W ~ normal(0, exp(2*logsig)) subject=cluster_id;
   /* Suppress printed output but keep the estimates data set */
   ods select none;
   ods output ParameterEstimates=Multilevel_shared_par;
run;


06-26-2015 02:18 PM

No guarantees that any of these are helpful, or even that you haven't already thought of them.

My first concern is the formula for eta. This is the linear part of the model, and you say you are getting good values for b1 and b2 but not b0. My concern here is the term W: it is purely additive, will be confounded with b0, and may be the entire source of the 'bias' in b0. Is it necessary, and if not, what is the behavior if it is deleted from the equation?

Second, what is the range of survival_t? Because it is raised to a power greater than 1, it may be dominating everything else, so that all of the fitting of variability goes into the value of rho (yet another target for bias questions). Could this be a problem?

Third, what about the other part of the log survival function? With W floating around in eta, and lambda being its exponentiated value, there are several chances for this product to be inflated by representation error. Is there an alternative parameterization that avoids these problems, at least until the final optimization?

In summary, I think the bias problem may be more related to inefficient code--not WRONG--just inefficient.

Have you searched the SAS-L listserv archives? Check them for posts on NLMIXED by David Cassell, Dale McLerran, and several others. They may be from four or five years ago, but the principles remain intact.

Steve Denham
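The representation-error point above is easy to demonstrate in isolation. A generic sketch follows (Python, with exaggerated values chosen purely to force the failure; this is not a claim about what NLMIXED does internally): the product exp(eta) * t**rho can overflow even though the log of the cumulative hazard, eta + rho*log(t), is perfectly representable.

```python
import math

# Computing the Weibull cumulative hazard H(t) = exp(eta) * t**rho two ways.
def cum_hazard_naive(eta, rho, t):
    try:
        return math.exp(eta) * t ** rho  # can overflow for large eta
    except OverflowError:
        return float("inf")

def log_cum_hazard(eta, rho, t):
    return eta + rho * math.log(t)  # stays on the log scale throughout

eta, rho, t = 720.0, 2.0, 250.0  # exaggerated eta to force the overflow
print(cum_hazard_naive(eta, rho, t))  # inf: the product overflowed
print(log_cum_hazard(eta, rho, t))    # finite, well-behaved log-scale value
```

Keeping the likelihood contributions on the log scale until the last possible moment is the usual defense against this kind of inflation.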


06-26-2015 05:03 PM

Actually, I don't think the b0 + .... + W term is a problem, since W has an expected value of 0; this is basically a random-intercept mixed model. I am not an expert in survival/lifetime analysis, so I will have to study your code later. The following articles might help you.

http://support.sas.com/resources/papers/proceedings12/216-2012.pdf

http://support.sas.com/resources/papers/proceedings09/237-2009.pdf

Maybe not quite what you want, but should be helpful -- lots of PHREG and NLMIXED code.


06-26-2015 10:20 PM

Thanks Steve.

Thank you, "lvm", so much for these sources. Actually, my code is exactly the same as the code in those papers... I don't know what to do now...


06-28-2015 06:12 AM

Research is hard and can be frustrating. I guarantee you that Steve, lvm, and I have all been in your shoes. What I do is pull back and simplify to a problem that I understand and make sure that the program works as expected. If so, I increase the complexity a little. If not, I have an easier problem to troubleshoot.

In your case, you might want to eliminate the covariates and look at the intercept-only model.
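To make that concrete in the setting of this thread (a Python sketch with hypothetical numbers, not SAS): for the intercept-only Weibull model with no frailty, T**rho is exponential with rate exp(b0), so with rho held at its true value the MLE of b0 has a closed form and can be verified directly before any complexity is added back in.

```python
import math
import random

# Intercept-only check: simulate T with S(t) = exp(-exp(b0) * t**rho),
# then recover b0 from the closed-form MLE of the exponential rate.
random.seed(5)
b0, rho, n = -1.0, 2.0, 100000
times = [(-math.log(random.random()) / math.exp(b0)) ** (1.0 / rho)
         for _ in range(n)]
# T**rho ~ Exponential(rate = exp(b0)), and the MLE of the rate is n / sum:
b0_hat = math.log(n / sum(t ** rho for t in times))
print(b0_hat)  # close to the true b0 = -1.0
```

If this baseline checks out, add the covariates, then the frailty, and see at which step the bias first appears.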