Solved: Re: Getting CI from a GLMM

BlueNose · Posted 10-13-2015 01:36 AM

Hello all,

I ran this logistic model with random effects using a GLMM, and using the coefficients, I have calculated the predicted probabilities for every value of X, which is a continous variable (well, in my data I only had a few values of X, but I am extrapolating). After calculating the predicted probabilities, I have used sgplot to make a sigmoid plot of X in the x-axis and the probabilities in the y axis.

What I couldn't do, is:

Calculate CI's for the probabilities, instead of one sigmoid, I want to have 3, i.e., I want to add one for the lower CI limit and one for the higher CI limit.

Do you know how I can obtain the CI's for the probabilities I have just calculated ?

Thank you !

proc glimmix data=data;
   class Id;
   model Success (event='1') = X / solution dist=BERNOULLI link=logit;
   random int / sub=ID;
   output out=pred pred=p pred(ILINK )=pp pred(NOBLUP ILINK )=marmean STDERR(NOBLUP ILINK )=se LCL(NOBLUP ILINK )=lcl        UCL(NOBLUP ILINK )=Ucl;
run;
 
data prediction;
     do X = 10 to 50;
     mu = -5 + 0.15*X;
     phat = exp(mu)/(1+exp(mu));
     output; end;
run;

lvm · Posted 10-13-2015 03:54 PM

You want to use proc plm after running glimmix. Put the Xs in a data set first, run glimmix using the store= command, then run plm to do the scoring. You get predicted and confidence limits. You either print the results or plot them. If you don't use ilink option in plm, you get the logit predictions and confidence limits.

data scoreX;
     do X = 10 to 50;
     output; end;
run;
proc glimmix data=data ;
   class Id;
   model Success (event='1') = X / solution dist=BERNOULLI link=logit;
   random int / sub=ID;
   output out=pred pred=p pred(ILINK )=pp pred(NOBLUP ILINK )=marmean STDERR(NOBLUP ILINK )=se LCL(NOBLUP ILINK )=lcl        UCL(NOBLUP ILINK )=Ucl;
	store results;
run;
proc plm restore=results;
score data=scoreX out=predict predicted lclm uclm / ilink;
run;
proc print data=predict;run;
proc sgplot data=predict;
series x=x y=predicted / lineattrs=(color=red thickness=2);
series x=x y=lclm / lineattrs=(color=blue);
series x=x y=uclm / lineattrs=(color=blue);
run;

View solution in original post

SteveDenham · Posted 10-13-2015 09:00 AM

I think I am missing something here. Do you want what I would call the "expected" confidence bounds? That is, those around the probabilities generated in the datastep DATA PREDICTION. Or do you want the "resulting" confidence bounds, which should be in your output dataset PRED?

I suppose the easiest way to the first is to use the known standard error of a probability (=sqrt(p * (1-p) /n) and the CDF function for the normal distribution to calculate the approximate confidence bounds. If it is the second, then fitting a spline through the values (using SGPLOT) should work.

Steve Denham

lvm · Posted 10-13-2015 03:54 PM

You want to use proc plm after running glimmix. Put the Xs in a data set first, run glimmix using the store= command, then run plm to do the scoring. You get predicted and confidence limits. You either print the results or plot them. If you don't use ilink option in plm, you get the logit predictions and confidence limits.

data scoreX;
     do X = 10 to 50;
     output; end;
run;
proc glimmix data=data ;
   class Id;
   model Success (event='1') = X / solution dist=BERNOULLI link=logit;
   random int / sub=ID;
   output out=pred pred=p pred(ILINK )=pp pred(NOBLUP ILINK )=marmean STDERR(NOBLUP ILINK )=se LCL(NOBLUP ILINK )=lcl        UCL(NOBLUP ILINK )=Ucl;
	store results;
run;
proc plm restore=results;
score data=scoreX out=predict predicted lclm uclm / ilink;
run;
proc print data=predict;run;
proc sgplot data=predict;
series x=x y=predicted / lineattrs=(color=red thickness=2);
series x=x y=lclm / lineattrs=(color=blue);
series x=x y=uclm / lineattrs=(color=blue);
run;

BlueNose · Posted 10-14-2015 12:26 AM

Thank you both for replying.

Steve, I will try to explain better. I ran the logistic model with only a few values of explanatory variable. In nature it is continous, but my data has only a few possible values of this variable. I wish to use the coefficients to predict the probability of success for every value between 10 to 50. This is done easily like I did in my code, but as for the CI's, SAS will only give me CI's for the predictions of my data, i.e. for the values i have in the data, and since I don't have enough of them, I can't plot them. I do get a nice sigmoid for the predictions.

If I use your formula (a CI for proportion using a normal approximation), do I need to have at least 30 subjects or at least 30 obesrvations (the reason I used GLIMMIX is because every subject had several observations, with different levels of X).

Ivm,

your code ran, I will check to see if the results makes any sense, if so, then I have learned something new, this is the first time I hear of proc plm...I still don't know how it works or what you did, but I do have 3 curves !

SteveDenham · Posted 10-14-2015 08:32 AM

Since you are coming up with "hypothetical" values, based on pre-calculated levels, the N needed is entirely up to you. Check your current work, and use the degrees of freedom for the estimates currently calculated. That should give the most relevant confidence bounds.

Steve Denham

lvm · Posted 10-14-2015 09:17 AM

Steve's formula is appropriate for simple random sampling. Because you also have a random effect, you couldn't use this for the SE (variance) in your case.

SteveDenham · Posted 10-14-2015 12:55 PM

Unfortunate, but true. However, you could kluge together the residual variance and ID variance component into total variance and plug that in. Starting to get a bit dodgy, though, pulling the variability from the observed part, and applying it to the "assumed". You could add a set of overdispersion factors to the formula I gave to account for the subject to subject variability, and generate a family of confidence intervals around the curve you are generating. That sounds like the law of diminishing returns may be starting to set in, though--too much extra work for too little additional insight.

Steve Denham

lvm · Posted 10-15-2015 09:22 AM

PLM will take care of all those tedious issues for the CI.

lvm · Posted 10-14-2015 09:14 AM

Proc PLM is designed to do several things after first fittihg a model. Scoring is what you want, and it is what I showed. The STORE command in GLIMMIX puts all the necessary parts in an item store, which is then accessed using RESTORE in PLM. For scoring, you need to have the predictor variables of interest in another file; the predictions, confidence limits, etc., are all put in an output file for viewing or other processing.

SteveDenham · Posted 10-16-2015 08:31 AM

Larry, I think we are answering different questions here. I agree that PLM is the way to go for the results of the analysis. However, I interpreted part of the question to be--"How do I plot a curve of probabilities and their confidence bounds--independent of any data?" That's what I see in the DATA step. Coming up with the probabilities is the easy part, and will be the same for all studies. Coming up with the confidence bounds is more difficult. My first reply is for the simple random situation, where the variance of a probability is a function of the probability estimate. Later replies address how to accommodate other sources of variability. I was aiming for a general solution that was independent of any estimation procedures or data. Still working on how to get it right, but I will wager I can find it in either Stroup or Zuur's books.

Steve Demja,