BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
BlueNose
Quartz | Level 8

Hello all,

 

I ran this logistic model with random effects using a GLMM, and using the coefficients, I have calculated the predicted probabilities for every value of X, which is a continous variable (well, in my data I only had a few values of X, but I am extrapolating). After calculating the predicted probabilities, I have used sgplot to make a sigmoid plot of X in the x-axis and the probabilities in the y axis.

 

What I couldn't do, is:

 

Calculate CI's for the probabilities, instead of one sigmoid, I want to have 3, i.e., I want to add one for the lower CI limit and one for the higher CI limit.

 

Do you know how I can obtain the CI's for the probabilities I have just calculated ?

 

Thank you !

 

 

proc glimmix data=data;
   class Id;
   model Success (event='1') = X / solution dist=BERNOULLI link=logit;
   random int / sub=ID;
   output out=pred pred=p pred(ILINK )=pp pred(NOBLUP ILINK )=marmean STDERR(NOBLUP ILINK )=se LCL(NOBLUP ILINK )=lcl        UCL(NOBLUP ILINK )=Ucl;
run;
 
data prediction;
     do X = 10 to 50;
     mu = -5 + 0.15*X;
     phat = exp(mu)/(1+exp(mu));
     output; end;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

You want to use proc plm after running glimmix. Put the Xs in a data set first, run glimmix using the store=  command, then run plm to do the scoring. You get predicted and confidence limits. You either print the results or plot them. If you don't use ilink option in plm, you get the logit predictions and confidence limits.

data scoreX;
     do X = 10 to 50;
     output; end;
run;
proc glimmix data=data ;
   class Id;
   model Success (event='1') = X / solution dist=BERNOULLI link=logit;
   random int / sub=ID;
   output out=pred pred=p pred(ILINK )=pp pred(NOBLUP ILINK )=marmean STDERR(NOBLUP ILINK )=se LCL(NOBLUP ILINK )=lcl        UCL(NOBLUP ILINK )=Ucl;
	store results;
run;
proc plm restore=results;
score data=scoreX out=predict predicted lclm uclm / ilink;
run;
proc print data=predict;run;
proc sgplot data=predict;
series x=x y=predicted / lineattrs=(color=red thickness=2);
series x=x y=lclm / lineattrs=(color=blue);
series x=x y=uclm / lineattrs=(color=blue);
run;

View solution in original post

9 REPLIES 9
SteveDenham
Jade | Level 19

I think I am missing something here.  Do you want what I would call the "expected" confidence bounds?  That is, those around the probabilities generated in the datastep DATA PREDICTION. Or do you want the "resulting" confidence bounds, which should be in your output dataset PRED?

 

I suppose the easiest way to the first is to use the known standard error of a probability (=sqrt(p * (1-p) /n) and the CDF function for the normal distribution to calculate the approximate confidence bounds.  If it is the second, then fitting a spline through the values (using SGPLOT) should work.

 

Steve Denham

lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

You want to use proc plm after running glimmix. Put the Xs in a data set first, run glimmix using the store=  command, then run plm to do the scoring. You get predicted and confidence limits. You either print the results or plot them. If you don't use ilink option in plm, you get the logit predictions and confidence limits.

data scoreX;
     do X = 10 to 50;
     output; end;
run;
proc glimmix data=data ;
   class Id;
   model Success (event='1') = X / solution dist=BERNOULLI link=logit;
   random int / sub=ID;
   output out=pred pred=p pred(ILINK )=pp pred(NOBLUP ILINK )=marmean STDERR(NOBLUP ILINK )=se LCL(NOBLUP ILINK )=lcl        UCL(NOBLUP ILINK )=Ucl;
	store results;
run;
proc plm restore=results;
score data=scoreX out=predict predicted lclm uclm / ilink;
run;
proc print data=predict;run;
proc sgplot data=predict;
series x=x y=predicted / lineattrs=(color=red thickness=2);
series x=x y=lclm / lineattrs=(color=blue);
series x=x y=uclm / lineattrs=(color=blue);
run;
BlueNose
Quartz | Level 8

Thank you both for replying.

 

Steve, I will try to explain better. I ran the logistic model with only a few values of explanatory variable. In nature it is continous, but my data has only a few possible values of this variable. I wish to use the coefficients to predict the probability of success for every value between 10 to 50. This is done easily like I did in my code, but as for the CI's, SAS will only give me CI's for the predictions of my data, i.e. for the values i have in the data, and since I don't have enough of them, I can't plot them. I do get a nice sigmoid for the predictions.

 

If I use your formula (a CI for proportion using a normal approximation), do I need to have at least 30 subjects or at least 30 obesrvations (the reason I used GLIMMIX is because every subject had several observations, with different levels of X).

 

Ivm,

 

your code ran, I will check to see if the results makes any sense, if so, then I have learned something new, this is the first time I hear of proc plm...I still don't know how it works or what you did, but I do have 3 curves !

 

 

SteveDenham
Jade | Level 19

Since you are coming up with "hypothetical" values, based on pre-calculated levels, the N needed is entirely up to you.  Check your current work, and use the degrees of freedom for the estimates currently calculated.  That should give the most relevant confidence bounds.

 

Steve Denham

lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

Steve's formula is appropriate for simple random sampling. Because you also have a random effect, you couldn't use this for the SE (variance) in your case.

SteveDenham
Jade | Level 19

Unfortunate, but true.  However, you could kluge together the residual variance and ID variance component into total variance and plug that in.  Starting to get a bit dodgy, though, pulling the variability from the observed part, and applying it to the "assumed".  You could add a set of overdispersion factors to the formula I gave to account for the subject to subject variability, and generate a family of confidence intervals around the curve you are generating.  That sounds like the law of diminishing returns may be starting to set in, though--too much extra work for too little additional insight.

 

Steve Denham

lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

PLM will take care of all those tedious issues for the CI.

lvm
Rhodochrosite | Level 12 lvm
Rhodochrosite | Level 12

Proc PLM is designed to do several things after first fittihg a model. Scoring is what you want, and it is what I showed. The STORE command in GLIMMIX puts all the necessary parts in an item store, which is then accessed using RESTORE in PLM. For scoring, you need to have the predictor variables of interest in another file; the predictions, confidence limits, etc., are all put in an output file for viewing or other processing.

SteveDenham
Jade | Level 19

Larry, I think we are answering different questions here.  I agree that PLM is the way to go for the results of the analysis.  However, I interpreted part of the question to be--"How do I plot a curve of probabilities and their confidence bounds--independent of any data?"  That's what I see in the DATA step.  Coming up with the probabilities is the easy part, and will be the same for all studies.  Coming up with the confidence bounds is more difficult.  My first reply is for the simple random situation, where the variance of a probability is a function of the probability estimate.  Later replies address how to accommodate other sources of variability.  I was aiming for a general solution that was independent of any estimation procedures or data.  Still working on how to get it right, but I will wager I can find it in either Stroup or Zuur's books.

 

Steve Demja,

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 2036 views
  • 7 likes
  • 3 in conversation