## Difference and CI for probabilities in Logistic

Frequent Contributor
Posts: 116

# Difference and CI for probabilities in Logistic

I am running a logistic regression model with two binary categorical variables, their multiplicative interaction term, and a continuous variable.  I am currently trying to get the differences in probabilities (along with respective confidence interval(CI)) for different combinations of independent variabls (e.g., for a strata) in an attempt to plot them with their CI. My goal is to generate graphs similar to those at the bottom of the following webpage:

http://www.ats.ucla.edu/stat/stata/faq/logit2by2.htm

My data is very similar to those in the STATA example, or what I mean is I have an X1, X2, X1*X2, and X3 (continuous variable).

Using contrast statements in my proc logistic, I can get the probabilities for select groupings of my variables based on selected level of the continuous variable (see attached). However, I want to get the differences (and respective CI) of these based on the level of one of the categorical variables. The difference is basic subtraction, but I don't know how to get the CIs for them. Seems like something that can be done in a contrast statement, but I can't seem to get my code right, since based on simple subtraction, I know what the difference should be. I have tried things like "contrast X1 1 X1*X2 1", which are terms not in both contrast statements, to no avail.

proc logistic data=dataset;

class       X1 (ref='0')

X2 (ref='0') / param=ref;

model       y(event='1') =

X1

X2

X1*X2

X3 /*continuous*/ / link = logit expb clodds=wald;

contrast 'X1 = 0 X2 = 1' X2 1 X3 11.23 intercept 1  /estimate = prob; /*11.23 mean of X3*/

contrast 'X1 = 1 X2 = 1' X1 1 X2 1 X3 11.23 intercept 1 X1*X2 1 /estimate = prob;

contrast 'X1 = 0 X2 = 0' X3 11.23 intercept 1 /estimate = prob;

contrast 'X1 = 1 X2 = 0' X1 1 X3 11.23 intercept 1  /estimate = prob;

/*Want difference of first and second contrast plus confidence interval*/

/*Want difference of third and fourth contrast plus confidence interval*/

run;

Posts: 2,655

## Re: Difference and CI for probabilities in Logistic

The problem is that the standard error of the difference is on the logit scale.  You may wish to try the LSMESTIMATE statement to get the comparison of interest (available in SAS/STAT12.3 and above).  I won't guarantee that the standard error is quite right. Also, beware of using the ILINK option in this statement.  However, if you get the CL estimates out, you could thne transform back to the probability scale.

Steve Denham

Frequent Contributor
Posts: 116

## Re: Difference and CI for probabilities in Logistic

Steve Denham,

I feel like I can get close to the results I am looking for but not quite there. The following gets me the individual probabilites needed for the subtraction:

lsmeans X1*X2 / at X3 = 11.23 ilink;

when I add "diff" as an option it calculates the differences, though these are differences in the beta coefficients, which when exponentiated and divided by 1 + them exponentiated, do not equal the same thing as differences in transformed values (probabilities) I am after (along with confidence intervals).  Any suggestions would be appreciated!

P.S., I could always pull out the probabilities and SE needed for the subtraction and perform the procedure in a data step, however - I do not know how to get the confidence intervals for this calculated difference based only on the probabilities and their respective SE.

Frequent Contributor
Posts: 116

## Re: Difference and CI for probabilities in Logistic

If I just have the probabilities and SEs and want to find the difference between two of these probabilities with its confidence interval - is it just possible to do the following:

~ find the differences

~ square the individual SEs, then add them together and find the square root for the difference SE

~ calculate confindence intervals via difference +/- (SE * 1.96)

Posts: 2,655

## Re: Difference and CI for probabilities in Logistic

I got excited at first about plugging in a t value for the CI, rather than the z value, but I gathered my wits before I hit "Add Reply".  Since this is for a proportion, and is an approximation in any case, you are probably good.  It does assume that the estimates are independent.  If you really wanted to get picky, you could get the covariance matrix out of LOGISTIC and apply that when calculating the pooled SE.

Steve Denham

Frequent Contributor
Posts: 116

## Re: Difference and CI for probabilities in Logistic

Well I am on a stats forum, so lets please hear about the picky method. I had the code output the cov, which seems fairly splotchy with values just for the reference values of X1, X2, X1*X2, and for X3.

Posts: 2,655

## Re: Difference and CI for probabilities in Logistic

Well, the asymptotic variance of a difference between non-independent variables A and B is var(diff) = var(A) + var (B) + 2*cov(A,B).  So, it is just a matter of plugging in the covariance term after you sum the squared SE's, and before taking the square root.  If values are missing from cov, then you might be dealing with an atypical dataset.

And of course, you could do all of this in matrix form.  Search 's blog The DO Loop for an example.

Steve Denham

SAS Employee
Posts: 31

## Re: Difference and CI for probabilities in Logistic

I would try to do this by using PROC NLMIXED. Here is a good reference which shows how to do:

37228 - Estimating differences in probabilities with confidence interval

Funda

Posts: 2,655

## Re: Difference and CI for probabilities in Logistic

That is cool, and definitely qualifies as A. a better approach, and B. picky, so it certainly meets all of the qualifications.  Thanks,

Steve Denham

Frequent Contributor
Posts: 116

## Re: Difference and CI for probabilities in Logistic

Yes, I am excited to try this approach (which is oddly enough is similar to another piece of code I have further down in my program). I am caught up in doing something else right now, so I will update this thread if I end up having difficulties or with the final code used once I run it.

Thanks again every one - this is the third question I have posted on this forum, with great responses everytime!

P.S., Unsure why I did not find that SAS document during my searches

Frequent Contributor
Posts: 116

## Re: Difference and CI for probabilities in Logistic

Well it turns out I tried to go this route about 3 weeks ago to no avail. I kept botching things up when I added the continuous variable. Perhaps it is that I am not adding it correctly. So for example, I realize the following code is off but it would be something like this (see below), but the continuous variable is being treated as a binary variable.

Ideally I want to run this and find the differences between the probabilities plus confidence intervals given specified values for the continuous variable (e.g., 1, 5, 10, 15,...). Any help would be appreciated.

proc nlmixed data=test;

p=1/(1+exp(-(Intercept + b1*x1 + b2*x2 = B3*x1*x2 + B4*x3)));/*B4 Continuous/

model y ~ binomial(1,p);

estimate "Prob Diff x1 = 1 and X2 = 1 or 0" 1/(1+exp(-(Intercept+b1 + b2 +b3 + b4*x4))) - 1/(1+exp(-(Intercept + B1*X1 + B4)));

run;

Posts: 2,655

## Re: Difference and CI for probabilities in Logistic

First, check to see if you correctly estimate for x1=1 and x2=1.  Because of the continuous nature of x3, you will need to feed in a value.  Say the mean value of x3 is 12.  Then

estimate 'Prob for x1=1, x2=1, x3=12' 1/(1+exp(-(intercept + b1*1 + b2*1 + b3*1 +b4*12)));

Do similar estimates for the other probabilities to make sure you are getting what you believe you should be getting, and THEN create an ESTIMATE statement that combines/takes the difference of two that are correctly estimated.

Steve Denham

Frequent Contributor
Posts: 116

## Re: Difference and CI for probabilities in Logistic

It runs fine until I add the continuous variable, then halts and the output stops prior to the "Interation History" with the following log message:

NOTE: To assign starting values to parameters, use the PARMS statement. The default starting

value of 1.0 is in effect for all parameters.

ERROR: No valid parameter points were found.

NOTE: PROCEDURE NLMIXED used (Total process time):

real time 0.14 seconds

cpu time 0.06 seconds

This is the output:

The NLMIXED Procedure

 Data Set WORK.TEST Dependent Variable y Distribution for Dependent Variable Binomial Optimization Technique Dual Quasi-Newton Integration Method None

 Observations Used 319 Observations Not Used 0 Total Observations 319 Parameters 5

 1 1 1 1 1 1.15792e+77

I had originally attempt to use this approach via:

Thanks, any help is appreciated.

Once again, my original question was how to find the difference between two probabilities and the respective confidence interval for different levels of a continuous covariate only given the two probabilities their SE, which can be calculated for the continuous covariate levels. However, I am will to explore all options for the a solution.

Posts: 2,655

## Re: Difference and CI for probabilities in Logistic

Scale and center your continous variable, so that the values will not "blow up" when exponentiated.  Also, the PARMS statement can be used to grid search for good starting values.  I don't know hwat scale the continuous variable is on, but if it is recentered and rescaled so that values run from, say, -5 to +5, you should not need the PARMS approach.

If you want to keep the original scaling, and x3 ranges from 1 to 100, for example, then

parms b4 0.01 to 0.2 by 0.1;

might be a place to start.

Steve Denham

SAS Employee
Posts: 4

## Re: Difference and CI for probabilities in Logistic

Did you get it working?  I rarely get NLMIXED programs to work the first time...

If you haven't gotten it working yet:  you have an "=" in the equation for "p" that does not belong there:

p=1/(1+exp(-(Intercept + b1*x1 + b2*x2    +    B3*x1*x2 + B4*x3)));/*B4 Continuous/

For logistic programs, I've found small intercepts and zero slopes give decent starting values:

parms8 b1=0 b2=0 b3=0 b4=0;

but if that doesn't work, setting up a grid as Steve suggested (away from 1) should help.

And here's my version of your data + a program that actually converges (but I'm not gonna claim it's correct ):

data one;
do i=1 to 1000;
a=round(ranuni(0));
b=round(ranuni(0));
x=11.23+rannor(0);
lp= 0.1 + 0.2*a + 0.3*b + 0.4*a*b + 0.005*x;
p=exp(lp)/(1+exp(lp));
y= (ranuni(0)<p);
output;
end;
run;

proc logistic data=one;
class a b / param=ref ref=first;
model y(event='1')= a b a*b x / link = logit itprint;
run;
* NLMIXED and LOGISTIC parameter estimates match ;
proc nlmixed data=one gconv=0 fconv=0;
parms int=1e-8 ba=0 bb=0 bab=0 bx=0;
elp=exp(int+ba*a+bb*b+bab*a*b+bx*x);
model y ~ binary(elp/(1+elp));
xfixed=11.23;
ea0b0=exp(-(int          +bx*xfixed));
ea1b1=exp(-(int+ba+bb+bab+bx*xfixed));
p1=ea0b0/(1+ea0b0);
p4=ea1b1/(1+ea1b1);
estimate 'p1' p1;
estimate 'p4' p4;
estimate 'p1-p4' p1-p4;
run;

Discussion stats
• 20 replies
• 2147 views
• 9 likes
• 6 in conversation