BookmarkSubscribeRSS Feed
H
Pyrite | Level 9 H
Pyrite | Level 9

I am running a logistic regression model with two binary categorical variables, their multiplicative interaction term, and a continuous variable.  I am currently trying to get the differences in probabilities (along with respective confidence interval(CI)) for different combinations of independent variabls (e.g., for a strata) in an attempt to plot them with their CI. My goal is to generate graphs similar to those at the bottom of the following webpage:

http://www.ats.ucla.edu/stat/stata/faq/logit2by2.htm

My data is very similar to those in the STATA example, or what I mean is I have an X1, X2, X1*X2, and X3 (continuous variable).

Using contrast statements in my proc logistic, I can get the probabilities for select groupings of my variables based on selected level of the continuous variable (see attached). However, I want to get the differences (and respective CI) of these based on the level of one of the categorical variables. The difference is basic subtraction, but I don't know how to get the CIs for them. Seems like something that can be done in a contrast statement, but I can't seem to get my code right, since based on simple subtraction, I know what the difference should be. I have tried things like "contrast X1 1 X1*X2 1", which are terms not in both contrast statements, to no avail.

Thanks in advance for your time and help!

proc logistic data=dataset;

      class       X1 (ref='0')

               X2 (ref='0') / param=ref;

      model       y(event='1') =

                  X1

                  X2 

                  X1*X2

                  X3 /*continuous*/ / link = logit expb clodds=wald;

      contrast 'X1 = 0 X2 = 1' X2 1 X3 11.23 intercept 1  /estimate = prob; /*11.23 mean of X3*/

      contrast 'X1 = 1 X2 = 1' X1 1 X2 1 X3 11.23 intercept 1 X1*X2 1 /estimate = prob;

      contrast 'X1 = 0 X2 = 0' X3 11.23 intercept 1 /estimate = prob;

      contrast 'X1 = 1 X2 = 0' X1 1 X3 11.23 intercept 1  /estimate = prob;

/*Want difference of first and second contrast plus confidence interval*/

/*Want difference of third and fourth contrast plus confidence interval*/

run;

20 REPLIES 20
SteveDenham
Jade | Level 19

The problem is that the standard error of the difference is on the logit scale.  You may wish to try the LSMESTIMATE statement to get the comparison of interest (available in SAS/STAT12.3 and above).  I won't guarantee that the standard error is quite right. Also, beware of using the ILINK option in this statement.  However, if you get the CL estimates out, you could thne transform back to the probability scale.

Steve Denham

H
Pyrite | Level 9 H
Pyrite | Level 9

Steve Denham,

I feel like I can get close to the results I am looking for but not quite there. The following gets me the individual probabilites needed for the subtraction:

lsmeans X1*X2 / at X3 = 11.23 ilink;

when I add "diff" as an option it calculates the differences, though these are differences in the beta coefficients, which when exponentiated and divided by 1 + them exponentiated, do not equal the same thing as differences in transformed values (probabilities) I am after (along with confidence intervals).  Any suggestions would be appreciated!

P.S., I could always pull out the probabilities and SE needed for the subtraction and perform the procedure in a data step, however - I do not know how to get the confidence intervals for this calculated difference based only on the probabilities and their respective SE.

H
Pyrite | Level 9 H
Pyrite | Level 9

Additional option,

If I just have the probabilities and SEs and want to find the difference between two of these probabilities with its confidence interval - is it just possible to do the following:

~ find the differences

~ square the individual SEs, then add them together and find the square root for the difference SE

~ calculate confindence intervals via difference +/- (SE * 1.96)

SteveDenham
Jade | Level 19

I got excited at first about plugging in a t value for the CI, rather than the z value, but I gathered my wits before I hit "Add Reply".  Since this is for a proportion, and is an approximation in any case, you are probably good.  It does assume that the estimates are independent.  If you really wanted to get picky, you could get the covariance matrix out of LOGISTIC and apply that when calculating the pooled SE.

Steve Denham

H
Pyrite | Level 9 H
Pyrite | Level 9

Well I am on a stats forum, so lets please hear about the picky method. I had the code output the cov, which seems fairly splotchy with values just for the reference values of X1, X2, X1*X2, and for X3.

SteveDenham
Jade | Level 19

Well, the asymptotic variance of a difference between non-independent variables A and B is var(diff) = var(A) + var (B) + 2*cov(A,B).  So, it is just a matter of plugging in the covariance term after you sum the squared SE's, and before taking the square root.  If values are missing from cov, then you might be dealing with an atypical dataset.

And of course, you could do all of this in matrix form.  Search 's blog The DO Loop for an example.

Steve Denham

Funda_SAS
SAS Employee

I would try to do this by using PROC NLMIXED. Here is a good reference which shows how to do:

37228 - Estimating differences in probabilities with confidence interval

Funda

SteveDenham
Jade | Level 19

That is cool, and definitely qualifies as A. a better approach, and B. picky, so it certainly meets all of the qualifications.  Thanks,

Steve Denham

H
Pyrite | Level 9 H
Pyrite | Level 9

Yes, I am excited to try this approach (which is oddly enough is similar to another piece of code I have further down in my program). I am caught up in doing something else right now, so I will update this thread if I end up having difficulties or with the final code used once I run it.

Thanks again every one - this is the third question I have posted on this forum, with great responses everytime!

P.S., Unsure why I did not find that SAS document during my searches Smiley Happy

H
Pyrite | Level 9 H
Pyrite | Level 9

Well it turns out I tried to go this route about 3 weeks ago to no avail. I kept botching things up when I added the continuous variable. Perhaps it is that I am not adding it correctly. So for example, I realize the following code is off but it would be something like this (see below), but the continuous variable is being treated as a binary variable.

Ideally I want to run this and find the differences between the probabilities plus confidence intervals given specified values for the continuous variable (e.g., 1, 5, 10, 15,...). Any help would be appreciated.

proc nlmixed data=test;

  p=1/(1+exp(-(Intercept + b1*x1 + b2*x2 = B3*x1*x2 + B4*x3)));/*B4 Continuous/

    model y ~ binomial(1,p);

    estimate "Prob Diff x1 = 1 and X2 = 1 or 0" 1/(1+exp(-(Intercept+b1 + b2 +b3 + b4*x4))) - 1/(1+exp(-(Intercept + B1*X1 + B4)));

run;

SteveDenham
Jade | Level 19

First, check to see if you correctly estimate for x1=1 and x2=1.  Because of the continuous nature of x3, you will need to feed in a value.  Say the mean value of x3 is 12.  Then

estimate 'Prob for x1=1, x2=1, x3=12' 1/(1+exp(-(intercept + b1*1 + b2*1 + b3*1 +b4*12)));

Do similar estimates for the other probabilities to make sure you are getting what you believe you should be getting, and THEN create an ESTIMATE statement that combines/takes the difference of two that are correctly estimated.

Steve Denham

H
Pyrite | Level 9 H
Pyrite | Level 9

It runs fine until I add the continuous variable, then halts and the output stops prior to the "Interation History" with the following log message:

 

NOTE: To assign starting values to parameters, use the PARMS statement. The default starting

value of 1.0 is in effect for all parameters.

ERROR: No valid parameter points were found.

NOTE: PROCEDURE NLMIXED used (Total process time):

real time 0.14 seconds

cpu time 0.06 seconds

This is the output:

The NLMIXED Procedure

Data Set

  1. WORK.TEST

Dependent Variable

y

Distribution for Dependent Variable

Binomial

Optimization Technique

Dual Quasi-Newton

Integration Method

None

Observations Used

319

Observations Not Used

0

Total Observations

319

Parameters

5

1

1

1

1

1

  1. 1.15792E77

I had originally attempt to use this approach via:

http://www.columbia.edu/~mmw2177/Testing%20for%20additive%20interactions%2029Feb2012.pdf

Thanks, any help is appreciated.


Once again, my original question was how to find the difference between two probabilities and the respective confidence interval for different levels of a continuous covariate only given the two probabilities their SE, which can be calculated for the continuous covariate levels. However, I am will to explore all options for the a solution.

SteveDenham
Jade | Level 19

Scale and center your continous variable, so that the values will not "blow up" when exponentiated.  Also, the PARMS statement can be used to grid search for good starting values.  I don't know hwat scale the continuous variable is on, but if it is recentered and rescaled so that values run from, say, -5 to +5, you should not need the PARMS approach.

If you want to keep the original scaling, and x3 ranges from 1 to 100, for example, then

parms b4 0.01 to 0.2 by 0.1;

might be a place to start.

Steve Denham

bobderr
SAS Employee

Did you get it working?  I rarely get NLMIXED programs to work the first time...

If you haven't gotten it working yet:  you have an "=" in the equation for "p" that does not belong there:

      p=1/(1+exp(-(Intercept + b1*x1 + b2*x2    +    B3*x1*x2 + B4*x3)));/*B4 Continuous/

For logistic programs, I've found small intercepts and zero slopes give decent starting values:

     parms8 b1=0 b2=0 b3=0 b4=0;

but if that doesn't work, setting up a grid as Steve suggested (away from 1) should help.

And here's my version of your data + a program that actually converges (but I'm not gonna claim it's correct Smiley Happy):

data one;
  do i=1 to 1000;
    a=round(ranuni(0));
    b=round(ranuni(0));
    x=11.23+rannor(0);
    lp= 0.1 + 0.2*a + 0.3*b + 0.4*a*b + 0.005*x;
    p=exp(lp)/(1+exp(lp));
    y= (ranuni(0)<p);
    output;
  end;
run;

proc logistic data=one;
  class a b / param=ref ref=first;
  model y(event='1')= a b a*b x / link = logit itprint;
run;
* NLMIXED and LOGISTIC parameter estimates match ;
proc nlmixed data=one gconv=0 fconv=0;
  parms int=1e-8 ba=0 bb=0 bab=0 bx=0;
  elp=exp(int+ba*a+bb*b+bab*a*b+bx*x);
  model y ~ binary(elp/(1+elp));
  xfixed=11.23;
  ea0b0=exp(-(int          +bx*xfixed));
  ea1b1=exp(-(int+ba+bb+bab+bx*xfixed));
  p1=ea0b0/(1+ea0b0);
  p4=ea1b1/(1+ea1b1);
  estimate 'p1' p1;
  estimate 'p4' p4;
  estimate 'p1-p4' p1-p4;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 20 replies
  • 5441 views
  • 9 likes
  • 6 in conversation