06-13-2014 12:38 PM
I am running a logistic regression model with two binary categorical variables, their multiplicative interaction term, and a continuous variable. I am currently trying to get the differences in probabilities (along with respective confidence interval(CI)) for different combinations of independent variabls (e.g., for a strata) in an attempt to plot them with their CI. My goal is to generate graphs similar to those at the bottom of the following webpage:
My data is very similar to those in the STATA example, or what I mean is I have an X1, X2, X1*X2, and X3 (continuous variable).
Using contrast statements in my proc logistic, I can get the probabilities for select groupings of my variables based on selected level of the continuous variable (see attached). However, I want to get the differences (and respective CI) of these based on the level of one of the categorical variables. The difference is basic subtraction, but I don't know how to get the CIs for them. Seems like something that can be done in a contrast statement, but I can't seem to get my code right, since based on simple subtraction, I know what the difference should be. I have tried things like "contrast X1 1 X1*X2 1", which are terms not in both contrast statements, to no avail.
Thanks in advance for your time and help!
proc logistic data=dataset;
class X1 (ref='0')
X2 (ref='0') / param=ref;
model y(event='1') =
X3 /*continuous*/ / link = logit expb clodds=wald;
contrast 'X1 = 0 X2 = 1' X2 1 X3 11.23 intercept 1 /estimate = prob; /*11.23 mean of X3*/
contrast 'X1 = 1 X2 = 1' X1 1 X2 1 X3 11.23 intercept 1 X1*X2 1 /estimate = prob;
contrast 'X1 = 0 X2 = 0' X3 11.23 intercept 1 /estimate = prob;
contrast 'X1 = 1 X2 = 0' X1 1 X3 11.23 intercept 1 /estimate = prob;
/*Want difference of first and second contrast plus confidence interval*/
/*Want difference of third and fourth contrast plus confidence interval*/
06-16-2014 08:42 AM
The problem is that the standard error of the difference is on the logit scale. You may wish to try the LSMESTIMATE statement to get the comparison of interest (available in SAS/STAT12.3 and above). I won't guarantee that the standard error is quite right. Also, beware of using the ILINK option in this statement. However, if you get the CL estimates out, you could thne transform back to the probability scale.
06-18-2014 12:54 PM
I feel like I can get close to the results I am looking for but not quite there. The following gets me the individual probabilites needed for the subtraction:
lsmeans X1*X2 / at X3 = 11.23 ilink;
when I add "diff" as an option it calculates the differences, though these are differences in the beta coefficients, which when exponentiated and divided by 1 + them exponentiated, do not equal the same thing as differences in transformed values (probabilities) I am after (along with confidence intervals). Any suggestions would be appreciated!
P.S., I could always pull out the probabilities and SE needed for the subtraction and perform the procedure in a data step, however - I do not know how to get the confidence intervals for this calculated difference based only on the probabilities and their respective SE.
06-18-2014 01:25 PM
If I just have the probabilities and SEs and want to find the difference between two of these probabilities with its confidence interval - is it just possible to do the following:
~ find the differences
~ square the individual SEs, then add them together and find the square root for the difference SE
~ calculate confindence intervals via difference +/- (SE * 1.96)
06-18-2014 02:55 PM
I got excited at first about plugging in a t value for the CI, rather than the z value, but I gathered my wits before I hit "Add Reply". Since this is for a proportion, and is an approximation in any case, you are probably good. It does assume that the estimates are independent. If you really wanted to get picky, you could get the covariance matrix out of LOGISTIC and apply that when calculating the pooled SE.
06-18-2014 04:27 PM
Well I am on a stats forum, so lets please hear about the picky method. I had the code output the cov, which seems fairly splotchy with values just for the reference values of X1, X2, X1*X2, and for X3.
06-19-2014 10:01 AM
Well, the asymptotic variance of a difference between non-independent variables A and B is var(diff) = var(A) + var (B) + 2*cov(A,B). So, it is just a matter of plugging in the covariance term after you sum the squared SE's, and before taking the square root. If values are missing from cov, then you might be dealing with an atypical dataset.
06-19-2014 02:09 PM
I would try to do this by using PROC NLMIXED. Here is a good reference which shows how to do:
06-20-2014 10:03 AM
Yes, I am excited to try this approach (which is oddly enough is similar to another piece of code I have further down in my program). I am caught up in doing something else right now, so I will update this thread if I end up having difficulties or with the final code used once I run it.
Thanks again every one - this is the third question I have posted on this forum, with great responses everytime!
P.S., Unsure why I did not find that SAS document during my searches
06-20-2014 12:46 PM
Well it turns out I tried to go this route about 3 weeks ago to no avail. I kept botching things up when I added the continuous variable. Perhaps it is that I am not adding it correctly. So for example, I realize the following code is off but it would be something like this (see below), but the continuous variable is being treated as a binary variable.
Ideally I want to run this and find the differences between the probabilities plus confidence intervals given specified values for the continuous variable (e.g., 1, 5, 10, 15,...). Any help would be appreciated.
proc nlmixed data=test;
p=1/(1+exp(-(Intercept + b1*x1 + b2*x2 = B3*x1*x2 + B4*x3)));/*B4 Continuous/
model y ~ binomial(1,p);
estimate "Prob Diff x1 = 1 and X2 = 1 or 0" 1/(1+exp(-(Intercept+b1 + b2 +b3 + b4*x4))) - 1/(1+exp(-(Intercept + B1*X1 + B4)));
06-20-2014 02:00 PM
First, check to see if you correctly estimate for x1=1 and x2=1. Because of the continuous nature of x3, you will need to feed in a value. Say the mean value of x3 is 12. Then
estimate 'Prob for x1=1, x2=1, x3=12' 1/(1+exp(-(intercept + b1*1 + b2*1 + b3*1 +b4*12)));
Do similar estimates for the other probabilities to make sure you are getting what you believe you should be getting, and THEN create an ESTIMATE statement that combines/takes the difference of two that are correctly estimated.
06-20-2014 03:09 PM
It runs fine until I add the continuous variable, then halts and the output stops prior to the "Interation History" with the following log message:
NOTE: To assign starting values to parameters, use the PARMS statement. The default starting
value of 1.0 is in effect for all parameters.
ERROR: No valid parameter points were found.
NOTE: PROCEDURE NLMIXED used (Total process time):
real time 0.14 seconds
cpu time 0.06 seconds
This is the output:
The NLMIXED Procedure
Distribution for Dependent Variable
Observations Not Used
I had originally attempt to use this approach via:
Thanks, any help is appreciated.
Once again, my original question was how to find the difference between two probabilities and the respective confidence interval for different levels of a continuous covariate only given the two probabilities their SE, which can be calculated for the continuous covariate levels. However, I am will to explore all options for the a solution.
06-20-2014 03:16 PM
Scale and center your continous variable, so that the values will not "blow up" when exponentiated. Also, the PARMS statement can be used to grid search for good starting values. I don't know hwat scale the continuous variable is on, but if it is recentered and rescaled so that values run from, say, -5 to +5, you should not need the PARMS approach.
If you want to keep the original scaling, and x3 ranges from 1 to 100, for example, then
parms b4 0.01 to 0.2 by 0.1;
might be a place to start.
06-25-2014 01:45 PM
Did you get it working? I rarely get NLMIXED programs to work the first time...
If you haven't gotten it working yet: you have an "=" in the equation for "p" that does not belong there:
p=1/(1+exp(-(Intercept + b1*x1 + b2*x2 + B3*x1*x2 + B4*x3)));/*B4 Continuous/
For logistic programs, I've found small intercepts and zero slopes give decent starting values:
parms8 b1=0 b2=0 b3=0 b4=0;
but if that doesn't work, setting up a grid as Steve suggested (away from 1) should help.
And here's my version of your data + a program that actually converges (but I'm not gonna claim it's correct ):
do i=1 to 1000;
lp= 0.1 + 0.2*a + 0.3*b + 0.4*a*b + 0.005*x;
proc logistic data=one;
class a b / param=ref ref=first;
model y(event='1')= a b a*b x / link = logit itprint;
* NLMIXED and LOGISTIC parameter estimates match ;
proc nlmixed data=one gconv=0 fconv=0;
parms int=1e-8 ba=0 bb=0 bab=0 bx=0;
model y ~ binary(elp/(1+elp));
estimate 'p1' p1;
estimate 'p4' p4;
estimate 'p1-p4' p1-p4;