09-09-2011 08:43 AM
I'd really appreciate your thoughts...
As a reminder, my response variable (y) takes on integers ranging from 1 to
30 (including both 1 and 30). I have one binary predictor (coded 0/1).
The goal is to estimate mean(y) at x=0, mean(y) at x=1, and the ratio
of those two estimates, along with 95% confidence limits.
Here are the steps I took based on very helpful feedback and online searching:
(1) Transformed my data from 1 to 30 to 0 to 1 by employing the
(2) Fudged the data very slightly such that the minimum value was
~0.0005 and the maximum value was ~0.999
(3) Employed the following NLMIXED code to fit the Beta Regression:
proc nlmixed data = mydata tech = trureg;
parms beta_xb_0 0 beta_xb_1 0 beta_wd_0 0 beta_wd_1 0;
Xb = beta_xb_0 + beta_xb_1*x;
Wd = beta_wd_0 + beta_wd_1*x;
mu = exp(Xb)/(1 + exp(Xb));
phi = exp(-Wd);
w = mu*phi;
t = (1-mu)*phi;
ll = lgamma(w+t) - lgamma(w) - lgamma(t) + (w-1)*log(y_transformed) +
estimate 'mean at x=1 on transformed scale'
exp(beta_xb_0 + beta_xb_1) / (1 + exp(beta_xb_0 + beta_xb_1));
estimate 'mean at x=0 on transformed scale'
exp(beta_xb_0) / (1 + exp(beta_xb_0));
estimate 'mean at x=1 on original scale'
29*(exp(beta_xb_0 + beta_xb_1) / (1 + exp(beta_xb_0 + beta_xb_1))) + 1;
estimate 'mean at x=0 on original scale'
29*(exp(beta_xb_0) / (1 + exp(beta_xb_0))) + 1;
model y_transformed ~ general(ll);
(1) Why are the means on the original scale derived from the ESTIMATE
statements quite different from the hand calculated means on the
(2) Since the link is seemingly logit, I presume that to obtain
appropriate 95% confidence limits, one should estimate the logit at
x=0 and the logit at x=1 and then apply the inverse logit function to
the estimate as well as upper and lower limits. Then, of course, one
would need to back transform the values (based on "y_transformed") to
the original values ("y")
(3) To obtain 95% confidence limits about the ratio of the two mean
estimates, I also assume that one should estimate:
log((mean|x=0) / (mean|x=1))
and then exponentiate the estimates as well as lower and upper limits,
as one would do to obtain confidence limits about a "relative risk."
Again, one would then need to transform these values back to the
Is there something key that I'm missing? I'd really appreciate any
other thoughts or comments on this approach.