Solved: Re: intertpreting estimates from zero-truncated negative binomial regr...

amyip · Posted 08-22-2024 10:59 AM

I am wondering how to interpret estimates from proc nlmixed or proc fmm for zero-truncated negative binomial regression of total cumulative days' supply of a medication class on long-term care (yes/no) after adjusting for other covariates. Using the code below provided by StatDave (thank you!) in another post, I obtain the estimate of 0.2757 (95% CI 0.2254, 0.3261), p<0.0001. Is it appropriate to report the corresponding e^β of 1.32 (95% CI 1.26, 1.39) as a pseudo relative risk, such that long-term care residents were prescribed 1.32x the total days' supply of medication as their community-dwelling counterparts? if so, is there code to include to have these generated with the output?

      proc nlmixed data=ZTNB;
         y=CumDays;
         mean = exp(intercept + b_long_term_care*long_term_care + b_female*female + b_agegr1*agegr1);
         ll = lgamma(y+1/k) - lgamma(1/k) - lgamma(y+1) + y*log(k*mean) -
              (y+(1/k))*log(1+k*mean) - log(1-(1+k*mean)**(-1/k));
         model y ~ general(ll);
         run;

      proc fmm data=ZTNB;
         model y = long_term_care female agegr1 / dist=truncnegbin;
         run;

StatDave · Posted 08-22-2024 02:01 PM

No, that would not be quite correct because the mean of the truncated negative binomial distribution is not equal to exp(L*beta). See the "Details: Log-Likelihood Functions for Response Distributions" section of the FMM documentation which shows that the mean of the distribution is mu/(1-(k*mu+1)**-1/k), where mu=exp(L*beta) and k is the dispersion parameter. (I probably should not have used "mean" to name the exp(L*beta) expression in the note I referred to - I use "mu" below.) If you want to estimate the mean for each level of your long_term predictor at fixed settings of the covariates and then estimate the ratio of those means, this can be done using ESTIMATE statements in NLMIXED like I showed before. Again using the ZTNB example in that note, the following adds the binary FEM variable in the model and then estimates the response means for the FEM levels with MENT fixed at 10, and then the ratios of those means. The PREDICT statement creates a data set of predicted means for all the observations which could be used for plotting purposes.

      proc nlmixed data=Long97data;
         where art ne 0;
         y=art;
         parms b0=1 b1=0 k=1;
         mu = exp(b0 + b1*ment + b2*fem);
         ll = lgamma(y+1/k) - lgamma(1/k) - lgamma(y+1) + y*log(k*mu) -
              (y+(1/k))*log(1+k*mu) - log(1-(1+k*mu)**(-1/k));
         model y ~ general(ll);
         mean=mu/(1-(1+k*mu)**(-1/k));
         predict mean out=out;
         mu1=exp(b0 + b1*10 + b2);
         mu0=exp(b0 + b1*10);
         mean1=mu1/(1-(1+k*mu1)**(-1/k));
         mean0=mu0/(1-(1+k*mu0)**(-1/k));
         estimate 'mean fem=1 at ment=10' mean1;
         estimate 'mean fem=0 at ment=10' mean0;
         estimate 'ratio of mean counts: fem1/fem0' mean1/mean0;
         run;

View solution in original post

StatDave · Posted 08-22-2024 02:01 PM

No, that would not be quite correct because the mean of the truncated negative binomial distribution is not equal to exp(L*beta). See the "Details: Log-Likelihood Functions for Response Distributions" section of the FMM documentation which shows that the mean of the distribution is mu/(1-(k*mu+1)**-1/k), where mu=exp(L*beta) and k is the dispersion parameter. (I probably should not have used "mean" to name the exp(L*beta) expression in the note I referred to - I use "mu" below.) If you want to estimate the mean for each level of your long_term predictor at fixed settings of the covariates and then estimate the ratio of those means, this can be done using ESTIMATE statements in NLMIXED like I showed before. Again using the ZTNB example in that note, the following adds the binary FEM variable in the model and then estimates the response means for the FEM levels with MENT fixed at 10, and then the ratios of those means. The PREDICT statement creates a data set of predicted means for all the observations which could be used for plotting purposes.

      proc nlmixed data=Long97data;
         where art ne 0;
         y=art;
         parms b0=1 b1=0 k=1;
         mu = exp(b0 + b1*ment + b2*fem);
         ll = lgamma(y+1/k) - lgamma(1/k) - lgamma(y+1) + y*log(k*mu) -
              (y+(1/k))*log(1+k*mu) - log(1-(1+k*mu)**(-1/k));
         model y ~ general(ll);
         mean=mu/(1-(1+k*mu)**(-1/k));
         predict mean out=out;
         mu1=exp(b0 + b1*10 + b2);
         mu0=exp(b0 + b1*10);
         mean1=mu1/(1-(1+k*mu1)**(-1/k));
         mean0=mu0/(1-(1+k*mu0)**(-1/k));
         estimate 'mean fem=1 at ment=10' mean1;
         estimate 'mean fem=0 at ment=10' mean0;
         estimate 'ratio of mean counts: fem1/fem0' mean1/mean0;
         run;

amyip · Posted 08-26-2024 12:52 PM

Thanks again for taking the time to follow up. I tried running the model with the estimate statements and got the message:

ERROR: Estimate statement expressions are not allowed to depend on variables in the input data set.

StatDave · Posted 08-26-2024 01:25 PM

In the example I showed, you can see that the ESTIMATE statements depend only on the model parameters, b0, b1, b2, and k. The PREDICT statement allows the quantity specified in that statement to contain data values. So, the PREDICT statement that I showed uses the data values in each observation to estimate the "mean" expression which depends on the data values of MENT and FEM as well as the model parameters.

amyip · Posted 08-26-2024 03:56 PM

My model includes mostly binary covariates, with dummy variables for 6 age categories and 22 counties; is it correct to omit the referent category for age (include 5 levels) and county (include 21 levels), or should binary dummies for all levels be included?

I ran the SAS code that you provided, substituting in my variable names. For the mu1 model I included the term b_long_term_care*1, whereas for the mu0 model I excluded it (b_long_term_care*0), and in both I assigned a value of 1 for one of my covariates (b_dementia*1). The following lines were used verbatim from your example, except for the text in quotation marks which names my variables:

mean1=mu1/(1-(1+k*mu1)**(-1/k));
mean0=mu0/(1-(1+k*mu0)**(-1/k));
estimate 'mean fem=1 at ment=10' mean1;
estimate 'mean fem=0 at ment=10' mean0;
estimate 'ratio of mean counts: fem1/fem0' mean1/mean0;

StatDave · Posted 08-26-2024 06:07 PM

Yes, you only need to include k-1 dummies for a categorical variable with k levels. In my example, FEM is a single dummy variable representing gender that equals 1 if female, 0 otherwise. The mu1 and mu0 expressions would need to include only parameter names and not any variable names from the data set like in my example.

amyip · Posted 08-27-2024 02:38 PM

 proc nlmixed data=ZTNB;
  y = CumDays;
  mu = exp(intercept + b_long_term_care*long_term_care + b_female*female + b_dementia*dementia);
  ll = lgamma(y+1/k) - lgamma(1/k) - lgamma(y+1) + y*log(k*mu) - (y+(1/k))*log(1+k*mu) - log(1-(1+k*mu)**(-1/k));
 model y ~ general(ll);
 mean = mu/(1-(1+k*mu)**(-1/k));
 predict mean out=out;
 mu1=exp(intercept + b_long_term_care*1 + b_female*female + b_dementia*1);
 mu0=exp(intercept + b_female*female + b_dementia*1);
 mean1 = mu1/(1-(1+k*mu1)**(-1/k));
 mean0 = mu0/(1-(1+k*mu0)**(-1/k));
 estimate 'mean LTC=1 at dementia=1' mean1;
 estimate 'mean LTC=0 at dementia=1' mean0;
 estimate 'ratio of mean counts: LTC1/LTC0' mean1/mean0;
 run;

This is the code than I ran (full list of model covariates omitted for simplicity) that yielded ERROR: Estimate statement expressions are not allowed to depend on variables in the input data set. I omitted "where CumDays ne 0" because this already applies to all my observations, and the PARMS statement hence the starting value for all parameters is 1. Can you identify the source of the error? Thanks for your time

StatDave · Posted 08-27-2024 04:17 PM

mean0 and mean1 in the ESTIMATE statements are functions of mu0 and mu1 which you have defined as functions of both parameters and data variables. Hence the error.

amyip · Posted 08-28-2024 12:59 PM

I'm beginning to understand -- thank you for your patience!

1-Do I need to include the PARMS statement to set the starting estimate values at something other than 1? In your example, you set b1 at 0 (for the variable MENT).

2-Is it ok to include the parameters as intercept, b_long_term_care, b_female, b_dementia, etc., rather than numbered b0, b1, b2? I have numerous covariates (22 dummies for county) so naming is less confusing than numbering.

3-In your example, you define mu = exp(b0 + b1*ment + b2*fem); ment and fem are data variables, aren't they? is it just mu0 and mu1 that must be functions of parameters and not variables? in which case I would include just the parameters with the values all set at 1 (or 0), correct?

Thanks again for your time.

amyip · Posted 08-28-2024 05:10 PM

You can disregard my earlier post. I corrected my definitions of mu1 and mu0 to depend only on parameters with no variables as follows:

mu1=exp(intercept + b_long_term_care*1 + b_female + b_dementia);
mu0=exp(intercept + b_female + b_dementia);

which yielded the same results as the original model with the additional estimates of mean1= 9.7090, mean0=7.7925, and mean1/mean0=1.2459 with corresponding p-value, lower/upper limits, etc. Can this ratio now be interpreted that, for subjects with all other covariates equal (do they all have to take the value 1?), those in LTC were prescribed on average 1.25x the days' supply as their community-dwelling counterparts? Thanks again for your guidance.

StatDave · Posted 08-28-2024 05:49 PM

It means that the mean of your response variable, specifically in the LTC=1, FEMALE=1, DEMENTIA=1 group, is 1.25 times the mean of the response in the same group but with LTC=0.

amyip · Posted 08-28-2024 08:33 PM

Thank you so much!

amyip · Posted 08-29-2024 10:55 AM

A follow-up question -- as mentioned I have 6 age groups and 22 counties, so k-1 dummies were included in my model. In defining mu1 and mu0, I should only include one level for age group and county rather than all dummies, correct? Alternatively, if I omit them all, my estimates would be based on these variables taking on the referent value? To simplify, could I define mu1 and m0 as follows:

mu1=exp(intercept + b_long_term_care);
mu0=exp(intercept);

Where the definition of mu includes all the covariates, would mu1 and mu0 assign them all a value of 0 except for mu1 assigning LTC as 1?

intertpreting estimates from zero-truncated negative binomial regression

Re: intertpreting estimates from zero-truncated negative binomial regression

Re: intertpreting estimates from zero-truncated negative binomial regression

Re: intertpreting estimates from zero-truncated negative binomial regression

Re: intertpreting estimates from zero-truncated negative binomial regression

Re: intertpreting estimates from zero-truncated negative binomial regression

Re: intertpreting estimates from zero-truncated negative binomial regression

Re: intertpreting estimates from zero-truncated negative binomial regression

Re: intertpreting estimates from zero-truncated negative binomial regression

Re: intertpreting estimates from zero-truncated negative binomial regression

Re: intertpreting estimates from zero-truncated negative binomial regression

Re: intertpreting estimates from zero-truncated negative binomial regression

Re: intertpreting estimates from zero-truncated negative binomial regression

Re: intertpreting estimates from zero-truncated negative binomial regression

Registration is open