Hello,
I need to fit Negative Binomial and Poisson models to my data. Using GENMOD or COUNTREG and specifying the distribution as NB or Poisson, I get the mean modeled through a linear function of the x variables. However, I have my own nonlinear functions, such as the Hoerl and sigmoidal functions, that I need to use for the mean of the distribution.
For example, instead of this form:
The functions I need to use are below, in particular the bottom two, where mu is expressed in terms of E.
Any help would be greatly appreciated.
Have you checked the programming statements available in PROC GENMOD? For example:
proc genmod;
class car age;
a = _MEAN_;
y = _RESP_;
d = 2 * ( y * log( y / a ) - ( y - a ) );
variance var = a;
deviance dev = d;
model c = car age / link = log offset = ln;
run;
The variables var and dev are dummy variables used internally by the procedure to identify the variance and
deviance functions. Any valid SAS variable names can be used.
Similarly, the log link function and its inverse could be defined with the FWDLINK and INVLINK statements,
as follows:
fwdlink link = log(_MEAN_);
invlink ilink = exp(_XBETA_);
Assuming that you want to fit a particular nonlinear model to a response that follows a negative binomial distribution, you will need to use PROC NLMIXED. There, you can specify both the log likelihood for the negative binomial and whatever linear or nonlinear model you want. See Note 2 at the end of this note, which shows the statements needed to define the negative binomial log likelihood in NLMIXED.
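As a rough illustration of that approach, here is a minimal sketch with a Hoerl-type mean. The data set name MYDATA, the response y, the predictor E, and the starting values are all assumptions, and the Hoerl form used (mu = a * E**b * exp(c*E), with a written as exp(b0) so that mu stays positive) is only one common way of writing that function.

```sas
/* Sketch only: MYDATA, y, E, and the starting values are hypothetical */
proc nlmixed data=mydata;
   parms b0=0 b1=1 b2=0 k=1;             /* illustrative starting values */
   bounds k > 0;                         /* dispersion must be positive */
   mu = exp(b0) * (E**b1) * exp(b2*E);   /* assumed Hoerl-type mean; needs E > 0 */
   p  = 1/(1 + mu*k);                    /* probability parameter implied by mu and k */
   model y ~ negbin(1/k, p);             /* negative binomial response with mean mu */
   predict mu out=predmean;              /* write predicted means to PREDMEAN */
run;
```

For a Poisson response, the p= and MODEL lines would be replaced by model y ~ poisson(mu); and the k parameter dropped.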
The PARMS statement often isn't necessary if the default starting values for the parameters are reasonable enough to allow the fitting algorithm to converge to a proper solution. But when there are fitting problems, one often needs to try various other initial values, and the PARMS statement lets you do that. If you have an idea of approximately what the final parameter values should be, such as from the same model fit to previous data, it might be worthwhile to specify them as starting values in the PARMS statement.
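One way to try several starting values at once is to give the PARMS statement a grid, in which case NLMIXED evaluates the likelihood at each grid point and starts from the best one. The sketch below assumes a hypothetical data set MYDATA with response y and predictor x, and the grid ranges are arbitrary placeholders.

```sas
/* Sketch only: MYDATA, y, x, and the grid ranges are hypothetical */
proc nlmixed data=mydata;
   parms b0 = -2 to 2 by 1               /* grid of candidate starting values */
         b1 = -1 to 1 by 0.5
         k  = 0.5 to 2 by 0.5;
   bounds k > 0;                         /* dispersion must be positive */
   linp = b0 + b1*x;                     /* linear predictor on the log scale */
   mu   = exp(linp);                     /* negative binomial mean */
   p    = 1/(1 + mu*k);
   model y ~ negbin(1/k, p);
run;
```

A coarse grid is usually enough, since its only job is to find a point where the likelihood can be evaluated and the optimizer can get started.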
1/k and p are the parameters of the negative binomial distribution. The code you show is in the section of the NLMIXED documentation which shows the form of the negative binomial log likelihood function and how those parameters appear in it. The model in that code (linp) is a linear model on the log of the negative binomial mean, mu. The p parameter is related to mu and the dispersion parameter, k, as shown.
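To see why that relationship gives a model for the mean, here is a short derivation. It assumes the negbin(n, p) parameterization in which the mean is n(1-p)/p and the variance is n(1-p)/p^2, with n = 1/k.

```latex
\begin{aligned}
p &= \frac{1}{1+k\mu}
  \quad\Longrightarrow\quad
  1-p = \frac{k\mu}{1+k\mu},
  \qquad \frac{1-p}{p} = k\mu, \\[4pt]
\mathrm{E}[Y] &= \frac{n(1-p)}{p} = \frac{1}{k}\,k\mu = \mu, \\[4pt]
\mathrm{Var}[Y] &= \frac{n(1-p)}{p^{2}} = \frac{1}{k}\,k\mu\,(1+k\mu) = \mu + k\mu^{2}.
\end{aligned}
```

So whatever expression is assigned to mu, linear or nonlinear, becomes the mean of the response, and k controls how much the variance exceeds the Poisson variance.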
Since you are new to PROC NLMIXED, here are two elementary examples. For ways to choose an initial guess for the parameters, see "The method of moments: A smart way to choose initial parameters for MLE".
In the following code I have not specified TOT as a data set, since it is one of the variables. Can you tell me why I get the warning below?
proc NLMIXED data=SPFU3ST;
parms k=0.8;
Y = 5*365*((MINAADT)**beta_1)*((MAXAADT)**beta_2)*(EXP(beta_0));
model TOT ~ NEGBIN(1/k, Y);
predict TOT out=TOT;
run;
NOTE: The parameters beta_1, beta_2, beta_0 are assigned the default starting value of 1.0, because
they are not assigned initial values with the PARMS statement.
ERROR: No valid parameter points were found.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TOT may be incomplete. When this step was stopped there were 0
observations and 0 variables.
WARNING: Data set WORK.TOT was not replaced because this step was stopped.
I think the warning (and the error before it) is telling you that the model was never fit: the procedure could not find parameter values at which the likelihood could be evaluated, so the fitting algorithm had nowhere to start. Without a fitted model, there is nothing that the procedure can score to produce predicted values. There are several reasons why this can happen, but a common one is a model specification that does not suit the data.
I assume the model you specify in the Y= statement is the model you want for the negative binomial mean (or maybe the log mean?), not the second parameter of the distribution. If so, then you probably want to have the P= statement as in the code you referred to earlier. Also, you don't want (or, obviously, need) to predict the actual response, TOT. You presumably want the predicted response, mu. If that still causes fitting problems, then, as I mentioned before, you might have to try various starting values using the PARMS statement.
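proc nlmixed;
   /* nonlinear model for the negative binomial mean */
   mu = 5*365*((MINAADT)**beta_1)*((MAXAADT)**beta_2)*(EXP(beta_0));
   /* probability parameter implied by the mean and the dispersion k */
   p = 1/(1+mu*k);
   model y ~ negbin(1/k,p);
   /* write the predicted mean for each observation to PREDMEAN */
   predict mu out=predmean;
run;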
In the above case, TOT is my dependent variable so I was assuming I would have to specify that.
No. Even in an ordinary regression, as would be done in PROC REG, you are modeling the mean of Y, not Y itself. In NLMIXED that would look like this:
proc nlmixed;
mu = b0 + b1*x;
model y ~ normal(mu, s);
run;
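Applied to the data in the question, the same idea might be sketched as follows. TOT still appears as the response in the MODEL statement, but it is the mean, mu, that gets predicted. The starting values for the betas are placeholder guesses, not recommendations, and PREDMEAN is just an example output name. Note that the earlier warnings mention WORK.TOT because OUT=TOT in the PREDICT statement names the output data set TOT.

```sas
/* Sketch only: the beta starting values are placeholder guesses */
proc nlmixed data=SPFU3ST;
   parms beta_0=0 beta_1=0.5 beta_2=0.5 k=0.8;   /* k=0.8 as in the question */
   bounds k > 0;                                 /* dispersion must be positive */
   mu = 5*365*(MINAADT**beta_1)*(MAXAADT**beta_2)*exp(beta_0);
   p  = 1/(1 + mu*k);                            /* second NEGBIN parameter, in (0,1) */
   model TOT ~ negbin(1/k, p);                   /* TOT is the observed response */
   predict mu out=predmean;                      /* predicted means go to WORK.PREDMEAN */
run;
```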
But maybe you can avoid NLMIXED altogether. If the model you show is for the mean of your response, Y, then if you use the usual log link for the negative binomial model, your model becomes:
log(mu) = log(5) + log(365) + b1*log(minaadt) + b2*log(maxaadt) + b0
= newb0 + b1*log(minaadt) + b2*log(maxaadt)
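where newb0=b0+log(5)+log(365). This can be fit in a generalized linear modeling procedure like GLIMMIX or GENMOD, using log-transformed predictors (here logminaadt=log(minaadt) and logmaxaadt=log(maxaadt), created beforehand in a DATA step):
proc genmod;
model y = logminaadt logmaxaadt / dist=negbin;
run;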
which will provide estimates of newb0, b1, and b2.
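A fuller sketch of this route, assuming the SPFU3ST data set and TOT response from the question, is shown here. Because the derivation is linear in log(minaadt) and log(maxaadt), it is the logged variables that enter the MODEL statement. The names LOGGED, logminaadt, and logmaxaadt are made up for the example.

```sas
/* Sketch only: LOGGED, logminaadt, and logmaxaadt are hypothetical names */
data logged;
   set SPFU3ST;
   logminaadt = log(MINAADT);    /* requires MINAADT > 0 */
   logmaxaadt = log(MAXAADT);    /* requires MAXAADT > 0 */
run;

proc genmod data=logged;
   /* log link: log(mu) = newb0 + b1*logminaadt + b2*logmaxaadt */
   model TOT = logminaadt logmaxaadt / dist=negbin link=log;
run;
```

The original b0 can then be recovered from the fitted intercept as b0 = newb0 - log(1825), since 5*365 = 1825.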