This topic is solved and locked.
Unay13
Obsidian | Level 7

Hello,

I need to fit negative binomial and Poisson regression models to my data. Using GENMOD and COUNTREG with the distribution specified as NB or Poisson, I get the mean (on the log scale) as a linear function of the x variables. However, I have my own nonlinear functions, such as the Hoerl and sigmoidal functions, that I need to incorporate into the model.

For example, instead of this form:

The function I need to use is below, more specifically the bottom two, where mu can be expressed in terms of E.

Any help would be greatly appreciated.


12 replies
Ksharp
Super User

Did you check the programming statements in PROC GENMOD? For example:

proc genmod;
   class car age;
   a = _MEAN_;                 /* fitted mean */
   y = _RESP_;                 /* observed response */
   /* Poisson deviance contribution; the y = 0 branch avoids log(0) */
   if y > 0 then d = 2 * ( y * log( y / a ) - ( y - a ) );
   else d = 2 * a;
   variance var = a;           /* Poisson variance function: V(mu) = mu */
   deviance dev = d;
   model c = car age / link = log offset = ln;
run;

The variables var and dev are dummy variables used internally by the procedure to identify the variance and deviance functions. Any valid SAS variable names can be used. Similarly, the log link function and its inverse could be defined with the FWDLINK and INVLINK statements, as follows:

fwdlink link = log(_MEAN_);
invlink ilink = exp(_XBETA_);
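For reference, the statements above encode the standard Poisson quantities. Written out, with $\mu$ standing for `_MEAN_`, $y$ for `_RESP_`, and $\eta$ for `_XBETA_`:

$$
V(\mu) = \mu, \qquad
d(y, \mu) = 2\left[\, y \log\frac{y}{\mu} - (y - \mu) \right], \qquad
\eta = g(\mu) = \log \mu, \qquad
\mu = g^{-1}(\eta) = e^{\eta}
$$

Because $y \log(y/\mu) \to 0$ as $y \to 0$, the deviance contribution of a zero count is $d(0, \mu) = 2\mu$.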
Unay13
Obsidian | Level 7
Thanks, Ksharp. But I need to feed in the Hoerl and sigmoidal functions and do the NB modeling. By any chance, do you have code for NB modeling with user-specified functions?
StatDave
SAS Super FREQ

Assuming that you mean you want to fit a certain nonlinear model to a response that has a negative binomial distribution, you will need to use PROC NLMIXED. There, you can specify both the log likelihood for the negative binomial and whatever linear or nonlinear model you want. See Note 2 at the end of this note, which shows the statements needed to define the negative binomial log likelihood in NLMIXED.
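As a minimal sketch of that approach, the code below simulates hypothetical count data (data set `counts`, variables `x` and `y`) whose negative binomial mean follows a three-parameter logistic (sigmoidal) curve. It then fits that curve in NLMIXED, writing the negative binomial log likelihood out explicitly through the GENERAL distribution. The sigmoidal form, the parameter names `a`, `b0`, `b1`, and the true and starting values are assumptions for illustration only. A Hoerl or other nonlinear expression would replace the `mu =` line.

```sas
/* Hypothetical data: NB counts whose mean follows a logistic (sigmoidal) curve */
data counts;
   call streaminit(2024);
   do i = 1 to 2000;
      x  = 10 * rand('uniform');
      mu = 20 / (1 + exp(-(-5 + 1*x)));       /* true a=20, b0=-5, b1=1 */
      /* Poisson-gamma mixture: NB with mean mu and dispersion k = 0.2 */
      lambda = mu * 0.2 * rand('gamma', 5);
      y = rand('poisson', lambda);
      output;
   end;
   keep x y;
run;

proc nlmixed data=counts;
   parms a=15, b0=-4, b1=0.8, k=0.5;          /* rough starting values */
   bounds a > 0, k > 0;                       /* keep mu and k positive */
   mu = a / (1 + exp(-(b0 + b1*x)));          /* user-defined nonlinear mean */
   /* NB log likelihood with Var(y) = mu + k*mu**2 */
   ll = lgamma(y + 1/k) - lgamma(y + 1) - lgamma(1/k)
        + y*log(k*mu) - (y + 1/k)*log(1 + k*mu);
   model y ~ general(ll);
   ods output ParameterEstimates=pe;
run;
```

The explicit `general(ll)` form is interchangeable with the built-in `negbin(1/k, p)` using `p = 1/(1+k*mu)`. Writing it out makes clear that only the `mu =` line changes when you swap in a different mean function.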

Unay13
Obsidian | Level 7
Your response was really helpful. I found the following code:
proc nlmixed;
   parms b0=3, b1=1, k=0.8;    /* starting values */
   linp = b0 + b1*x;           /* linear predictor */
   mu = exp(linp);             /* mean */
   p = 1/(1+mu*k);
   model y ~ negbin(1/k,p);
run;

How necessary is the PARMS statement? I do not know what values to assign to the parameters, or on what basis. Also, in the MODEL statement, what do 1/k and p stand for?
Any help would be greatly appreciated.
StatDave
SAS Super FREQ

The PARMS statement often isn't necessary if the default starting values for the parameters are reasonable enough to allow the fitting algorithm to converge to a proper solution. But when there are fitting problems, one often needs to try various other initial values, and the PARMS statement lets you do that. If you have an idea of approximately what the final parameter values should be, such as from the same model fit to previous data, it might be worthwhile to specify them as starting values in the PARMS statement.

 

1/k and p are the parameters of the negative binomial distribution. The code you show is in the section of the NLMIXED documentation which shows the form of the negative binomial log likelihood function and how those parameters appear in it. The model in that code (linp) is a linear model on the log of the negative binomial mean, mu. The p parameter is related to mu and the dispersion parameter, k, as shown. 
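Written out, the NLMIXED `NEGBIN(n, p)` distribution with $n = 1/k$ and $p = 1/(1 + k\mu)$ is shown below. Substituting these two expressions recovers a distribution with mean $\mu$ and a variance that grows with $k$. This is a worked check of the standard parameterization, not text quoted from the documentation.

$$
\Pr(Y = y) = \frac{\Gamma(n + y)}{\Gamma(n)\,\Gamma(y + 1)}\, p^{n} (1 - p)^{y},
\qquad n = \frac{1}{k}, \quad p = \frac{1}{1 + k\mu}
$$

$$
E(Y) = \frac{n(1-p)}{p} = \frac{1}{k}\cdot k\mu = \mu,
\qquad
\operatorname{Var}(Y) = \frac{n(1-p)}{p^{2}} = \mu(1 + k\mu) = \mu + k\mu^{2}
$$

So $k$ acts as the dispersion parameter, the variance is quadratic in $\mu$ (the form often called NB2), and as $k \to 0$ the variance approaches $\mu$, the Poisson case.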

Rick_SAS
SAS Super FREQ

Since you are new to PROC NLMIXED, here are two elementary examples.  Regarding ways to choose an initial guess for the parameters, see "The method of moments: A smart way to choose initial parameters for MLE"
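For the negative binomial model in this thread, a method-of-moments start for $k$ follows from $\operatorname{Var}(Y) = \mu + k\mu^2$, giving $k \approx (s^2 - \bar y)/\bar y^2$ from the sample mean and variance. The sketch below is a minimal version of that idea. It assumes a hypothetical data set `have` with a count response `y` and covariate `x`, which are simulated here so the code runs on its own. Because it ignores the covariates, treat the result only as a rough starting value for the PARMS statement.

```sas
/* Hypothetical example data: NB counts with log-linear mean and k = 0.5 */
data have;
   call streaminit(7);
   do i = 1 to 500;
      x = rand('normal');
      y = rand('poisson', exp(1 + 0.5*x) * 0.5 * rand('gamma', 2));
      output;
   end;
   drop i;
run;

/* Sample mean and variance of the response */
proc means data=have noprint;
   var y;
   output out=mom mean=ybar var=s2;
run;

/* Moment-based starts: log(ybar) for the intercept, and k from
   Var(Y) = mu + k*mu**2, i.e. k = (s2 - ybar)/ybar**2, kept positive */
data _null_;
   set mom;
   call symputx('b0_start', log(ybar));
   call symputx('k_start', max((s2 - ybar) / ybar**2, 0.01));
run;

proc nlmixed data=have;
   parms b0=&b0_start, b1=0, k=&k_start;
   mu = exp(b0 + b1*x);
   p  = 1/(1 + mu*k);
   model y ~ negbin(1/k, p);
   ods output ParameterEstimates=pe_mom;
run;
```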

Unay13
Obsidian | Level 7

For the following code, where TOT is one of the variables (I have not specified TOT as a data set), can you tell me why I get the warning below?

 


proc NLMIXED data =SPFU3ST;
parms k=0.8;
Y= 5*365*((MINAADT)**beta_1)* ((MAXAADT)**beta_2)*(EXP(beta_0));
model TOT ~ NEGBIN (1/k, Y);
predict TOT out = TOT;
run;

NOTE: The parameters beta_1, beta_2, beta_0 are assigned the default starting value of 1.0, because they are not assigned initial values with the PARMS statement.
ERROR: No valid parameter points were found.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TOT may be incomplete. When this step was stopped there were 0 observations and 0 variables.
WARNING: Data set WORK.TOT was not replaced because this step was stopped.

Rick_SAS
SAS Super FREQ

I think the warning (and the error before it) is telling you that the model did not converge. Without convergence, there is no model that the procedure can score to produce predicted values.  There are several reasons why a model might not converge, but the most common is that it does not fit the data.

StatDave
SAS Super FREQ

I assume the model you specify in the Y= statement is the model you want for the negative binomial mean (or maybe the log mean?), not the second parameter of the distribution. If so, then you probably want to have the P= statement like in the code you referred to earlier. Also, you don't want (or obviously, need) to predict the actual response, TOT. You presumably want the predicted response, mu. If that still causes fitting problems, then as I mentioned before you might have to try various starting values using the PARMS statement.

 

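proc nlmixed;
   /* nonlinear model for the negative binomial mean */
   mu = 5*365*((MINAADT)**beta_1)*((MAXAADT)**beta_2)*(EXP(beta_0));
   p = 1/(1+mu*k);             /* NB probability parameter implied by mu and k */
   model y ~ negbin(1/k,p);    /* y is the count response */
   predict mu out=predmean;    /* save the predicted mean, not the response */
run;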
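A quick worked check suggests why the earlier call `NEGBIN(1/k, Y)` ended with "No valid parameter points were found". The second argument of NEGBIN is a probability and must lie between 0 and 1. With the default starting values $\beta_0 = \beta_1 = \beta_2 = 1$, however, the mean expression is far larger than 1 whenever both AADT values are at least 1:

$$
Y = 5 \cdot 365 \cdot \text{MINAADT}^{1} \cdot \text{MAXAADT}^{1} \cdot e^{1}
\;\ge\; 1825\,e \approx 4961
$$

By contrast, $p = 1/(1 + k\mu)$ lies strictly between 0 and 1 for any $k > 0$ and $\mu > 0$. That is why the mean is passed through `p` rather than used directly as the second argument.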

Unay13
Obsidian | Level 7

In the above case, TOT is my dependent variable so I was assuming I would have to specify that.

StatDave
SAS Super FREQ

No - even in the case of an ordinary regression as would be done in PROC REG you are modeling the mean of Y, not Y, and would look like this in NLMIXED:

 

proc nlmixed;
   mu = b0 + b1*x;             /* model for the mean of y */
   model y ~ normal(mu, s);    /* NORMAL(m, v): the second argument is the variance */
run;
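In symbols, the statements above say that each response is drawn from a distribution whose mean is the model. The same pattern carries over to the count model in this thread, with TOT as the response and the nonlinear expression as its mean. Here $s$ is a variance, since NLMIXED's `NORMAL(m, v)` takes the variance as its second argument.

$$
y_i \sim N(\mu_i,\, s), \qquad \mu_i = b_0 + b_1 x_i
$$

$$
\text{TOT}_i \sim \text{NegBin}\!\left(\tfrac{1}{k},\, p_i\right), \qquad
p_i = \frac{1}{1 + k\mu_i}, \qquad
\mu_i = 5 \cdot 365 \cdot \text{MINAADT}_i^{\beta_1}\, \text{MAXAADT}_i^{\beta_2}\, e^{\beta_0}
$$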

 

But maybe you can avoid NLMIXED altogether. If the model you show is for the mean of your response, Y, then if you use the usual log link for the negative binomial model, your model becomes:

 

log(mu) = log(5) + log(365) + b1*log(minaadt) + b2*log(maxaadt) + b0
        = newb0 + b1*log(minaadt) + b2*log(maxaadt)

 

where newb0=b0+log(5)+log(365). This can be fit in a generalized linear modeling procedure like GLIMMIX or GENMOD:

 

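data spfu3st_log;
   set SPFU3ST;
   logmin = log(MINAADT);      /* covariates enter the log-linear model on the log scale */
   logmax = log(MAXAADT);
run;
proc genmod data=spfu3st_log;
   model TOT = logmin logmax / dist=negbin link=log;
run;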

 

which will provide estimates of newb0, b1, and b2.
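The GENMOD fit can also supply starting values for the nonlinear NLMIXED version of the same model. Since newb0 = b0 + log(5\*365), the NLMIXED intercept start is the GENMOD intercept minus log(1825), and b1, b2, and the GENMOD dispersion (k in Var = mu + k\*mu\*\*2) carry over unchanged. The sketch below shows that hand-off. It uses a hypothetical simulated data set `sim` in place of SPFU3ST, and the true values (0.3, 0.4, -12, k = 0.5) are illustrative assumptions, not estimates from the poster's data.

```sas
/* Hypothetical stand-in for SPFU3ST: TOT is NB with mean
   5*365*MINAADT**0.3*MAXAADT**0.4*exp(-12) and dispersion k = 0.5 */
data sim;
   call streaminit(42);
   do i = 1 to 1000;
      MINAADT = 1000 + 4000*rand('uniform');
      MAXAADT = 5000 + 10000*rand('uniform');
      mu  = 5*365*(MINAADT**0.3)*(MAXAADT**0.4)*exp(-12);
      TOT = rand('poisson', mu * 0.5 * rand('gamma', 2));
      logmin = log(MINAADT);
      logmax = log(MAXAADT);
      output;
   end;
   drop i mu;
run;

/* Log-linear NB fit: intercept = newb0 = beta_0 + log(5*365) */
proc genmod data=sim;
   model TOT = logmin logmax / dist=negbin link=log;
   ods output ParameterEstimates=ge;
run;

/* Convert the GENMOD estimates into NLMIXED starting values */
data _null_;
   set ge;
   if Parameter = 'Intercept' then call symputx('b0', Estimate - log(5*365));
   else if Parameter = 'logmin' then call symputx('b1', Estimate);
   else if Parameter = 'logmax' then call symputx('b2', Estimate);
   else if Parameter = 'Dispersion' then call symputx('k', Estimate);
run;

proc nlmixed data=sim;
   parms beta_0=&b0, beta_1=&b1, beta_2=&b2, k=&k;
   bounds k > 0;
   mu = 5*365*(MINAADT**beta_1)*(MAXAADT**beta_2)*exp(beta_0);
   p  = 1/(1 + mu*k);
   model TOT ~ negbin(1/k, p);
   predict mu out=predmean;
   ods output ParameterEstimates=ne;
run;
```

Because both procedures maximize the same NB2 likelihood here, NLMIXED should start at (or very near) the optimum. The point of the hand-off is to give NLMIXED sensible starting values instead of the default 1.0 for every parameter.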

Unay13
Obsidian | Level 7
I am sorry, but I do not have a statistics background and I am new to SAS.
The NB model I want to fit with the above function is for my dependent variable, TOT. So is this MODEL statement correct?

model TOT ~ negbin(1/k,p);

Is k the overdispersion parameter?

Is the default NEGBIN distribution NB-1 or NB-2? NB-2, I suppose? How can I model TOT (the dependent variable) with MINAADT and MAXAADT as independent variables using this equation:
Y= 5*365*((MINAADT)**beta_1)* ((MAXAADT)**beta_2)*(EXP(beta_0));
