Hello everyone,
In some of my research I have to use "estimated frequencies" instead of raw count data to fit an NLMIXED model. Those frequencies are often fractional, which seems to violate the model assumption (e.g. binomially distributed data). However, when I actually fit the model to these data, SAS finished the analysis without any warning or error. I want to know whether the PROC rounds them to integers before doing the analysis or uses the fractional frequencies directly, and in either case whether the result is affected by the data type.
Thank you.
I might be misinterpreting this, but are the estimated frequencies really a summary? Something like having 0.25, rather than 15 out of 60? NLMIXED can handle both ways of representing the data.
Steve Denham
Sorry for not expressing myself clearly. What I meant is that the frequencies are values like 18.90 or 23.45, estimated or corrected in some previous steps. The idea is that keeping the decimals might give a more precise result in the final model fit than rounding them to integers.
Thank you.
So what you are calling frequency is a count variable, but the concern is that the counts are non-integer. Am I understanding this correctly? I would guess that the values have been standardized--something like 18.90 cases per 100,000. Is that correct? If so, then you might (and this is only a might) consider restating them as proportions, which would fit the binomial distribution. Are you in a position where you can give an explicit definition of the response variable (I know that sometimes this falls into intellectual property problems)?
Steve Denham
My data are non-integer counts, not proportions. There are no intellectual property concerns, but I thought it would be tedious to explain where this kind of data comes from. In a few words: say I have three variables of integer counts, A, B and C, representing different groups in a sample. Later I find out that the criteria for classifying the sample are not satisfactory and have to be modified. However, my way of correcting the counts in the different groups under the new classification criteria will sometimes give non-integer counts, A', B' and C'. It is somewhat like changing the proportions of a multinomial distribution while keeping the total count fixed, which leads to non-integer frequencies.
Hope I've explained it well this time.
I think I get it. The problem that arises then is that the data are NOT binomially distributed, since a binomial response is an integer count of successes out of n trials (or, expressed as a proportion, a value between 0 and 1). I would guess that they follow a Poisson or negative binomial distribution, but the logical extension of these to continuous values is an exponential distribution.
Can you share your current NLMIXED code?
Thanks,
Steve Denham
Thank you for the reply.
Here is the code. It is quite a simple model, but it still runs when my data are non-integers:
proc nlmixed data=GHC ecov cov;
parms ann=1 ta=1 uann=0 uta=0 ;
logitp = ((ta + uta)*2(lis)+ ann + uann);
p = exp(logitp)/(1+exp(logitp));
model pass ~ binomial(n,p);
random uta uann ~ normal([0,0],[uta,0,uann]) subject=id out=randeffs;
run;
Well, there is nothing in the code that requires anything to be an integer. It calculates a linear part (although it looks like something got lost in pasting, as the *2(lis) doesn't look executable). Then it calculates a logit, and fits it to a binomial distribution. That is all fine. Can you say anything about the parameters ta and ann? I assume that lis is the independent variable. Recall that the logistic curve is continuous, and that the integer inputs merely identify points on the curve, so non-integer inputs would identify points "in between".
Steve Denham
The independent variables are "lis", "n" and "pass". "lis" identifies the sets of frequencies to be used in the model, while "pass" holds the frequencies that are supposed to follow the binomial distribution, as shown in
model pass ~ binomial(n,p);
p is then obtained from the logit, which is a simple algebraic function of the parameters "ta" and "ann" (something like the threshold and strength of a judgment criterion). The *2 was a mistake; the line should be:
logitp = ((ta + uta)*(lis/2) + ann + uann);
The question here is that the "pass" data are non-integers, so I don't know how NLMIXED treats the variable "pass".
I remember a related issue. In some cases (but not exactly this case) we add 0.5 to cells that have a zero count. That 0.5 isn't an integer, so does this mean that SAS does not round the count/frequency data and uses them directly to fit the model?
When a logit link is fit, the values do not need to be integers, and the default link for the binomial distribution in NLMIXED is the logit. So even though the response is not an integer, all of the "heavy lifting" is done in a continuous space. The log likelihood involves the gamma function, which is defined for both integer and non-integer values.
I hope this helps some.
Steve Denham
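As a rough illustration of the point about the gamma function, the binomial log likelihood can be written out by hand and passed to NLMIXED through general(). This is only a sketch. It assumes that the built-in binomial() uses the usual gamma-function form of the log likelihood, and it uses the corrected logitp line from earlier in the thread. The names s2ta and s2ann and their starting values are hypothetical, introduced here so that the variance parameters do not share names with the random effects. The DATA step first evaluates the same expression at a fractional count and at its rounded value, using made-up values of n and p.

```sas
/* Illustration only: n and p below are made-up values, not from the thread. */
/* LGAMMA is defined for non-integer arguments, so both counts evaluate.     */
data _null_;
   n = 60;
   p = 0.35;
   do pass = 18.90, 19;
      ll = lgamma(n+1) - lgamma(pass+1) - lgamma(n-pass+1)
           + pass*log(p) + (n-pass)*log(1-p);
      put pass= ll=;
   end;
run;

/* Sketch: the posted model with the binomial log likelihood coded by hand.  */
/* s2ta and s2ann are hypothetical names for the random-effect variances;    */
/* their starting values are placeholders.                                   */
proc nlmixed data=GHC ecov cov;
   parms ann=1 ta=1 s2ta=1 s2ann=1;
   bounds s2ta >= 0, s2ann >= 0;
   logitp = (ta + uta)*(lis/2) + ann + uann;
   p = exp(logitp)/(1 + exp(logitp));
   /* Gamma-function form of the binomial log likelihood; no rounding of pass */
   ll = lgamma(n+1) - lgamma(pass+1) - lgamma(n-pass+1)
        + pass*log(p) + (n-pass)*log(1-p);
   model pass ~ general(ll);
   random uta uann ~ normal([0,0],[s2ta,0,s2ann]) subject=id out=randeffs;
run;
```

If the built-in binomial() really has this form, the general() fit should reproduce the estimates from model pass ~ binomial(n,p); on the same data, which would indicate that the fractional values are used as they are rather than rounded.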
Thank you very much Steve, now I'll spend some time to understand your explanation (I'm not a statistician).
Now I've understood, thank you Steve.