Barbapapa
Calcite | Level 5

Hello everyone,

In some of my research I have to use "estimated frequencies" instead of raw count data to fit a PROC NLMIXED model. Those frequencies are often fractional, which seems to violate the model assumption (e.g. binomially distributed data). However, when I actually fit the model with these data, SAS finished the analysis without giving any warning or error. I want to know whether the PROC rounds them to integers before doing the analysis or uses the fractional frequencies directly and, in either case, whether the result is affected by the data type.

Thank you.

SteveDenham
Jade | Level 19

I might be misinterpreting this, but are the estimated frequencies really a summary?  Something like having 0.25, rather than 15 out of 60?  NLMIXED can handle both ways of representing the data.

Steve Denham

Barbapapa
Calcite | Level 5

Sorry for not expressing myself clearly. What I meant is that the frequencies are values like 18.90 or 23.45, estimated or corrected in some previous steps. The idea is that keeping the decimals might give a more precise result in the final model fit than rounding them to integers.

Thank you. 

SteveDenham
Jade | Level 19

So what you are calling frequency is a count variable, but the concern is that the counts are non-integer.  Am I understanding this correctly?  I would guess that the values have been standardized--something like 18.90 cases per 100,000.  Is that correct?  If so, then you might (and this is only a might) consider restating them as proportions, which would fit the binomial distribution.  Are you in a position where you can give an explicit definition of the response variable (I know that sometimes this falls into intellectual property problems)?

Steve Denham

Barbapapa
Calcite | Level 5

My data are non-integer counts, but not proportions. There are no intellectual property concerns, but I thought it would be tedious to explain where this kind of data comes from. Let me put it in a few words. Say I have three variables of integer counts, A, B and C, representing different groups in a sample. Later I find out that the criteria for classifying the sample are not satisfactory and have to be modified. However, my way of correcting the counts in the different groups, based on the new classification criteria, will sometimes give non-integer counts A', B', C'. It is somewhat like changing the proportions of a multinomial distribution while keeping the total count fixed, which leads to non-integer frequencies.

Hope I've explained it well this time......
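A small Python sketch (not SAS) of how such a correction can produce fractional counts. The counts, the proportions, and the helper name `reallocate` are all hypothetical and chosen only for illustration; the actual correction procedure is not described in this thread.

```python
def reallocate(counts, new_proportions):
    """Spread the fixed total of `counts` over the groups using `new_proportions`."""
    total = sum(counts)
    return [total * prop for prop in new_proportions]

# Hypothetical original integer counts for groups A, B, C.
original = [20, 30, 50]

# Hypothetical corrected group proportions (they sum to 1).
corrected_props = [0.189, 0.3265, 0.4845]

adjusted = reallocate(original, corrected_props)

print("adjusted counts:", adjusted)
print("total preserved:", sum(adjusted))
```

The total stays at 100, but the per-group values are no longer whole numbers, which is the situation described above.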

SteveDenham
Jade | Level 19

I think I get it.  The problem that arises then is that the data are NOT strictly binomially distributed, as a binomial count only takes on integer values from 0 to n (only the proportion lies between 0 and 1).  I would guess that they follow a Poisson or negative binomial distribution, but the logical extension of these to continuous values is an exponential distribution.

Can you share your current NLMIXED code?

Thanks,

Steve Denham

Barbapapa
Calcite | Level 5

Thank you for the reply.

Here is the code. It is quite a simple model, but when my data are non-integers it still runs.

proc nlmixed data=GHC ecov cov;
   parms ann=1 ta=1 uann=0 uta=0;
   logitp = ((ta + uta)*2(lis) + ann + uann);
   p = exp(logitp)/(1+exp(logitp));
   model pass ~ binomial(n,p);
   random uta uann ~ normal([0,0],[uta,0,uann]) subject=id out=randeffs;
run;

SteveDenham
Jade | Level 19

Well, there is nothing in the code that requires anything to be an integer.  It calculates a linear part (although it looks like something got lost in pasting, as the *2(lis) doesn't look executable).  Then it converts that logit to a probability p, and fits pass to a binomial distribution with that p.  That is all fine.  Can you say anything about the parameters ta and ann?  I assume that lis is the independent variable.  Recall that the logistic curve is continuous, and that the integer inputs merely identify points on the curve, so non-integer inputs would identify points "in between".

Steve Denham

Barbapapa
Calcite | Level 5

The variables in the data set are "lis", "n" and "pass". "lis" identifies the sets of frequencies to be used in the model, while "pass" holds the frequencies that are supposed to follow the binomial distribution, as shown in

model pass ~ binomial(n,p);

Then p depends on the logit, which is some algebra of "ta" and "ann", something like the threshold and strength of a judgment criterion. The *2 is a mistake; it should be:

logitp = ((ta + uta)*(lis/2) + ann + uann);

The question is that the "pass" data are non-integers, so I don't know how NLMIXED treats the variable "pass".
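As a rough illustration, the corrected linear predictor and the `p = exp(logitp)/(1+exp(logitp))` step can be mirrored in Python (not SAS). The function name `success_probability`, the `lis` values, and setting the random effects to zero are assumptions made for the sketch; `ta=1` and `ann=1` are simply the starting values from the PARMS statement.

```python
import math

def success_probability(lis, ta, ann, uta=0.0, uann=0.0):
    """Mirror the corrected NLMIXED statements: linear predictor, then inverse logit."""
    logitp = (ta + uta) * (lis / 2) + ann + uann
    return math.exp(logitp) / (1 + math.exp(logitp))

# ta=1 and ann=1 are the PARMS starting values; the lis values and the
# zero random effects are illustrative assumptions, not fitted results.
for lis in (-1, 0, 1):
    p = success_probability(lis, ta=1.0, ann=1.0)
    print(f"lis={lis:+d}  p={p:.4f}")
```

Note that "pass" never enters this calculation. It only appears in the likelihood, so that is the only place where a non-integer "pass" can matter.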

Barbapapa
Calcite | Level 5

I remember a related issue. In some cases (but not exactly this case) we add 0.5 to cells that have a zero count. Since 0.5 isn't an integer, does this mean that SAS does not round the count/frequency data and uses them directly to fit the model?

SteveDenham
Jade | Level 19

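The values do not need to be integers, and they are not rounded.  NLMIXED has no built-in link function; in your code the logit is applied explicitly in the p = ... statement, and the binomial log likelihood is then evaluated at whatever values of pass and n you supply.  So even though the response is not an integer, all of the "heavy lifting" is done in a continuous space.  The log likelihood involves the gamma function (it replaces the factorials in the binomial coefficient), which is defined for both integer and non-integer values.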
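To make this concrete, here is a minimal Python sketch (not SAS) of a binomial log likelihood written with log-gamma terms. It assumes NLMIXED evaluates the binomial likelihood in this log-gamma form; the function name `binomial_loglik` and the values of n and p are hypothetical, and 18.90 is one of the example values mentioned earlier in the thread.

```python
import math

def binomial_loglik(y, n, p):
    """Binomial log likelihood using log-gamma in place of factorials.

    log C(n, y) = lgamma(n+1) - lgamma(y+1) - lgamma(n-y+1),
    which is defined for non-integer y as long as 0 <= y <= n.
    """
    log_coef = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_coef + y * math.log(p) + (n - y) * math.log(1 - p)

# Hypothetical number of trials and success probability.
n, p = 60, 0.35

ll_fractional = binomial_loglik(18.90, n, p)  # fractional response used as is
ll_rounded = binomial_loglik(19, n, p)        # same response rounded first

print("log likelihood at y=18.90:", ll_fractional)
print("log likelihood at y=19:   ", ll_rounded)
```

The two values differ, so under this assumption a fit on fractional counts is not the same as a fit on rounded counts. For integer y the function agrees with the usual factorial-based binomial formula.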

I hope this helps some.

Steve Denham

Barbapapa
Calcite | Level 5

Thank you very much Steve, now I'll spend some time to understand your explanation (I'm not a statistician).

Barbapapa
Calcite | Level 5

Now I understand, thank you Steve.
