emaguin
Quartz | Level 8

I am doing multilevel analyses using beta regression because my dependent variable is a proportion. The range of proportion values includes 0 and 1; however, prior to analysis, 0 was recoded to 0.005 and 1 was recoded to 0.995. Most of the analyses executed without error, but four did not. The attached file shows the syntax for each analysis and the error noted (the same for three of the four analyses). As this is my first experience with beta regression, please pitch your reply to the level of somebody with no experience with beta regression and an incomplete understanding of all GLIMMIX options and keywords and their possible interactions. A low-level slow walk would be appreciated.

Thanks Gene Maguin

Rick_SAS
SAS Super FREQ

I don't think we'll be able to reproduce these errors without having the data.

SteveDenham
Jade | Level 19

I am going to follow this closely, as this is an interesting question that I don't have an answer for. Some additional information would be helpful: what the numerator and denominator are in the calculation of compliance2, how many records you have, and whether you have tried a pseudo-likelihood approach rather than quadrature.

 

SteveDenham.

emaguin
Quartz | Level 8

Rick_SAS: I'm not the PI. May not be possible.

All: This is an Ecological Momentary Assessment (EMA) dataset. It is longitudinal, with 9 time points. Persons are (attempted to be) assessed by a single yes/no item once each day at wake-up ("Wake-up") and then at four random times during the following 12-hour period ("Random"). Each time point is a 7-day period. Each person has a Wake-up proportion and a Random proportion at each time point. The Wake-up proportion is the count of Yeses at wake-up for the week (0-7) divided by 7 days. The Random proportion is the count of Yeses at the random times for the week (0-28) divided by 4*7=28. The number of persons is ~255 at week 0 and ~230 at week 9. In the model statement, "slope" is the time point variable and assesstype is the Wake-up vs. Random indicator.

Tried pseudo-likelihood? No. (Quadrature was recommended.) I see that there are four PL options (RSPL | MSPL | RMPL | MMPL). Which would you try first (and why, please), and which next?

Thanks, Gene Maguin

Rick_SAS
SAS Super FREQ

It doesn't have to be real data. If you can invent an example that shows the problem, we can try to understand what is causing the issue and recommend a solution.
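For anyone who wants to invent such an example, here is a minimal sketch in Python of how data with the same layout could be simulated. It is only an illustration: the coefficient values, the random-intercept standard deviation, the function name `simulate_ema`, and the choice to number the 9 time points 0 through 8 are all assumptions, not values from the real study. The 7 and 28 weekly prompts come from the description earlier in the thread.

```python
import math
import random

def simulate_ema(n_persons=250, n_weeks=9, seed=1):
    """Return rows of (person, week, assesstype, yes_count, trials, proportion)."""
    rng = random.Random(seed)
    # Prompts per week: 1 per day at wake-up, 4 per day at random times.
    trials_by_type = {"Wake-up": 7, "Random": 28}
    rows = []
    for person in range(1, n_persons + 1):
        u = rng.gauss(0.0, 1.0)  # hypothetical person-level random intercept
        for week in range(n_weeks):
            for assesstype, trials in trials_by_type.items():
                # Hypothetical fixed effects on the logit scale.
                shift = 0.3 if assesstype == "Random" else 0.0
                eta = 1.0 - 0.10 * week + shift + u
                p = 1.0 / (1.0 + math.exp(-eta))
                # Binomial draw as a sum of independent yes/no answers.
                yes_count = sum(rng.random() < p for _ in range(trials))
                rows.append((person, week, assesstype,
                             yes_count, trials, yes_count / trials))
    return rows

rows = simulate_ema()
print(len(rows))          # one row per person, week, and assessment type
for row in rows[:4]:
    print(row)
```

The rows could be written to a CSV file and imported into SAS. Because the counts are kept alongside the proportions, the same file can be used to try both a beta model on the proportion and a binomial model on the counts.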

 

If only the real data shows the issue, you can contact SAS Technical Support. They have procedures to handle proprietary or sensitive data from customers.

emaguin
Quartz | Level 8
I don't believe I have the technical skill to create a plausible beta distribution test data model.
I want to ask about the meaning of the three error messages that I'm seeing. These are:
ERROR: QUANEW Optimization cannot be completed.
ERROR: The function value of the objective function cannot be computed at the starting point.

ERROR: Infeasible parameter values for evaluation of objective function with 1 quadrature point.

What do these mean in terms of where the estimation process is failing? To me they are pretty opaque, and I'd like to understand what is happening.
jiltao
SAS Super FREQ

I agree with Rick -- we would need to have your data in order to see what might have caused the convergence issue.

In the meantime, try rescaling your time variable (slope). For example, divide that by 10, or 100, to see if that helps.

Thanks,

Jill

SteveDenham
Jade | Level 19

Well, it could be that using a beta distribution with the default logit link is the problem. This phrase, "Persons are (attempted to be) assessed by a single yes/no item," indicates to me that a binary/binomial distribution with a logit link may be more appropriate. In each case, the response variable is the proportion of Yeses out of the number of occasions for observing either Yes or No, rather than a proportion defined by a continuous variable divided by a larger, different continuous variable. With that in mind, take a look at Stroup and Claassen's paper:

 https://econpapers.repec.org/article/sprjagbes/v_3a25_3ay_3a2020_3ai_3a4_3ad_3a10.1007_5fs13253-020-... .

The full paper may be behind a paywall (Springer) but keep digging and you'll find a pdf copy.

 

So the pseudo-likelihood method they used was the default for GLIMMIX - RSPL.

 

Good luck.

 

SteveDenham

emaguin
Quartz | Level 8
I can get a copy from the library. Thanks for the recommendation. I'll try RSPL. I understand your alternative; I read about it in an ecology statistics book. Help me learn something. How would this be implemented? Our data is the computed proportion (because the dataset was originally analyzed assuming a normal distribution). I assume that that data won't work for the alternative method. Would I have to go back to the original data, where each record represents an instance where a call was made? I know that the SPSS procedure GENLINMIXED allows the data to be expressed as the number of successes out of the number of attempts, where the number of attempts can be a variable in the dataset. I assume SAS can do that too, but does GLIMMIX have that capability? If so, where/how would it be documented? Lastly, the other problem is proportions of 0.0 or 1.0. In the current analyses, those values are recoded to 0.005 and 0.995, respectively. What happens in a logit formulation? Are they discarded as undefined? Given an arbitrarily large or small value, which is what Mplus apparently does? I agree that your proposed model is a better representation of the data. The question is implementation.
SteveDenham
Jade | Level 19

Yes, that data will work. The binomial distribution is the default for models that use the events/trials syntax, but it can be applied to aggregate values as well. See Example 51.4, Quasi-likelihood Estimation for Proportions with Unknown Distribution, https://documentation.sas.com/doc/en/statug/15.2/statug_glimmix_examples07.htm and the complete code: https://documentation.sas.com/doc/en/pgmsascdc/v_032/statug/statug_code_gmxex04.htm .
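If counts are wanted for the events/trials form, they can be recovered from the stored proportions without going back to the call-level records. Below is a minimal Python sketch of that conversion. It assumes every weekly proportion was computed with the full denominator (7 for Wake-up, 28 for Random), as described earlier in the thread; the function name `to_events_trials` and the 0.01 tolerance are illustrative choices.

```python
# Weekly denominators as described in the thread (assumed fixed for every week).
TRIALS = {"Wake-up": 7, "Random": 28}

def to_events_trials(proportion, assesstype, tol=0.01):
    """Convert a weekly proportion back to (yes_count, trials)."""
    trials = TRIALS[assesstype]
    yes_count = round(proportion * trials)
    # The recoded 0.005 and 0.995 land on 0 and trials; anything that is
    # not close to a whole count is flagged instead of silently rounded.
    if abs(yes_count / trials - proportion) > tol:
        raise ValueError(f"{proportion} is not a count out of {trials}")
    return yes_count, trials

print(to_events_trials(0.995, "Wake-up"))   # recoded 1 -> (7, 7)
print(to_events_trials(0.005, "Random"))    # recoded 0 -> (0, 28)
print(to_events_trials(3 / 7, "Wake-up"))   # -> (3, 7)
```

If missed prompts were dropped from the denominator in some weeks, the fixed 7 and 28 would not apply, and the actual number of answered prompts would be needed instead.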

One thing that you will want to do is put the extreme values of 0 and 1 back into the analysis dataset, rather than 0.005 and 0.995. Unlike the beta distribution, the binomial distribution gives positive probability to observed proportions of exactly 0 and 1 (no Yeses or all Yeses), so those values are valid responses when a logit link is used.
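The difference at the boundaries can be seen by evaluating the two log-likelihood contributions directly. The following Python sketch is only an illustration of the formulas, not of what GLIMMIX does internally; the function names and the parameter values are made up for the example.

```python
import math

def binomial_logpmf(y, n, p):
    """Log probability of y Yeses in n trials, for 0 < p < 1."""
    log_comb = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return log_comb + y * math.log(p) + (n - y) * math.log(1.0 - p)

def beta_logpdf(y, a, b):
    """Log density of a beta(a, b) variable; only defined for 0 < y < 1."""
    if not 0.0 < y < 1.0:
        raise ValueError("beta log-density needs 0 < y < 1")
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y)

# Binomial: 0 of 7 and 7 of 7 both have finite log-likelihoods.
print(binomial_logpmf(0, 7, 0.3))
print(binomial_logpmf(7, 7, 0.3))

# Beta: the recoded value works, the raw boundary value does not.
print(beta_logpdf(0.005, 2.0, 3.0))
try:
    beta_logpdf(0.0, 2.0, 3.0)
except ValueError as err:
    print("beta at 0:", err)
```

The model's fitted probability p stays strictly between 0 and 1 under the logit link; it is only the observed proportion that is allowed to sit on the boundary.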

 

SteveDenham

emaguin
Quartz | Level 8
Steve, thank you. I would never have found that. I'll try it out, but it certainly looks like it is what I need.
One thing I noticed is that clicking on the complete code link is a dead end. Searching on "sas glimmix" and clicking on the first result (https://support.sas.com/rnd/app/stat/procedures/glimmix.html) gets to a page with "SAS/STAT Software" as the heading. Under the examples is the example you pointed me to, but there it is listed as Example 49.4 (and the link to the complete code works). So, who knows why; it is like going into your familiar grocery store and finding nearly everything has been rearranged.
Again, thank you.
