I am doing multilevel analyses using beta regression because my dependent variable is a proportion. The range of proportion values includes 0 and 1; however, prior to analysis, 0 was recoded to 0.005 and 1 was recoded to 0.995. Most of the analyses executed without error, but four did not. The attached file shows the syntax for each analysis and the noted error (the same for three of the four analyses). As this is my first experience with beta regression, please pitch your reply to the level of somebody with no experience with beta regression and an incomplete understanding of all GLIMMIX options and keywords and their possible interactions. A low-level slow walk would be appreciated.
Thanks Gene Maguin
Calling @Rick_SAS
I don't think we'll be able to reproduce these errors without having the data.
I am going to follow this closely as this is an interesting question that I don't have an answer for. Some additional information could be helpful, such as what are the numerator and denominator in the calculation of compliance2, how many records you have, and whether you have tried a pseudo-likelihood approach, rather than the quadrature.
SteveDenham.
Rick_SAS: I'm not the PI. May not be possible.
All: This is an Ecological Momentary Assessment dataset. Longitudinal, 9 time points. Persons are (attempted to be) assessed by a single yes/no item once each day at wake-up ("Wake-up") and then at four random times during the following 12-hour period ("Random"). Each time point is a 7-day period. Each person has a Wake-up proportion and a Random proportion at each time point. The Wake-up proportion is the count of Yeses at wake-up for the week (0-7) divided by 7 days. The Random proportion is the count of Yeses at the random times for the week (0 to 4*7) divided by 4*7=28. The number of persons is ~255 at week 0 and ~230 at week 9. In the model statement, "slope" is the time point variable and assesstype is the Wake-up vs. Random indicator.
Tried pseudo-likelihood? No. (Quadrature was recommended). I see that there are four PL options (RSPL | MSPL | RMPL | MMPL), which would you try first (why, please), then next?
Thanks, Gene Maguin
It doesn't have to be real data. If you can invent an example that shows the problem, we can try to understand what is causing the issue and recommend a solution.
If only the real data shows the issue, you can contact SAS Technical Support. They have procedures to handle proprietary or sensitive data from customers.
I agree with Rick -- we would need to have your data in order to see what might have caused the convergence issue.
In the meantime, try rescaling your time variable (slope). For example, divide that by 10, or 100, to see if that helps.
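A minimal sketch of that rescaling, assuming the analysis data are in a dataset called ema_long (a hypothetical name; the actual dataset and syntax were not posted):

```sas
/* Hypothetical dataset names: ema_long (input), ema_scaled (output). */
data ema_scaled;
  set ema_long;
  slope10 = slope / 10;   /* rescaled time: one unit now spans 10 time points */
run;
```

You would then use slope10 in place of slope everywhere in the MODEL and RANDOM statements. The fitted model is the same; only the scale of the time coefficients and the slope variance changes, which can sometimes help the optimizer.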
Thanks,
Jill
Well, it could be that using a beta distribution with the default logit link is the problem. The phrase "Persons are (attempted to be) assessed by a single yes/no item" indicates to me that a binary/binomial distribution with a logit link may be more appropriate. In each case, the response variable is the proportion of Yeses out of the number of occasions for observing either Yes or No, rather than a proportion defined by a continuous variable divided by a larger, different continuous variable. With that in mind, take a look at Stroup and Claassen's paper:
The full paper may be behind a paywall (Springer) but keep digging and you'll find a pdf copy.
So the pseudo-likelihood method they used was the default for GLIMMIX - RSPL.
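As a rough sketch of what a binomial version could look like with the default RSPL method: the dataset name ema_counts, the subject identifier id, and the count variables yes_count (Yeses in the week) and n_trials (prompts in the week, 7 or 28) are all hypothetical names, and the random-effects structure is only an assumption because the original syntax was not posted.

```sas
/* Sketch only: ema_counts, id, yes_count, and n_trials are hypothetical names. */
proc glimmix data=ema_counts;      /* no METHOD= option, so the default RSPL is used */
  class id assesstype;
  model yes_count/n_trials = slope assesstype slope*assesstype
        / dist=binomial link=logit solution;
  /* assumed structure: random intercept and time slope for each person */
  random intercept slope / subject=id type=un;
run;
```

If this runs cleanly, METHOD=QUAD or METHOD=LAPLACE can be added to the PROC GLIMMIX statement afterwards to compare against the pseudo-likelihood fit.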
Good luck.
SteveDenham
Yes, that data will work. The binomial distribution is the default for models that use the events/trials syntax, but it can be applied to aggregate values as well; see Example 51.4, Quasi-likelihood Estimation for Proportions with Unknown Distribution, https://documentation.sas.com/doc/en/statug/15.2/statug_glimmix_examples07.htm and the complete code: https://documentation.sas.com/doc/en/pgmsascdc/v_032/statug/statug_code_gmxex04.htm
One thing that you will want to do is put the extreme values of 0 and 1 back into the analysis dataset, rather than 0.005 and 0.995. Unlike the beta distribution, the binomial distribution allows observed proportions of exactly zero and one (no Yeses, or all Yeses), so no recoding is needed when a logit link is used.
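One possible way to do that, if only the recoded proportions are at hand, is to convert them back to counts. This is a sketch under assumptions: ema_long, ema_counts, yes_count, and n_trials are hypothetical names, compliance2 is taken to be the recoded proportion, and assesstype is assumed to be a character variable with the value 'Wake-up' for the wake-up records.

```sas
/* Sketch only: dataset names, yes_count, n_trials, and the assesstype coding are assumptions. */
data ema_counts;
  set ema_long;
  /* 7 wake-up prompts per week, or 4*7 = 28 random prompts per week */
  if assesstype = 'Wake-up' then n_trials = 7;
  else n_trials = 28;
  /* rounding maps 0.005 back to 0 Yeses and 0.995 back to n_trials Yeses */
  yes_count = round(compliance2 * n_trials);
run;
```

The resulting yes_count and n_trials can then be used with the events/trials syntax, which keeps the true zeros and ones in the analysis.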
SteveDenham