Hi,
I was wondering what the correct regression procedure is when the outcome is a percentage treated as continuous (obviously bounded between 0 and 1). It is the number of procedures with polyps over the total number of procedures done. My understanding is that this can be modeled as binomial, either with proc genmod (logit link, binomial distribution) or with proc logistic. However, I also read that proc glimmix with a beta distribution is the one to use.
Which one is the correct one to use?
Thanks.
You need to use the Events/Trials syntax instead of the raw proportions. That is, instead of ADP, you need to specify the two variables ProceduresWithPolyps and NumberOfProcedures. For example, ADP=0.25 might correspond to ProceduresWithPolyps=2 and NumberOfProcedures=8.
Your model statement will look like this:
model ProceduresWithPolyps/NumberOfProcedures = /dist=binomial link=logit;
As stated at the start of this note:
When modeling response data consisting of proportions (or percentages), the observed values can be continuous or represent a summarized (or aggregated) binary response. For example, an observed proportion of 0.3 might represent 3 out of 10 subjects responding positively at a particular dose of a drug. At the subject level, the response is binary (positive or negative). If your data are aggregated binary data and you have the numerator and denominator counts making up the proportions, then you can fit a logistic model in procedures such as LOGISTIC, PROBIT, GENMOD, GAM, ADAPTIVEREG and others by using the events/trials syntax in the MODEL statement. These models assume the proportions represent a set of independent Bernoulli trials and have a binomial distribution.
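To make the quoted assumption concrete, here is a sketch of the model that the events/trials syntax fits. The symbols (y_i for the event count, n_i for the number of trials, x_i for the predictors) are my own notation, not taken from the note.

```latex
% Aggregated binary response for unit i: y_i events out of n_i trials
\begin{align*}
  y_i &\sim \mathrm{Binomial}(n_i,\, p_i), \\
  \operatorname{logit}(p_i) &= \log\frac{p_i}{1 - p_i} = \mathbf{x}_i^{\top}\boldsymbol{\beta}, \\
  \operatorname{Var}\!\left(\frac{y_i}{n_i}\right) &= \frac{p_i\,(1 - p_i)}{n_i}.
\end{align*}
```

The last line is why the denominator matters. With p = 0.25, an observed proportion from 8 procedures has a standard error of about 0.153, while the same proportion from 80 procedures has a standard error of about 0.048. Passing only the raw proportion throws that information away.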
Just a bit of expansion on what @StatDave wrote. A ratio of counts will not be beta distributed: the beta distribution applies to continuous proportions (for example, a ratio of continuous quantities) on the open interval (0,1), with the endpoints excluded, whereas a ratio of counts is discrete and can equal 0 or 1. A binomial distribution would be the most logical choice for the example you propose (procedures with polyps/total procedures).
SteveDenham
PROC LOGISTIC is the best tool for logistic regression and has simpler syntax:
proc logistic;
model polyps/total = <your predictor variables>;
run;
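For anyone who wants to try both suggestions side by side, here is a minimal sketch. The data set name adp, the predictor Experience, and all data values are made up for illustration; only ProceduresWithPolyps and NumberOfProcedures come from the replies above.

```sas
/* Hypothetical data: one row per endoscopist; all values are invented */
data adp;
   input Endoscopist $ Experience ProceduresWithPolyps NumberOfProcedures;
   datalines;
A  2  2  8
B  5 12 40
C  9 30 75
D 14 21 60
;
run;

/* Events/trials syntax in PROC GENMOD: binomial distribution, logit link */
proc genmod data=adp;
   model ProceduresWithPolyps/NumberOfProcedures = Experience
         / dist=binomial link=logit;
run;

/* Same model in PROC LOGISTIC; binomial with a logit link is its default */
proc logistic data=adp;
   model ProceduresWithPolyps/NumberOfProcedures = Experience;
run;
```

Both procedures fit the same binomial logit model by maximum likelihood, so the parameter estimates for Experience should agree; the choice between them is mostly about which output and options you prefer.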