
## PROC GLM WITH BINOMIAL RESPONSE VARIABLES

Hello all,

I need help determining what is wrong with my code (SAS 9.4). I have collected data from four different farms, with 3 different treatments applied to individual animals (each animal is the experimental unit). The response variable is binomial (0=no, 1=yes). I am trying to run PROC GLM to compare the treatment means, using Tukey's test for the 1-degree-of-freedom comparisons. I am also trying to determine whether there is a significant farm*treatment interaction. My current working code with sample data is below (I have over 400 observations, so I reduced the number of observations for this post).

DATA resync;
INPUT @4 farm $ @3 treatment $ @response;
datalines;
Farm1 A 0
Farm1 A 1
Farm1 A 0
Farm1 B 0
Farm1 B 1
Farm1 C 1
Farm1 C 0
Farm1 C 1
Farm1 C 0
Farm2 A 1
Farm2 A 1
Farm2 A 0
Farm2 A 0
Farm2 A 0
Farm2 A 1
Farm2 B 1
Farm2 B 0
Farm3 B 1
Farm3 B 0
Farm3 B 1
Farm3 C 0
Farm3 C 0
Farm3 C 0
Farm3 C 1
Farm3 C 1
Farm4 A 1
Farm4 A 0
Farm4 B 1
Farm4 B 0
Farm4 C 1
Farm4 C 1

PROC GLM data=resync;
CLASS farm treatment farm*treatment;
MODEL farm treatment farm*treatment = response;
MEANS treatment / TUKEY;
PROC PRINT;
RUN;

Any help would be greatly appreciated!


## Re: PROC GLM WITH BINOMIAL RESPONSE VARIABLES

That's not how you specify a model. At minimum it would look like the code below. See if you can get that to run, though I suspect it's still wrong: if the observed variable is binomial, you need to specify that somehow. I'll move your question to the stats forum, and hopefully someone with more statistical knowledge than me can answer it 🙂

```
PROC GLM data=resync;
CLASS farm treatment;
MODEL response = farm treatment farm*treatment;
*MEANS treatment / TUKEY;
run;
quit;
```

## Re: PROC GLM WITH BINOMIAL RESPONSE VARIABLES

With binary responses you would use PROC LOGISTIC, or possibly PROC GLIMMIX with the model option DIST=BINOMIAL.
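
As a rough illustration of the PROC LOGISTIC route, here is a minimal sketch. It assumes the `resync` data set from the original post reads in correctly with `response` as a numeric 0/1 variable, and it treats every animal as independent, which may not hold for animals on the same farm. With the empty farm-by-treatment cells in the posted sample (for example, no treatment C on Farm2), some least-squares means may be reported as non-estimable.

```sas
/* Sketch only: assumes data set resync with CLASS variables farm and
   treatment and a numeric 0/1 variable response. */
proc logistic data=resync;
   /* PARAM=GLM is needed for the LSMEANS statement in PROC LOGISTIC */
   class farm treatment / param=glm;
   /* model the probability that response = 1 */
   model response(event='1') = farm treatment farm*treatment;
   /* Tukey-adjusted pairwise treatment comparisons on the logit scale;
      ILINK also reports estimated probabilities */
   lsmeans treatment / diff adjust=tukey ilink;
run;
```

The Type 3 test for `farm*treatment` in the output addresses the interaction question, and the `LSMEANS` statement stands in for the `MEANS ... / TUKEY` statement used with PROC GLM.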

--
Paige Miller

## Re: PROC GLM WITH BINOMIAL RESPONSE VARIABLES

To elaborate on the responses by @Reeza and @PaigeMiller:

Clearly, you need to use a procedure for data that are binary or binomial. GLM is definitely not the correct procedure, because it assumes the response is normally distributed (conditional on the predictors).

In your data snippet, it does not look like each individual cow is independent of all other cows. Does each line in your data snippet represent one cow? If so, it seems that there are one or more cows receiving a particular treatment at each of four farms.

Cows on the same farm receiving the same treatment are subsamples, and the statistical model should incorporate cows accordingly. Assuming that FARM is a fixed effects factor, I see two options, one of which uses the LOGISTIC procedure, one of which uses the GLIMMIX procedure (you could also use GENMOD):

(1) Combine the data over multiple cows on the same farm and receiving the same treatment so that a new response is defined by the number of cows with outcome=1 (i.e., number of "successes") out of total number of cows. You could then use the LOGISTIC (or GENMOD) procedure with a binomial distribution using the "events/trials" response specification. See http://documentation.sas.com/?docsetId=statug&docsetTarget=statug_logistic_syntax22.htm&docsetVersio...

(2) Use the data in the current format in the GLIMMIX procedure, specifying a mixed model with a RANDOM statement which clusters cows within sets of cows at a given farm receiving the same treatment.
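
The two options above can be sketched as follows. This is only an outline under assumptions: the data set name `resync_agg` and the variable names `events` and `trials` are made up for illustration, `response` is assumed to be numeric 0/1, and the choice of `METHOD=LAPLACE` and of leaving the fixed `farm*treatment` term out of the GLIMMIX model are choices of this sketch, not something stated in the thread.

```sas
/* Option (1): collapse to one row per farm-by-treatment cell.
   resync_agg, events and trials are hypothetical names. */
proc sql;
   create table resync_agg as
   select farm, treatment,
          sum(response)   as events,  /* animals with response = 1 */
          count(response) as trials   /* animals with a non-missing response */
   from resync
   group by farm, treatment;
quit;

proc logistic data=resync_agg;
   class farm treatment / param=glm;
   /* events/trials response syntax for binomial counts */
   model events/trials = farm treatment farm*treatment;
   lsmeans treatment / diff adjust=tukey ilink;
run;

/* Option (2): keep one row per animal and add a random intercept
   for each farm-by-treatment group of animals. */
proc glimmix data=resync method=laplace;
   class farm treatment;
   model response(event='1') = farm treatment / dist=binary link=logit;
   random intercept / subject=farm*treatment;
   lsmeans treatment / diff adjust=tukey ilink;
run;
```

In the GLIMMIX sketch, the random intercept is defined at the same farm-by-treatment level as a fixed `farm*treatment` interaction would be, so both cannot be separated cleanly in one model; which one to keep depends on the question being asked.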

Both approaches will produce the same results, but the first approach using LOGISTIC is more intuitive and that's what I would recommend.

I hope this helps. I think you will want to do some studying about logistic regression (or in this case, logit models because farm and treatment are categorical) and how to implement these models using SAS.


## Re: PROC GLM WITH BINOMIAL RESPONSE VARIABLES

@sld wrote:

To elaborate on the responses by @Reeza and @PaigeMiller:

Clearly, you need to use a procedure for data that are binary or binomial. GLM is definitely not the correct procedure, because it assumes the response is normally distributed (conditional on the predictors).

GLM (and linear regression) assume the errors are normally distributed. But, you are correct that in this case, the errors are not normally distributed and thus do not meet the requirements of GLM.
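
One way to see why the errors cannot be normal here, using notation introduced only for this illustration: let $Y$ be the 0/1 response and $p(x) = P(Y = 1 \mid x)$ for a given farm and treatment combination $x$. The error then takes only two possible values, and its variance changes with the mean:

```latex
\varepsilon = Y - p(x) \in \{\, 1 - p(x),\; -p(x) \,\}, \qquad
\operatorname{E}[\varepsilon \mid x] = 0, \qquad
\operatorname{Var}(\varepsilon \mid x) = p(x)\,\bigl(1 - p(x)\bigr)
```

A two-point distribution with non-constant variance satisfies neither the normality nor the equal-variance assumption behind PROC GLM.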

--
Paige Miller