Statistical Procedures

Programming the statistical procedures from SAS
snca229
Calcite | Level 5

Hello all, 


I need help determining what is wrong with my code (SAS 9.4). I have collected data from four different farms, with three different treatments applied to individual animals (each animal is the experimental unit). The response variable is binomial (0=no, 1=yes). I am trying to run PROC GLM to compare the treatment averages, using Tukey's test for the one-degree-of-freedom comparisons. I am also trying to determine whether there is a significant farm*treatment interaction. My current code, with sample data, is below (I have over 400 observations, so I reduced the number for this post).

 

 

DATA resync;
INPUT @4 farm $ @3 treatment $ @response;
datalines;

Farm1A0
Farm1A1
Farm1A0
Farm1B0
Farm1B1
Farm1C1
Farm1C0
Farm1C1
Farm1C0
Farm2A1
Farm2A1
Farm2A0
Farm2A0
Farm2A0
Farm2A1
Farm2B1
Farm2B0
Farm3B1
Farm3B0
Farm3B1
Farm3C0
Farm3C0
Farm3C0
Farm3C1
Farm3C1
Farm4A1
Farm4A0
Farm4B1
Farm4B0
Farm4C1
Farm4C1

PROC GLM data=resync;
CLASS farm treatment farm*treatment; 
MODEL farm treatment farm*treatment = response;
MEANS treatment / TUKEY;
PROC PRINT; 
RUN;

 

 

Any help would be greatly appreciated!

Reeza
Super User

That's not how you specify a model. At a minimum, it would be this. See if you can get that to run, though I suspect it's still wrong. If the observed variable is binomial, you want to specify that somehow. I'll move your question to the stats forum, and hopefully someone with more statistical knowledge than me can answer it 🙂

 

PROC GLM data=resync;
CLASS farm treatment;
MODEL response = farm treatment farm*treatment;
*MEANS treatment / TUKEY;
run;
quit;
PaigeMiller
Diamond | Level 26

With binary responses you would use PROC LOGISTIC, or possibly PROC GLIMMIX with the model option DIST=BINOMIAL. 
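
As a rough sketch of the PROC LOGISTIC route: the DATA step below reads the fixed-width layout from the original post (farm in columns 1-5, treatment in column 6, response in column 7), but the rows themselves are made up for illustration, and the PARAM=GLM and LSMEANS options are just one assumed way to get Tukey-adjusted treatment comparisons. This treats every animal as independent.

```sas
/* Illustrative rows only, in the same layout as the post:        */
/* columns 1-5 = farm, column 6 = treatment, column 7 = response  */
data resync;
   input farm $ 1-5 treatment $ 6 response 7;
   datalines;
Farm1A0
Farm1A1
Farm1B0
Farm1B1
Farm1C0
Farm1C1
Farm2A0
Farm2A1
Farm2B0
Farm2B1
Farm2C0
Farm2C1
;
run;

proc logistic data=resync;
   /* PARAM=GLM is required for the LSMEANS statement in PROC LOGISTIC */
   class farm treatment / param=glm;
   /* Model the probability that response = 1 */
   model response(event='1') = farm treatment farm*treatment;
   /* Treatment means on the probability scale, with Tukey-adjusted differences */
   lsmeans treatment / ilink diff adjust=tukey;
run;
```

With real data, cells where every animal has the same outcome can cause separation problems when the farm*treatment interaction is in the model, so check the log for convergence warnings.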

--
Paige Miller
sld
Rhodochrosite | Level 12

 

To elaborate on the responses by @Reeza and @PaigeMiller:

 

Clearly, you need to use a procedure for data that are binary or binomial. GLM is definitely not the correct procedure, because it assumes that the response is normally distributed (conditional on the predictors).

 

In your data snippet, it does not look like each individual cow is independent of all other cows. Does each line in your data snippet represent one cow? If so, it seems that there are one or more cows receiving a particular treatment at each of four farms.

 

Cows on the same farm receiving the same treatment are subsamples, and the statistical model should incorporate cows accordingly. Assuming that FARM is a fixed effects factor, I see two options, one of which uses the LOGISTIC procedure, one of which uses the GLIMMIX procedure (you could also use GENMOD):

 

(1) Combine the data over multiple cows on the same farm and receiving the same treatment so that a new response is defined by the number of cows with outcome=1 (i.e., number of "successes") out of total number of cows. You could then use the LOGISTIC (or GENMOD) procedure with a binomial distribution using the "events/trials" response specification. See http://documentation.sas.com/?docsetId=statug&docsetTarget=statug_logistic_syntax22.htm&docsetVersio...

 

(2) Use the data in the current format in the GLIMMIX procedure, specifying a mixed model with a RANDOM statement which clusters cows within sets of cows at a given farm receiving the same treatment.
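
Option (1) might be sketched roughly as follows. The rows are made up for illustration, the names resync_agg, events, and trials are assumptions, and PROC SQL is only one of several ways to do the aggregation.

```sas
/* Illustrative rows only: one record per animal (farm, treatment, response) */
data resync;
   input farm $ treatment $ response @@;
   datalines;
Farm1 A 0  Farm1 A 1  Farm1 A 0  Farm1 B 0  Farm1 B 1  Farm1 C 1  Farm1 C 0
Farm2 A 1  Farm2 A 0  Farm2 B 1  Farm2 B 0  Farm2 C 0  Farm2 C 1  Farm2 C 1
;
run;

/* Collapse to one row per farm*treatment cell:                  */
/* events = number of animals with response=1, trials = animals  */
proc sql;
   create table resync_agg as
   select farm, treatment,
          sum(response) as events,
          count(*)      as trials
   from resync
   group by farm, treatment;
quit;

proc logistic data=resync_agg;
   class farm treatment / param=glm;
   /* events/trials syntax: binomial response per cell */
   model events/trials = farm treatment;
run;
```

Note that once the data are collapsed to one row per farm*treatment cell, adding farm*treatment to this model uses up every cell, so the model becomes saturated.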

 

Both approaches will produce the same results, but the first approach using LOGISTIC is more intuitive and that's what I would recommend.
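
For comparison, a minimal sketch of the GLIMMIX route, keeping one row per animal. The rows are made up for illustration, and the choice of a random intercept with SUBJECT=farm*treatment is an assumption about how the clustering described above would be coded.

```sas
/* Illustrative rows only: one record per animal (farm, treatment, response) */
data resync;
   input farm $ treatment $ response @@;
   datalines;
Farm1 A 0  Farm1 A 1  Farm1 A 0  Farm1 B 0  Farm1 B 1  Farm1 C 1  Farm1 C 0
Farm2 A 1  Farm2 A 0  Farm2 B 1  Farm2 B 0  Farm2 C 0  Farm2 C 1  Farm2 C 1
;
run;

proc glimmix data=resync;
   class farm treatment;
   /* Binary response, logit link; models the probability that response = 1 */
   model response(event='1') = farm treatment / dist=binary link=logit;
   /* Animals sharing a farm and a treatment form one cluster */
   random intercept / subject=farm*treatment;
   /* Treatment means on the probability scale, Tukey-adjusted differences */
   lsmeans treatment / ilink diff adjust=tukey;
run;
```

With only a handful of rows the cluster variance may be estimated as zero; that is a limitation of the toy data, not of the approach.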

 

I hope this helps. I think you will want to do some studying about logistic regression (or in this case, logit models because farm and treatment are categorical) and how to implement these models using SAS.

 

PaigeMiller
Diamond | Level 26

@sld wrote:

 

To elaborate on the responses by @Reeza and @PaigeMiller:

 

Clearly, you need to use a procedure for data that are binary or binomial. GLM is definitely not the correct procedure, because it assumes that the response is normally distributed (conditional on the predictors).

 


GLM (and linear regression) assumes the errors are normally distributed. But you are correct that in this case the errors are not normally distributed, and thus the data do not meet the requirements of GLM.

--
Paige Miller
