Hi Folks,
I apologize in advance if this is a newbie question; I'm still relatively unfamiliar with SAS, and I don't have a strong statistics background. I'm trying to analyze an animal feeding trial that someone else executed. Here is the study description:
I have set up the dataset to be the following way:
PEN | BLOCK | TRT | DAY | AB | CHALLENGE | SUPPL | VAR | VALUE |
1 | 1 | T4 | 0 | NO | YES | 0.125 | BW | 43 |
1 | 1 | T4 | 14 | NO | YES | 0.125 | BW | 33 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
1 | 1 | T4 | 14 | NO | YES | 0.125 | FI | 525 |
Where DAY is the number of days elapsed, AB is whether they're receiving antibiotic or not (only T3 received it), CHALLENGE is whether they were challenged or not (all except T1 were challenged), and SUPPL is the amount of supplement received (0 for T1-T3, increasingly higher for T4-T8).
The way I see this, it's a simple linear regression, where I'm trying to model the effect of the supplement overall and by day. I tried running a proc GLM as highlighted below:
ods graphics on;
PROC SORT DATA=CHICKS; BY VAR CHALLENGE SUPPL;
PROC GLM DATA=CHICKS PLOTS=DIAGNOSTICS;
BY VAR;
CLASS DAY AB CHALLENGE SUPPL;
MODEL VALUE = DAY SUPPL DAY*SUPPL AB CHALLENGE;
LSMEANS DAY SUPPL DAY*SUPPL/PDIFF=all;
RUN;
ods graphics off;
QUIT;
However, I don't think this is the best approach, for several reasons:
Is there a better way to do this? Thank you in advance for all your feedback.
You have a lot going on here, but I'll try to point you in the right direction. To answer your first two questions - you have repeated measurements on the same units (i.e., pens) over time, and you need to account for the (likely) case that those repeated measurements will be correlated. If you were to use the model you have presented (ordinary linear regression), you would be implicitly assuming that all of your observations are independent, with no correlation among observations. Because of your study design, you cannot make that assumption. There are lots of ways to specify a model that accounts for correlation among measurements, and you should start by reading the documentation for PROC MIXED and PROC GLIMMIX. To answer your third question - it should not be a problem that the measurements are unequally spaced.
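As a starting point, here is a minimal sketch of one way to account for repeated measurements on pens with PROC MIXED. It assumes the CHICKS dataset and variable names from the original post and treats DAY as a class variable, so the spacing between days does not enter the model. The unstructured covariance (TYPE=UN) and the DDFM=KR option are my assumptions for illustration, not choices confirmed anywhere in this thread.

```sas
/* Sketch only: dataset and variable names are taken from the original post. */
PROC SORT DATA=CHICKS; BY VAR PEN DAY; RUN;

PROC MIXED DATA=CHICKS;
  BY VAR;                                  /* one analysis per response (BW, FI, ...) */
  CLASS PEN DAY TRT;
  MODEL VALUE = TRT DAY TRT*DAY / DDFM=KR; /* DDFM=KR is an assumed choice */
  /* REPEATED models the correlation among days within each pen.
     TYPE=UN is assumed here; compare candidate structures via the fit statistics. */
  REPEATED DAY / SUBJECT=PEN TYPE=UN;
RUN;
```

If the unstructured fit has trouble converging, a simpler covariance structure is usually the first thing to try.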
Thank you very much for your help, and apologies for the delay in replying. I took some time to review the documentation for MIXED and GLIMMIX. I ended up analyzing the data two ways.
The first one with effect of AB, CHALLENGE and the SUPPL dose:
PROC SORT DATA=CHICKS; BY VAR;
PROC GLIMMIX DATA=CHICKS; BY VAR;
CLASS DAY PEN AB CHALLENGE SUPPL;
MODEL VALUE = DAY|SUPPL AB CHALLENGE;
RANDOM DAY / SUBJECT=PEN TYPE = unr residual;
RUN;
The second one with just the effect of TRT and the interaction with DAY:
PROC SORT DATA=CHICKS; BY VAR;
PROC GLIMMIX DATA=CHICKS; BY VAR;
CLASS DAY PEN TRT;
MODEL VALUE = DAY|TRT;
RANDOM DAY / SUBJECT=PEN TYPE = unr residual;
SLICE DAY*TRT / sliceby=DAY lines ADJUST=tukey;
RUN;
Do you see anything glaringly wrong with this?
Thank you again.
This looks like you are on the right track. On quick review, I have two thoughts. First, consider whether you want to include a random intercept in the model. It's not necessarily required, just think about whether you expect random variation in the outcome variable at baseline, and whether you want to capture that variation in your model. Second, I can't tell exactly how your data is structured, but you might need to modify the subject= argument in the random statement. If it's the case that each pen has its own unique identifier, then what you have looks fine. But if the pen numbers are nested within block, i.e., there is a pen numbered 1 in block 1, block 2, etc., then consider subject = pen(block).
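If pen numbers did repeat across blocks, the nested subject could be written roughly as follows. This is a sketch under that assumption only, reusing the second GLIMMIX model from the post above. BLOCK is added to the CLASS statement here so that PEN(BLOCK) is treated as a classification effect.

```sas
/* Sketch only: applies when the same pen number appears in more than one block. */
PROC GLIMMIX DATA=CHICKS;
  BY VAR;
  CLASS DAY PEN BLOCK TRT;
  MODEL VALUE = DAY|TRT;
  /* PEN(BLOCK) makes each pen-within-block combination its own subject */
  RANDOM DAY / SUBJECT=PEN(BLOCK) TYPE=UNR RESIDUAL;
RUN;
```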
Thank you for your very prompt feedback! To answer your questions:
1) At the beginning of the trial, the birds are weighed and allocated to the treatments in a way that ensures no significant differences in BW at day 1. So at least the BW variable should be okay. There could be an inherent feed intake effect at baseline, but they're also genetically homogeneous, so barring any strange circumstance, they should be identical on that front too.
2) The pens are indeed unique, i.e. Block 1 has pens 1-8, Block 2 has pens 9-16, etc. I think the block variable is there for when studies have non-unique identifiers for pens, but that's not the case in this particular study.
If I were to include a random intercept, would that be:
RANDOM int DAY/ SUBJECT=PEN TYPE = unr residual;
or just
RANDOM int / SUBJECT=PEN TYPE = unr residual;
?
You can specify the random effects either way: with a single RANDOM statement that contains both int and day, or with a separate RANDOM statement for each effect.
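For illustration, the two-statement form might look like the sketch below, based on the second model earlier in the thread. One caveat worth checking in your output: when TYPE=UNR already estimates every variance and correlation among days, a pen-level random intercept may not be separately estimable, in which case the log may note that the estimated G matrix is not positive definite. Treat this as something to experiment with, not a recommended final model.

```sas
/* Sketch only: separate RANDOM statements for the intercept and for DAY. */
PROC GLIMMIX DATA=CHICKS;
  BY VAR;
  CLASS DAY PEN TRT;
  MODEL VALUE = DAY|TRT;
  RANDOM INT / SUBJECT=PEN;                    /* G-side: random intercept per pen */
  RANDOM DAY / SUBJECT=PEN TYPE=UNR RESIDUAL;  /* R-side: covariance among days within pen */
RUN;
```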