Solved: Simple regression for animal feeding trial - help

sb16031 · Posted 01-25-2024 12:22 PM

Hi Folks,

I apologize in advance if this is a newbie question. I'm still relatively unfamiliar with SAS, and I don't have a very strong statistics background. I'm trying to analyze an animal feeding trial that someone else executed. Here is the study description:

There are a number of birds, housed in 14 blocks.
Each block contains a number of pens, for a total of 120 pens.
Each pen contains 6-8 birds (the number is variable, as some may die during the trial).
The variables that were measured at each time point were BW (body weight) and FI (cumulative feed intake); these are calculated pen-wise: for BW, all birds in the pen are weighed and the total weight is divided by the number of birds; the same is done for FI.
The trial length was 43 days, and measurements were taken at days 0, 14, 21, 28, and 43.
There were 8 treatments:
- T1 is a control (no disease, no supplements);
- T2 the birds were challenged with a disease;
- T3 the birds were challenged and also received a commercial antibiotic;
- T4 the birds were challenged and received no antibiotic, but instead a supplement at a rate of 0.125 g/ton;
- T5 is the same as T4 but the rate was 0.250 g/ton;
- T6 as above, but rate was 0.5 g/ton;
- T7 same as above but the rate was 0.75 g/ton;
- T8 same as above but rate was 1 g/ton.
All of the treatments have the same number of pens.

I have set up the dataset to be the following way:

PEN	BLOCK	TRT	DAY	AB	CHALLENGE	SUPPL	VAR	VALUE
1	1	T4	0	NO	YES	0.125	BW	43
1	1	T4	14	NO	YES	0.125	BW	33
...	...	...	...	...	...	...	...	...
1	1	T4	14	NO	YES	0.125	FI	525

Where DAY is the number of days elapsed, AB is whether they're receiving antibiotic or not (only T3 received it), CHALLENGE is whether they were challenged or not (all except T1 were challenged), and SUPPL is the amount of supplement received (0 for T1-T3, increasingly higher for T4-T8).

The way I see this, it's a simple linear regression, where I'm trying to model the effect of the supplement overall and by day. I tried running a proc GLM as highlighted below:

ods graphics on;
PROC SORT DATA=CHICKS;
  BY VAR CHALLENGE SUPPL;
RUN;
PROC GLM DATA=CHICKS PLOTS=DIAGNOSTICS;
  BY VAR;
  CLASS DAY AB CHALLENGE SUPPL;
  MODEL VALUE = DAY SUPPL DAY*SUPPL AB CHALLENGE;
  LSMEANS DAY SUPPL DAY*SUPPL / PDIFF=all;
RUN;
ods graphics off;
QUIT;

However, I don't think this is the best approach, for several reasons:

I don't know how to model the RANDOM effect of the block (or whether I should at all).
I'm not modeling the REPEATED measures (the same birds were measured over time from the same PEN).
The time points are unequally spaced, and I think PROC GLM doesn't like that.

Is there a better way to do this? Thank you in advance for all your feedback.

Mike_N · Posted 01-26-2024 09:52 AM

You have a lot going on here, but I'll try to point you in the right direction. To answer your first two questions: you have repeated measurements on the same units (i.e., pens) over time, and you need to account for the (likely) case that those repeated measurements will be correlated. If you were to use the model you have presented (ordinary linear regression), you would be implicitly assuming that all of your observations are independent, with no correlation among observations. Because of your study design, you cannot make that assumption. There are lots of ways to specify a model that accounts for correlation among measurements, and you should start by reading the documentation for PROC MIXED and PROC GLIMMIX. To answer your third question: it should not be a problem that the measurements are unequally spaced.
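
As a rough sketch of what such a model could look like (one reasonable specification among several, not the definitive one): the code below assumes the CHICKS dataset and variable names from your post, analyzes body weight only, treats BLOCK as a random effect, and lets the five measurement days within each pen be correlated through an unstructured covariance matrix. An unstructured matrix makes no assumption about the spacing between time points. The DDFM=KR option and the choice of TYPE=UN are my own assumptions, not requirements.

```sas
/* Sketch only: repeated-measures model for body weight (VAR = 'BW'). */
/* Dataset and variable names are taken from the original post.        */
proc mixed data=chicks;
  where var = 'BW';
  class block pen trt day;
  model value = trt day trt*day / ddfm=kr;   /* DDFM=KR is an assumed choice */
  random block;                              /* block as a random effect */
  repeated day / subject=pen type=un;        /* within-pen correlation over days */
run;
```

If the unstructured matrix does not converge, a simpler TYPE= value is worth trying and comparing with the fit statistics.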

sb16031 · Posted 02-13-2024 11:56 AM

Thank you very much for your help, and apologies for the delay in replying. I took some time to review the documentation for MIXED and GLIMMIX. I ended up analyzing the data two ways.

The first one with effect of AB, CHALLENGE and the SUPPL dose:

PROC SORT DATA=CHICKS;
  BY VAR;
RUN;
PROC GLIMMIX DATA=CHICKS;
  BY VAR;
  CLASS DAY PEN AB CHALLENGE SUPPL;
  MODEL VALUE = DAY|SUPPL AB CHALLENGE;
  RANDOM DAY / SUBJECT=PEN TYPE=unr residual;
RUN;

The second one with just the effect of TRT and its interaction with DAY:

PROC SORT DATA=CHICKS;
  BY VAR;
RUN;
PROC GLIMMIX DATA=CHICKS;
  BY VAR;
  CLASS DAY PEN TRT;
  MODEL VALUE = DAY|TRT;
  RANDOM DAY / SUBJECT=PEN TYPE=unr residual;
  SLICE DAY*TRT / sliceby=DAY lines ADJUST=tukey;
RUN;

Do you see anything glaringly wrong with this?

Thank you again.

Mike_N · Posted 02-13-2024 01:21 PM

This looks like you are on the right track. On quick review, I have two thoughts. First, consider whether you want to include a random intercept in the model. It's not necessarily required, just think about whether you expect random variation in the outcome variable at baseline, and whether you want to capture that variation in your model. Second, I can't tell exactly how your data is structured, but you might need to modify the subject= argument in the random statement. If it's the case that each pen has its own unique identifier, then what you have looks fine. But if the pen numbers are nested within block, i.e., there is a pen numbered 1 in block 1, block 2, etc., then consider subject = pen(block).
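
For illustration only, if pen numbers did repeat across blocks, the nested subject could be written roughly as follows. This reuses your second model; the only changes are adding BLOCK to the CLASS statement and nesting PEN within BLOCK in SUBJECT=. It assumes the data are already sorted by VAR.

```sas
/* Sketch only: same model as the second analysis, with PEN nested in BLOCK. */
proc glimmix data=chicks;
  by var;
  class day pen block trt;                             /* BLOCK added to CLASS */
  model value = day|trt;
  random day / subject=pen(block) type=unr residual;   /* subject is pen within block */
run;
```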

sb16031 · Posted 02-13-2024 03:33 PM

Thank you for your very prompt feedback! To answer your questions:

1) At the beginning of the trial, the birds are weighed and allocated to the treatments in a way that ensures no significant differences in BW at day 1. So at least the BW variable should be okay. There could be an inherent feed intake effect at baseline, but the birds are also genetically homogeneous, so barring any strange circumstance, they should be identical on that front too.

2) The pens are indeed unique, i.e. Block 1 has pens 1-8, Block 2 has pens 9-16, etc. I think the block variable is there for when studies have non-unique identifiers for pens, but that's not the case in this particular study.

If I were to include a random intercept, would that be:

RANDOM int DAY/ SUBJECT=PEN TYPE = unr residual;

or just

RANDOM int / SUBJECT=PEN TYPE = unr residual;

?

Mike_N · Posted 02-13-2024 03:58 PM

You can specify the random effects either way: with a single RANDOM statement that contains both int and day, or with a separate RANDOM statement for each effect.
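
A minimal sketch of the separate-statement form, reusing your second model: the random pen intercept sits on its own RANDOM statement (G-side), and the repeated-measures structure stays on the statement with the RESIDUAL option (R-side). Treat it as a starting point rather than a recommendation. A random pen intercept combined with an unstructured R-side matrix for the same pens may be over-parameterized, so if the covariance estimates look unstable or the procedure reports convergence problems, dropping the intercept or simplifying TYPE= is a reasonable next step.

```sas
/* Sketch only: random pen intercept plus R-side repeated-measures structure. */
proc glimmix data=chicks;
  by var;                                         /* assumes data sorted by VAR */
  class day pen trt;
  model value = day|trt;
  random intercept / subject=pen;                 /* G-side: pen-level intercept */
  random day / subject=pen type=unr residual;     /* R-side: correlation across days */
run;
```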
