I am trying to compare growth curves from an animal feeding trial and I am stuck. I have not been able to find an answer to my specific problem.
I have four diets (A, B, C, and D), with 12 animals on each diet. The animals have all been weighed once a week over a 6-week period. I would like to compare the growth of the animals on each diet and determine whether the diets affect growth. I want to make use of the data from every week, not just the start and end weights.
My plan was to estimate and compare the slopes of fitted lines, but I don't know how to make the comparison with four groups and repeated measurements over the period.
Can someone please help me with suggestions of the appropriate statistical procedure in SAS?
@LeneStod Could you please post some example data from which the curves would be generated?
There are examples like this in the PROC MIXED documentation.
You can also look at the paper "Repeated Measures Modeling With PROC MIXED" by Moser, starting on p. 7.
I didn't realise that PROC MIXED could give me the answers. I will look into that, thanks.
Below is an example of the data. I have 48 pigs on 4 different diets, and the weight is measured every week. I have included data from weeks 1 and 2 only, but I have data for weeks 1-6.
I have fitted the following linear regressions (one per diet), and would like to compare the slopes.
Fitted line | Diet | Intercept | Slope (Week) |
6.222 + 3.030 * Week | A | 6.22222 | 3.02976 |
6.214 + 3.205 * Week | B | 6.21389 | 3.20476 |
5.917 + 3.024 * Week | C | 5.91667 | 3.02381 |
5.842 + 3.343 * Week | D | 5.84167 | 3.34286 |
Gris_ID | Diet | Weight | Week |
1 | B | 10 | 1 |
2 | B | 13 | 1 |
3 | B | 9,5 | 1 |
4 | C | 11,5 | 1 |
5 | C | 8,5 | 1 |
6 | C | 11,5 | 1 |
7 | A | 12,5 | 1 |
8 | A | 9 | 1 |
9 | A | 10,5 | 1 |
10 | D | 12 | 1 |
11 | D | 11,5 | 1 |
12 | D | 9,5 | 1 |
13 | A | 13,5 | 1 |
14 | A | 10,5 | 1 |
15 | A | 7,5 | 1 |
16 | B | 12,5 | 1 |
17 | B | 9,5 | 1 |
18 | B | 10,5 | 1 |
19 | D | 14,5 | 1 |
20 | D | 8,5 | 1 |
21 | D | 9,5 | 1 |
22 | B | 11 | 1 |
23 | B | 12 | 1 |
24 | B | 9 | 1 |
25 | C | 13,5 | 1 |
26 | C | 9,5 | 1 |
27 | C | 9 | 1 |
28 | A | 9,5 | 1 |
29 | A | 10,5 | 1 |
30 | A | 12 | 1 |
31 | C | 8,5 | 1 |
32 | C | 12,5 | 1 |
33 | C | 11,5 | 1 |
34 | D | 13 | 1 |
35 | D | 10 | 1 |
36 | D | 9 | 1 |
37 | C | 8 | 1 |
38 | C | 12 | 1 |
39 | C | 10,5 | 1 |
40 | B | 9,5 | 1 |
41 | B | 10,5 | 1 |
42 | B | 12,5 | 1 |
43 | D | 13,5 | 1 |
44 | D | 9 | 1 |
45 | D | 10 | 1 |
46 | A | 12 | 1 |
47 | A | 9,5 | 1 |
48 | A | 11 | 1 |
1 | B | 12,5 | 2 |
2 | B | 15 | 2 |
3 | B | 11,5 | 2 |
4 | C | 14 | 2 |
5 | C | 10 | 2 |
6 | C | 13 | 2 |
7 | A | 14 | 2 |
8 | A | 10,5 | 2 |
9 | A | 12,5 | 2 |
10 | D | 12 | 2 |
11 | D | 11,5 | 2 |
12 | D | 11,5 | 2 |
13 | A | 15 | 2 |
14 | A | 11 | 2 |
15 | A | 8,5 | 2 |
16 | B | 15,5 | 2 |
17 | B | 11,5 | 2 |
18 | B | 11 | 2 |
19 | D | 16,5 | 2 |
20 | D | 9 | 2 |
21 | D | 12 | 2 |
22 | B | 12,5 | 2 |
23 | B | 13,5 | 2 |
24 | B | 10,5 | 2 |
25 | C | 15,5 | 2 |
26 | C | 11 | 2 |
27 | C | 10,5 | 2 |
28 | A | 12,5 | 2 |
29 | A | 12,5 | 2 |
30 | A | 14 | 2 |
31 | C | 9 | 2 |
32 | C | 11,5 | 2 |
33 | C | 12,5 | 2 |
34 | D | 14,5 | 2 |
35 | D | 11,5 | 2 |
36 | D | 11,5 | 2 |
37 | C | 8 | 2 |
38 | C | 12,5 | 2 |
39 | C | 11 | 2 |
40 | B | 11 | 2 |
41 | B | 12,5 | 2 |
42 | B | 12,5 | 2 |
43 | D | 13,5 | 2 |
44 | D | 11 | 2 |
45 | D | 11,5 | 2 |
46 | A | 12 | 2 |
47 | A | 11,5 | 2 |
48 | A | 13 | 2 |
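For anyone who wants to try the posted table in SAS, here is a minimal sketch of a DATA step that reads it as shown, pipe-delimited and with decimal commas. The dataset name pigs is a hypothetical choice, and only a few rows copied from the table above are included; the remaining rows would be pasted in the same way.

```sas
/* Sketch only: "pigs" is a hypothetical dataset name.                 */
/* DLM='| ' treats both the pipe and the blank as delimiters, and the  */
/* NUMX informat reads values written with a decimal comma (9,5).      */
data pigs;
  infile datalines dlm='| ' truncover;
  input Gris_ID Diet :$1. Weight :numx8. Week;
  datalines;
1 | B | 10 | 1 |
3 | B | 9,5 | 1 |
4 | C | 11,5 | 1 |
7 | A | 12,5 | 1 |
10 | D | 12 | 1 |
1 | B | 12,5 | 2 |
3 | B | 11,5 | 2 |
4 | C | 14 | 2 |
7 | A | 14 | 2 |
10 | D | 12 | 2 |
;
run;

/* Quick check that the decimal commas were converted correctly */
proc print data=pigs;
run;
```

This keeps the data in long format (one row per pig per week), which is the layout the repeated-measures procedures expect.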
I went ahead and made up a dataset covering 20 weeks with four diets (weights averaged over 12 cows into one variable/column per diet). I have attached the dataset here (MadeUpCowsDiet.csv). The relationship between Week and each Diet column can be described as follows.
Growth_Diet = (Theta_1 * Week)/(Theta_2 + Week) where Theta_1 and Theta_2 are the parameters that need to be estimated using regression.
The idea is to use PROC NLIN to fit a non-linear regression and estimate the parameters (Theta_1 and Theta_2) for each Diet column, and then to plot the predicted values as a regression curve with PROC SGPLOT. The code is below.
PROC IMPORT DATAFILE="~/MadeUpCowsDiet.csv" DBMS=CSV OUT=MadeUpCowsDiet REPLACE;
RUN;

%MACRO PredictAndPlot(Diet_Col_Name, Theta_1, Theta_2);
  %LET PredictedName=Predicted_&Diet_Col_Name;
  /* PROC NLIN fits the non-linear regression; Theta_1 and Theta_2 are the starting values */
  PROC NLIN DATA=MadeUpCowsDiet;
    PARAMETERS Theta_1=&Theta_1 Theta_2=&Theta_2;
    MODEL &Diet_Col_Name=(Theta_1*Week)/(Theta_2+Week);
    OUTPUT Predicted=&PredictedName OUT=Predicted;
  RUN;
  /* PROC SGPLOT overlays the predicted curve on the observed values */
  PROC SGPLOT DATA=Predicted;
    SCATTER X=Week Y=&Diet_Col_Name;
    SERIES X=Week Y=&PredictedName;
  RUN;
%MEND;

/* Now you can call the macro for every diet (column). Below is just one example. */
%PredictAndPlot(Gwth_Diet1,8,2);
When you run this, PROC NLIN reports the parameter estimates and PROC SGPLOT draws the fitted curve.
Please let me know if this is what you wanted. I may well have underestimated your problem.
@koyelghosh Thanks. Will your analysis perform a comparison between the four diets you have or just construct four different curves?
I would like to determine whether the curves for my four diets differ (with a p-value).
@LeneStod It will construct four different curves, but with a little modification you can make all of them appear in one plot.
To test for the difference in means due to the diet, use
lsmeans diet / pdiff;
The LSMEANS statement is supported in both GLM and MIXED procedures. Here is a link to the LSMEANS documentation.
@Rick_SAS Thank you for the suggestions. I am a little unsure whether using PROC MIXED will give me the answers I need. I use PROC MIXED for many of my studies; however (and correct me if I am wrong), I normally use it to compare means. So if I do as you suggest, wouldn't that just give me a comparison of the overall mean weights at the time points used?
What I want is to compare the weekly increase, that is, the slope of the growth curves. The important thing is not whether the absolute weight of animals on diet A differs from that of animals on diet B and so on; I want to know whether the overall weekly increase differs.
I have attached an example of the curves with the slopes I want to compare (I will probably end up log-transforming the values due to increasing variation, so the curves are just to give you an idea).
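A minimal sketch of how a slope comparison of this kind could be set up in PROC MIXED, rather than a definitive analysis. It assumes the data are in long format (one row per pig per week) in a dataset called pigs, which is a hypothetical name, with the columns Gris_ID, Diet, Weight and Week from the posted table. Week is deliberately left out of the CLASS statement so that it is treated as continuous and Diet*Week represents diet-specific slopes. The RANDOM statement, which gives each pig its own intercept and slope, is an assumption about the covariance structure and not something confirmed in this thread.

```sas
/* Sketch only: "pigs" is a hypothetical long-format dataset           */
/* (one row per pig per week) with Gris_ID, Diet, Weight and Week.     */
proc mixed data=pigs method=reml;
  class Diet Gris_ID;                        /* Week stays continuous  */
  /* Diet      = differences in intercept between diets                */
  /* Week      = common weekly gain                                    */
  /* Diet*Week = diet-specific deviations in weekly gain (slopes)      */
  model Weight = Diet Week Diet*Week / solution ddfm=kr;
  /* Assumed covariance structure: random intercept and slope per pig  */
  random intercept Week / subject=Gris_ID type=un;
run;
```

In the "Type 3 Tests of Fixed Effects" table, the row for Diet*Week gives an F test and p-value for the hypothesis that all four diets share the same slope, which is a comparison of weekly gain rather than of mean weights. If the weights are log-transformed first, the same model can be fitted with the transformed variable as the response.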
@LeneStod I might be wrong, but you already have the fitted curves and the coefficients, and all you want to know is whether the mean weights of the different diets across the weeks are significantly different.
If that is true, I would be tempted to run an ANOVA (to see whether any of the diets differ significantly) and then a post-hoc test to see which pairs differ significantly. That would answer the question. LSMEANS might also get the job done, as suggested by Rick_SAS.
If you would rather know whether the linear regression coefficients of the fit are significant, then the p-values should be included in the regression results.
I am sorry if I am still misreading your requirement. Please let me know.
@koyelghosh I guess it is a variation of your last suggestion that I am interested in: are the individual linear regression coefficients for the four curves significantly different from each other? I get these results from PROC REG, but do the curves differ?
Fitted line | Diet | Intercept | Slope (Week) |
6.222 + 3.030 * Week | A | 6.22222 | 3.02976 |
6.214 + 3.205 * Week | B | 6.21389 | 3.20476 |
5.917 + 3.024 * Week | C | 5.91667 | 3.02381 |
5.842 + 3.343 * Week | D | 5.84167 | 3.34286 |
This is what I get from the regression for diet A (I get similar output for diets B, C, and D), but no comparison between the four diets.
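Analysis of Variance
Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
Model | 1 | 1927.68601 | 1927.68601 | 189.68 | <.0001 |
Error | 70 | 711.39385 | 10.16277 | | |
Corrected Total | 71 | 2639.07986 | | | |
Root MSE | 3.18791 | R-Square | 0.7304 |
Dependent Mean | 16.82639 | Adj R-Sq | 0.7266 |
Coeff Var | 18.94589 | | |
Parameter Estimates
Variable | Parameter Estimate | Standard Error | t Value | p-value |
Intercept | 6.22222 | 0.85673 | 7.26 | <.0001 |
Week | 3.02976 | 0.21999 | 13.77 | <.0001 |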
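Separate PROC REG runs give one slope per diet but no test between them. As a hedged sketch of one way to get both in a single model, the parameterisation below (NOINT with Diet and Diet*Week) estimates one intercept and one slope per diet, and the ESTIMATE statements request the pairwise slope differences. The dataset name pigs is hypothetical, the random intercept and slope per pig are an assumed covariance structure, and the coefficient order assumes the diet levels sort as A, B, C, D.

```sas
/* Sketch only: "pigs" is a hypothetical long-format dataset           */
/* (one row per pig per week) with Gris_ID, Diet, Weight and Week.     */
proc mixed data=pigs method=reml;
  class Diet Gris_ID;
  /* NOINT + Diet + Diet*Week: one intercept and one slope per diet    */
  model Weight = Diet Diet*Week / noint solution ddfm=kr;
  /* Assumed covariance structure: random intercept and slope per pig  */
  random intercept Week / subject=Gris_ID type=un;
  /* Pairwise slope differences; coefficients follow the level order   */
  /* A B C D of Diet                                                   */
  estimate 'Slope A - B' Diet*Week  1 -1  0  0;
  estimate 'Slope A - C' Diet*Week  1  0 -1  0;
  estimate 'Slope A - D' Diet*Week  1  0  0 -1;
  estimate 'Slope B - C' Diet*Week  0  1 -1  0;
  estimate 'Slope B - D' Diet*Week  0  1  0 -1;
  estimate 'Slope C - D' Diet*Week  0  0  1 -1;
run;
```

The "Solution for Fixed Effects" table then lists the four slopes with standard errors, and the "Estimates" table gives a t test and p-value for each difference. These six p-values are not adjusted for multiple comparisons.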
@LeneStod I see the problem now. I think asking whether one curve's equation is significantly different from another's is like asking whether two numbers differ significantly. For example, it is hard to say what it would mean to ask whether the number 2 is significantly different from the number 3.
Significance (a p-value) only comes into the picture if at least one side of the comparison consists of n observations drawn from a distribution.
Taking both points into account, I find it hard to understand what it means to compare two equations (rather than their underlying observations) with a p-value. However, just because I cannot understand it does not mean that your problem cannot be formulated and answered. Let's hope one of the SAS experts lands on this page and sheds better light on this topic than I can.
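One possible way to make the comparison concrete, offered as a sketch and not as the thread's agreed answer: each fitted slope is an estimate with a standard error (0.21999 for diet A in the PROC REG output), not a fixed number, so equality of slopes can be stated as a hypothesis. The symbols below are introduced here for illustration and assume a linear trend over the six weeks with pig-specific random deviations.

```latex
\begin{align*}
y_{ij} &= \alpha_{d(i)} + \beta_{d(i)}\, t_j + a_i + b_i\, t_j + \varepsilon_{ij},
  \qquad i = 1,\dots,48, \quad j = 1,\dots,6, \\
H_0 &: \beta_A = \beta_B = \beta_C = \beta_D
  \quad \text{versus} \quad
  H_1 : \text{at least two slopes differ.}
\end{align*}
```

Here $y_{ij}$ is the weight of pig $i$ in week $t_j$, $d(i)$ is the diet of pig $i$, $\alpha$ and $\beta$ are the diet-specific intercepts and slopes, $a_i$ and $b_i$ are the random intercept and slope of pig $i$, and $\varepsilon_{ij}$ is the residual. Under this formulation, the p-value for $H_0$ corresponds to the test of the diet-by-week interaction in a mixed model.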