02-03-2018 07:45 AM
Dear users and braintrust,
I would like your suggestions on a question we are trying to solve.
We are working with calves within herds (farms). We initially planned to include 12 to 20 calves per herd, and for each calf we record dichotomous data.
An example of the dataset layout:
farm /calf/FPT (0,1)/X1(0,1)/…/Xn(0,1)
ex: 1/id of the calf/0/1/…/0
We initially planned to report the success of FPT per herd (number of calves with FPT=1 divided by the total number of calves tested for this herd) using the GLIMMIX procedure with events/trials syntax (see below). Since we want to make all inferences at the herd level, we planned to compute for each herd the proportion of calves presenting X1, …, Xn (pX1, …, pXn), with pXi = number of calves where Xi=1 / total number of calves tested in that farm.
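proc glimmix data=farm noitprint noclprint ic=q chol or method=quad(qpoints=7);
class farm; /* farm identifies the subject (cluster) */
model success/Total = pX1 / cl solution link=logit dist=binomial chisq oddsratio; /* events/trials syntax */
random intercept / subject=farm; /* random farm intercept */
run;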
Now, as usual in these types of studies, we have some small herds with only 6 available calves and others with 20 calves. Since the precision of pX1, …, pXn will vary with the number of calves on each farm, I want to know whether there is any procedure I can use to account for this uncertainty (large SE) based on the total number of calves available for evaluating pXi, or whether I simply need to drop these farms (20% of my dataset).
Example: in farm 1, 15 calves with 5 having X1=1, so pX1=1/3 on 15 cases.
In farm 2, 6 calves with 2 having X1=1, so pX1=1/3 on only 6 cases.
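To put rough numbers on that difference in precision, here is the usual binomial standard error of a proportion applied to the two farms. This is only a simple approximation, and it assumes the calves within a farm can be treated as independent:

```latex
\mathrm{SE}(\hat p) = \sqrt{\frac{\hat p\,(1-\hat p)}{n}}
\qquad
\text{Farm 1: } \sqrt{\frac{(1/3)(2/3)}{15}} \approx 0.122
\qquad
\text{Farm 2: } \sqrt{\frac{(1/3)(2/3)}{6}} \approx 0.192
```

So the same pX1=1/3 carries a standard error about 1.6 times larger in farm 2 than in farm 1.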
So can I account for my proportion uncertainty in my final model (avoiding dropping data from farm 2 which I still consider valuable)?
Thanks for your input as usual
02-03-2018 03:16 PM
I don't understand what X1, ... Xn are.
Is the denominator for pXi ("total number of calves tested in that farm") the same value as the denominator in the response ("Total")?
Going to need more detail to be of any potential help....
02-03-2018 04:13 PM
Sorry for the misunderstanding.
My database is at the calf level, but I want to bring all my data to the farm level (i.e., calves clustered within a farm).
For a specific calf in a specific farm i, X1 is normal delay before first feeding (coded 0 or 1), X2 is adequate meal quality (0/1), and so on up to Xn.
Then, because every calf belongs to a farm, pX1 is the number of calves on a specific farm with adequate delay divided by the total number of calves included in that ith farm:
pX1 = number of calves with X1=1 in farm i / (number of calves in this farm)
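As a minimal sketch of that aggregation step, assuming a calf-level dataset with 0/1 variables FPT and X1 to X4 (the dataset names calves and farm_level, and the extra count nX1, are hypothetical and only for illustration):

```sas
/* Collapse calf-level 0/1 data to one row per farm.              */
/* calves and farm_level are hypothetical dataset names.          */
proc sql;
  create table farm_level as
  select farm,
         sum(FPT)   as Success,  /* number of calves with FPT=1           */
         count(FPT) as Total,    /* number of calves with FPT observed    */
         mean(X1)   as pX1,      /* mean of a 0/1 variable = proportion   */
         mean(X2)   as pX2,
         mean(X3)   as pX3,
         mean(X4)   as pX4,
         count(X1)  as nX1       /* denominator actually used for pX1     */
  from calves
  group by farm;
quit;
```

Keeping the denominator (nX1 here) alongside each proportion makes it easy to see which farms have pXi computed from only a handful of calves.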
I want to know what affects the proportion of success on a farm given its proportion of calves with adequate delay before feeding, its proportion of calves receiving an adequate meal, and so on.
I therefore want to know how to account for the fact that the denominator changes from farm to farm, which affects the precision of my proportion estimates.
I hope this clarifies the post.
02-03-2018 04:25 PM - edited 02-03-2018 05:29 PM
How big is "n", in other words, how many Xi are there? Do they all represent sequential feedings (e.g., first feeding, second feeding, etc.) or are they completely different variables (e.g., "normal delay before first feeding" and "adequate meal quality")? Why would calves on the same farm be handled differently (for example, some calves have X1=0 and others have X1=1)? I am not an animal scientist, and I have little knowledge about this sort of study, so this is not making much sense to me yet.
Edit: Plus, if you have X1, ... Xn observed on each calf, why drop information by collapsing to the farm-level? Why not use something like
class farm X1 X2 X3;
model FPT(event='1') = X1 X2 X3 / dist=binary link=logit;
random intercept / subject=farm;
02-04-2018 07:14 AM
Thanks for the reply.
Actually, yes, there can be variation within the same farm. For example, on a given farm some calves may be treated differently because farmers are busy people who need to do multiple things at the same time. For these reasons the delay between birth and first feeding will vary. The meal quality will also vary because the meal comes from the calf's dam (so quality can change from one individual to another within the same farm because of individual factors and also the management of the dam's feed).
There are 4 possible Xi being considered. They are not dependent on each other (delay before feeding, quality of the meal, bacterial contamination of the meal, and volume fed).
We chose this specific framework because our recommendations in animal production are always made at the farm level. Basically, the success of the individual calf is not our primary objective; we focus on the success of the herd.
For this reason we think the proportion of animals presenting Xi is a more intuitive tool for farmers than a calf-level model.
In the end we want practical recommendations saying that if your proportion of calves presenting Xi=1 is A, increasing it to B would increase your FPT success by C%.
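To make that A, B, C statement concrete under the logit model we have in mind, here is a rough sketch. The coefficients β0 and βi are my own notation for the fitted intercept and the slope for pXi, and I am assuming the other covariates are held fixed and the farm random intercept is set to zero:

```latex
\operatorname{logit}(\pi) = \beta_0 + \beta_i \, pX_i
\qquad
C = \frac{1}{1 + e^{-(\beta_0 + \beta_i B)}} - \frac{1}{1 + e^{-(\beta_0 + \beta_i A)}}
\qquad
\mathrm{OR}_{A \to B} = e^{\beta_i (B - A)}
```

Note that C, the change in the probability of success, depends on the starting point A, whereas the odds ratio for moving from A to B does not.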
I hope this is clearer now.
02-04-2018 01:47 PM
I see your position. This is how I would envision that analysis approach: Your dataset would have one observation for each farm, and so there would be no need for a RANDOM statement. Each observation would have values for Success (number of calves with FPT=1), Total (number of calves on which FPT was observed), pX1, pX2, pX3, pX4. Although different farms have different denominators for the pX, there is no uncertainty in these measurements (as long as we assume the farmer counts correctly) so measurement error models would not seem to apply here. Overdispersion is a possibility. Plus with only 41 calves, with 6 to 20 calves per farm, there must be very few farms (4? 5?), and you would be able to assess only one pXi at a time.
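A minimal sketch of that farm-level model, assuming a dataset with one row per farm (hypothetically named farm_level here) holding Success, Total, and pX1 as described above. The RANDOM _RESIDUAL_ line is one way to allow for multiplicative overdispersion, and it can be dropped to fit the plain binomial model:

```sas
/* farm_level: hypothetical dataset, one observation per farm */
proc glimmix data=farm_level;
  /* events/trials response, one farm-level proportion as covariate */
  model Success/Total = pX1 / dist=binomial link=logit solution cl;
  /* multiplicative overdispersion parameter; omit for a plain binomial fit */
  random _residual_;
run;
```

With enough farms, the other proportions (pX2, pX3, pX4) could be added to the right-hand side of the MODEL statement in the same way.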
From another position, we could see herd success as being cumulative calf success. A model using calf-level data--where FPT is matched with X1, X2, X3, and X4 by calf--seems preferable. This approach would (1) avoid the concern about different farm sizes; any sources of variance among farms would be captured in the random effects for farm; (2) make overfitting (too many parameter estimates relative to the number of observations) somewhat less of a concern, or at least more manageable; (3) avoid the potential of overdispersion; and (4) better reflect the sampling design, where all variables are observed at the calf-level.
So, that's my opinion. From the point of view of continuing statistical education (a lifetime process), you could fit both models and compare what was possible and what story each tells. I'm certainly not suggesting that you then use the one that produces results that you prefer! But there is value (at least for me) in doing in addition to thinking.
02-09-2018 08:03 AM
Thanks for this reply. In fact we have 55 farms with 12 or more calves (up to 20), plus an extra 20 farms with 5 to 10 calves.
Since our inference would be at the farm level, we want, if possible, to stay at the farm level for all our analysis, with the idea that what is good at the farm level is not necessarily exactly the same as at the calf level.