keckk
Fluorite | Level 6
Hello statistic freaks!

I am not quite sure about which SAS procedure might be the appropriate one for my count data: proc genmod or proc glimmix ?

I have counts as outcomes, but the number of times I sampled my subjects (cows on different farms, with farms defined as random effects) within a given time frame differs between subjects (for feasibility reasons). How can I account for the different number of times: using the WEIGHT statement or using the number of times as an OFFSET variable in the MODEL statement? What's the difference?

glimmix or genmod ? I read that proc genmod was designed for fixed effects only ?

What about ordinal data (same levels: random farms, animals on farms) and repeated measurements over time? Definitely proc glimmix (generalized linear mixed model)?

I greatly appreciate any advice !
Ka
11 REPLIES
keckk
Fluorite | Level 6
Hi !
I have given it a try with glimmix using the option. Unfortunately, there's no convergence, only if I omit .
Obviously, this is not the correct way when I have different numbers of times that I have observed/sampled my subjects, is it ?

May anyone give me some advice how to correctly fit these count data ?

Thank you very much in advance,
Ka
plf515
Lapis Lazuli | Level 10
Hi
You want GLIMMIX, but you don't want any OFFSET or WEIGHT statements, as far as I can see.

I would start by looking at the options for DIST:
http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/statug_glimmix_a0000001411.htm...

and explore POISSON, or, more likely NEGBIN.

You may also want to look at the documentation for PROC MIXED, which has an example of how to model repeated measures

HTH

Peter
keckk
Fluorite | Level 6
Hi Peter,
thanks very much for your hints.
there's one mistake: I think I NEED an option which allows me to account for the different number of times I sampled my subjects (response variable = counts (counting a specific behaviour), which I summed up over the different number of times I counted the subjects), and I thought the offset-option would be the solution for it. I used the negbin distribution.
Unfortunately, the model doesn't converge, if I do not omit the offset option for the different number of times.

I think I HAVE TO account for the different number of times I counted the behaviour of the subjects, and wonder if there's another possibility using the WEIGHT option ? but I can't find an example explaining the use of this option...:-(
There are just replicates (number of times counting the behaviours of the subjects), no repeated measures in my data.

Any experience how to account for the unbalanced situation (different number of times counting the subjects) other than by using the OFFSET-option ?

I would greatly appreciate your advice.
Ka
plf515
Lapis Lazuli | Level 10
Hi
I am confused. Do you have ONE measure per subject or MULTIPLE measures per subject? At first I thought you had multiple measures per cow, but looking again, it seems like you are summing them up and need random effects to deal with the farms, not the temporal aspect.

If that's the case, then you can deal with the number of times by creating an average, rather than a total.

But it would help a lot to know what you are trying to do in a little more detail

Peter
keckk
Fluorite | Level 6
yes, I have counted the behaviours of the subjects repeatedly throughout the day: so MULTIPLE measurements per subject (no interest in temporal aspect), and created totals within certain hours of the day (morning, afternoon...) and use separate models for each time of the day, as I don't want to create more levels than I already have (farm--->subject(farm)---->(3)periods (of three days)).
Because data are counts, I was advised to create totals so that there are still integers. Initially I also had means; actually I don't know what is more appropriate. Using means, I still have different number of times contributing to the means. In the end, I think I have to account for the number of times differing between the subjects anyway and I thought to do this using either the WEIGHT-option (?) or the OFFSET-option.

My subjects are located on different farms, and I use a split-plot model (with farm and subject (within farm) as random effects; treatment and period as fixed effects; days within period = residual variation).

What worries me is that I get no convergence when using the OFFSET option.
I hope this helps to understand my problem.
Thanks very much for any help - Ka
plf515
Lapis Lazuli | Level 10
My own choice would be to use means, unless there is some strong reason to use counts.

Other contributors to these boards know more than I about this sort of model, though
keckk
Fluorite | Level 6
ok, no matter if means or totals (for the counts), I think I have to account for the unbalanced number of times (of observations) per subject - and I don't know if the WEIGHT option can do this. Haven't found an example.

If someone knows, I would be pleased to get advice.
Dale
Pyrite | Level 9
If I understand correctly, you have a binary variable which is observed multiple times/day on each cow. You sum across the multiple observations to get your count. You then assume that the count variable follows a Poisson or Negative Binomial distribution, with expectation conditional on the number of observations on the cow.

First question: Is the probability of a "success" small (at most 10%, but preferably under 5%)? If the probability of a "success" is not small, then the count will not converge to a Poisson (or negative binomial) distribution. You would do better to model the response as a binomial with the number of successes in the numerator and the total number of observations in the denominator.
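
A minimal sketch of what such a binomial model could look like in GLIMMIX, using the events/trials syntax. The data set name (cows) and the variable names (n_success, n_obs, treatment) are placeholders made up for illustration, and the random effects are deliberately simplified:

```sas
/* Hypothetical names: data set "cows", n_success = number of observations */
/* in which the behaviour occurred, n_obs = total number of observations.  */
proc glimmix data=cows;
    class farm animal treatment period;
    /* events/trials syntax: successes in the numerator, trials in the denominator */
    model n_success/n_obs = treatment period / dist=binomial link=logit solution;
    random intercept / subject=farm;          /* farm-level random effect */
    random intercept / subject=animal(farm);  /* animals nested within farms */
run;
```

With this form the differing number of observations per cow is handled by the denominator, so no offset is needed.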

Second question (assuming that a Poisson or negative binomial approximation is reasonable): What exactly did you include as your offset term? Did you use the number of observations on which the count was based? That would not be the correct offset. The correct offset would be the log of the number of observations on which the count was based. The Poisson and negative binomial distributions employ a log link such that

E = exp(log(T) + eta) = T*exp(eta) = T*lambda

Because of the log link, we need to take log(T) for the offset (where T measures our total opportunity for observing the response - often a time variable but in your case the number of observations per cow). Did you use log(T) or did you use T for your offset?

Note: it always helps to answer questions if the code which was employed to conduct the analysis is included as part of your post. Then your reader is not left to guess at how the analysis has been performed.
keckk
Fluorite | Level 6
Hm, I apologize for not giving a code, thank you (initially I just wanted to know about the WEIGHT option as opposed to the OFFSET option) !

But no, I don't have a binary variable, just COUNTS of specific behaviours, which I observed multiple times/day (e.g. hourly I count a specific behaviour for a certain time, e.g. for 30 sec, a common methodology in ethology). For feasibility reasons this was not always possible, and therefore the number of times was not equal across all subjects.
Yes, I sum across multiple observations to get my response variable for the model (sumhs1, for example), and I thought about taking the means across multiple observations.

and yes, I used the number of times on which the response variable was based (c_hs1...number of times I counted hs during morning hours(=1) ) as offset term:

model sumhs1= origin period mean_bg1 period*origin / offset= c_hs1 dist=poisson link=log solution;
random farm farm*origin animal(farm*origin) period*farm period*animal(farm*origin);
run;

So, you mean offset=log(c_hs1) ?

I am not sure if I understand your information correctly, that T measures the total "opportunity" for observing the behaviour. Actually, it IS the total number of times: e.g. some subjects could be observed every hour in the morning (between 8 am and noon; c_hs1=4), while some could only be observed between 8 and 9 am and between 9 and 10 am, but not between 10 and 11 am ... (c_hs1=2).
Dale
Pyrite | Level 9
Is the amount of time observed constant over all observation periods? That is, do you observe each cow for 30 seconds at each observation? If so, then the number of observation periods is an adequate surrogate for the amount of time that the cow was under observation. However, if you observe a cow for 30 seconds on one occasion and for 60 seconds on another occasion, then you should be recording the amount of time under observation rather than the number of periods under observation.

Under the assumption that the length of time is constant per observation period, then your variable c_hs1 (which records the number of times that you observed the behavior over a specified time frame) is NOT the appropriate offset variable. You are correct that instead of using c_hs1 as the offset, you should be using the variable

lc_hs1 = log(c_hs1);

as your offset term.

With regard to using a WEIGHT statement to account for differing amounts of observation across cows, that is NOT the appropriate way to deal with the problem. The variance of the count will be related to the amount of time under observation - and a WEIGHT statement could help to model the variance of the response. However, the WEIGHT statement does nothing to account for different expectation of the response.

Suppose that you have two cows which are clones, so that their behavior should be identical. If you observe one of these cows for one 30 second interval and you observe the other cow for ten 30 second intervals, then the cow that you observe for ten 30 second intervals would be expected to have 10 times the number of behaviors recorded as were observed for the cow that was observed for only one period. The WEIGHT statement does nothing to account for the different expectation.
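
To put numbers on the clone example, here is a small sketch. The rate of 2 behaviours per 30 second interval is a made-up value used only for illustration:

```sas
/* Expected counts for two identical cows observed for T = 1 and T = 10 intervals. */
data clones;
    lambda = 2;                  /* assumed expected count per interval (made up) */
    do T = 1, 10;                /* number of 30 second intervals observed */
        expected  = T * lambda;  /* E = T*lambda */
        log_T     = log(T);      /* the value an offset variable would carry */
        linpred   = log(lambda) + log_T;  /* eta + offset on the log scale */
        back_calc = exp(linpred);         /* equals expected */
        output;
    end;
run;

proc print data=clones noobs;
run;
```

The two rows share the same lambda; only log_T differs, and that difference is what an offset carries into the model.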

When you include as your offset log(T) (where log(T) is the log of the amount of time under observation), you account for BOTH differences in expectation as well as differences in the variance of the response. Incorporating an offset IS the proper way to perform this analysis. However, you MUST INCORPORATE THE CORRECT OFFSET.
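
As a rough sketch, the corrected analysis might look like the following. The data set name (cows) is made up, the CLASS statement is my assumption about which variables are categorical (with mean_bg1 treated as a continuous covariate), and the MODEL and RANDOM statements are taken from the code posted earlier in this thread:

```sas
/* Hypothetical data set name "cows"; variable names come from the posted model. */
data cows_offset;
    set cows;
    /* log() is undefined for 0, so keep only rows with at least one observation session */
    if c_hs1 > 0;
    lc_hs1 = log(c_hs1);   /* offset on the scale of the log link */
run;

proc glimmix data=cows_offset;
    class farm origin animal period;   /* assumed classification variables */
    model sumhs1 = origin period mean_bg1 period*origin
          / offset=lc_hs1 dist=poisson link=log solution;
    random farm farm*origin animal(farm*origin)
           period*farm period*animal(farm*origin);
run;
```

If the Poisson fit shows overdispersion, dist=negbin can be substituted on the MODEL statement with the same offset.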
keckk
Fluorite | Level 6
Thanks very much for your detailed response and explanations !

Yes, I constantly observed each cow at each observation over the same specified time frame, so I am going to use lc_hs1 instead of c_hs1 as offset variable. Hope it works for all the behaviours (response variables) I observed.

Your example has clearly illustrated what's the difference between WEIGHT and OFFSET in my case. Thank you so much again !
