ellenstats
Calcite | Level 5

Hi,

I am looking for some help with a logistic mixed model. The model examines whether testers diagnose a disease (yes/no) in animals. The test can be subjective, so I want to look at whether there is any variation between testers in the number of positive cases after other disease risk factors have been accounted for. One of the main aims of the study is to compare the tester variation between 2 years (2010 and 2013) to see whether there has been any change following the introduction of a reporting scheme. However, the disease incidence declined over this period, and the tester variation declined as well. A reviewer is now questioning whether the decline in tester variation is due to the decline in disease incidence (and hence in the number of cases) or to a genuine decrease in tester variation (an improvement in consistency). Would having year as a fixed effect in the model account for the change in incidence?

The reviewer suggested running the model separately for counties that increased in prevalence (the main model is based on national data). However, when I do this there are both increases and decreases in variation, but none of them are significant. Is there a minimum sample size when testing differences in variance?

I have used the COVTEST statement in SAS to compare the variation by tester. Another reviewer is questioning whether this is a likelihood ratio test. From the SAS manual I believe it tests the pseudo-likelihood. Does anyone know of any decent references for this test?

This is the SAS code I have used:

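proc glimmix data=model1 method=laplace;
   class county pcows type tester;
   model case = pcows county type year
         / dist=binary link=logit solution or cl;
   /* tester-level random intercept, with a separate variance for each year */
   random intercept / subject=tester grp=year s cl;
   /* test whether the tester variance is the same in both years */
   covtest 'Equal Covariance Matrices' homogeneity;
run;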

lvm
Rhodochrosite | Level 12

Regarding your last question, since you are using method=LAPLACE, it is a likelihood ratio test and it is based on the actual likelihood of the data, not the pseudo-likelihood. The SAS User's Guide is a satisfactory reference. If you used the default estimation method (method=rspl), then it would be based on the pseudo-likelihood. It is better to base these tests on the actual likelihood. A good overall reference is Walter Stroup. 2012. Generalized Linear Mixed Models.
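For anyone who wants to see the likelihood ratio logic spelled out, here is a rough sketch of doing the comparison by hand: fit the model once with a separate tester variance per year and once with a common variance, then compare the two -2 log likelihoods. It reuses the data set and variable names from the original post; the output data set names (fit_het, fit_hom, lrt) are made up for illustration, and it assumes the Laplace fit statistics row is labelled '-2 Log Likelihood'. It should broadly agree with what COVTEST reports, but treat it as an illustration rather than a replacement.

```sas
/* Fit 1: separate tester variance for each year (heterogeneous) */
proc glimmix data=model1 method=laplace;
   class county pcows type tester;
   model case = pcows county type year / dist=binary link=logit;
   random intercept / subject=tester group=year;
   ods output FitStatistics=fit_het;   /* hypothetical output name */
run;

/* Fit 2: one common tester variance (homogeneous) */
proc glimmix data=model1 method=laplace;
   class county pcows type tester;
   model case = pcows county type year / dist=binary link=logit;
   random intercept / subject=tester;
   ods output FitStatistics=fit_hom;   /* hypothetical output name */
run;

/* Likelihood ratio statistic: difference in -2 log likelihood, 1 df */
data lrt;
   merge fit_het(where=(Descr='-2 Log Likelihood') rename=(Value=m2ll_het))
         fit_hom(where=(Descr='-2 Log Likelihood') rename=(Value=m2ll_hom));
   chisq   = m2ll_hom - m2ll_het;      /* reduced minus full model */
   df      = 1;                        /* one extra variance parameter */
   p_value = 1 - probchi(chisq, df);
run;

proc print data=lrt noobs;
   var m2ll_hom m2ll_het chisq df p_value;
run;
```

With the default method=rspl the fit statistics are pseudo-likelihoods computed on linearized data, so the same subtraction would not be a true likelihood ratio test.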

ellenstats
Calcite | Level 5

Thanks very much, lvm, that's very helpful.

SteveDenham
Jade | Level 19

I responded on SAS-L to something similar (check 's response vs. mine and I think you'll see mostly agreement regarding COVTEST and Stroup's book).

Still, I wonder if county is a fixed effect for the inferences you want to make.  If you have sufficient data, what happens if you pull county from the MODEL statement, and incorporate it in the RANDOM statement, such as:

random intercept county/subject=tester group=year s cl;

This fits a county*tester variance component, which I think may be more likely than a county fixed effect, unless all testers are used in all counties.  Of course, you may not have enough data to fit the additional components.

Steve Denham

lvm
Rhodochrosite | Level 12

I agree with Steve about county being a random effect.

ellenstats
Calcite | Level 5

Thank you all so much for your interesting ideas. I am not totally convinced about adding county as a random effect, since testers are not completely nested within county and may be testing in several different counties if they are near the borders. The main aim of the model is to test whether tester variation has changed over time. One reviewer suggested looking at just the counties where disease incidence increased over time to see whether tester variation still decreased. However, when I try this the covtest is not significant. I have read somewhere that a large sample size is needed for these tests, but I cannot find the reference. Have either of you come across a reference to the sample size needed for the covtest?

Many thanks for your responses.

SteveDenham
Jade | Level 19

You have lots of testers, I would suppose (say n>20 or thereabouts), but only two years, and you are testing for homogeneity between years.  I would say that for the counties where incidence increased, you really don't have a strong case for heterogeneity.

At least one big caveat enters the picture here: for a binomial response, the variance is a function of the expected value, so there is built-in heterogeneity for the levels examined, and if the two years differed substantially, I would expect the test to show something. If it is not significant, I wouldn't lay it off to lack of power, as this test is pretty sensitive. As an aside, I consider testing for homogeneity to be a lot like what George Box alluded to: setting out in a rowboat to see if the ocean is safe for an ocean liner. I would just make the assumption that there is heterogeneity due to the distribution and sample size differences. There is a very small price to pay in terms of degrees of freedom, but I would rather pay up front than dig around afterward and try to justify homogeneity.
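To make the mean-variance link concrete, here is the model written out in notation I am assuming for illustration (herd i, tester j, a separate tester variance for each year); none of these symbols come from the original post.

```latex
\begin{align*}
  Y_{ij} \mid u_j &\sim \mathrm{Bernoulli}(\pi_{ij}), \\
  \operatorname{logit}(\pi_{ij}) &= x_{ij}^{\top}\beta + u_j,
    \qquad u_j \sim N\!\left(0,\ \sigma^2_{\text{year}}\right), \\
  \operatorname{Var}(Y_{ij} \mid u_j) &= \pi_{ij}\,(1 - \pi_{ij}).
\end{align*}
% Worked example: pi = 0.10 gives variance 0.10 * 0.90 = 0.09,
% while pi = 0.05 gives 0.05 * 0.95 = 0.0475.
```

So a drop in incidence changes the observation-level variance on its own, before any change in the tester variance component is considered.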

Now on to counties.  The example you give convinces me even more that it should be treated as a random effect.  Your inference space here is all possible counties, not just the counties measured. Also, I consider your data to be epidemiological.  When someone as well regarded in the field of epidemiology as Larry says treat county as random, I would really, really think hard about the suggestion.

Steve Denham

ellenstats
Calcite | Level 5

Hi Steve,

Thanks so much again for your time and thoughts. I can see what you mean about the county random effect; however, I may have been a bit misleading with the SAS programme that I pasted into my first post. The 'county' variable is actually the incidence rate in the county in the previous year. It is a categorical variable with 5 categories (since it was non-linear), and the herd (which is my actual unit of measurement) is allocated to a category depending on the county incidence in the previous year. I have tried your suggestion of using the actual county as a random effect; however, the model will not run because of insufficient memory (I have a very large processor so I hardly ever have memory problems!).

I have also run my original model for 2 subsets: the first is all counties where incidence decreased between the two years, and the second is all counties where incidence increased. For both of these models the covtest was not significant. Therefore, the only time I get a significant difference is at the national level (all counties and all herds are included in my dataset). These results lead me to think that there may be a power problem. However, my 2 subsets are quite large (>400 testers), so maybe I should just accept that heterogeneous variation is only observed at the national level, and possibly then conclude that in general there has been an improvement in consistency of testers (i.e. reduced variation in the later year) at the national level, but that these results cannot be generalised to smaller populations?
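In case it is useful to anyone following along, the two subset fits can be run in one pass with a BY statement. This is only a sketch: `trend` is a hypothetical variable I am assuming has been added to the data to flag whether a herd's county saw incidence increase or decrease, and the output data set names are made up. Everything else is taken from the model in the first post.

```sas
/* Sort by the hypothetical subset flag (e.g. 'increase' / 'decrease') */
proc sort data=model1 out=model1_by;
   by trend;
run;

/* Same model as in the first post, fitted separately for each subset */
proc glimmix data=model1_by method=laplace;
   by trend;
   class county pcows type tester;
   model case = pcows county type year
         / dist=binary link=logit solution or cl;
   random intercept / subject=tester group=year s cl;
   covtest 'Equal Covariance Matrices' homogeneity;
   /* collect variance estimates and test results for comparison */
   ods output CovParms=cp_by_trend CovTests=ct_by_trend;
run;
```

Putting the year-specific variance estimates from cp_by_trend next to the national ones shows the direction and size of the change in each subset, which is more informative than the p-values alone.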

Thanks once again for your help, it really is appreciated and I have the book you suggested on order so hopefully that will also help!

SteveDenham
Jade | Level 19

Hmm.  I apologize about the remarks regarding county then--completely misread what it was all about.  Not surprised that the random effect of actual county runs into the insufficient memory problem.

I think your conclusion regarding improvement at the national level makes sense.  It seems that, as in normality tests, you can detect even minuscule departures from homogeneity with large samples, and then the decision becomes risk-based.  You would really need to know how large a penalty would apply for making the incorrect decision.

One other thought--if your design matrix is anything close to sparse, try using %HPGLIMMIX on it (Google it, authors are Liang Xie and ).

Good luck.

Steve Denham

ellenstats
Calcite | Level 5

Thanks again for all your help. It should be me apologizing for trying to oversimplify at the start and not making myself very clear!  Since this is just one indicator of several, I don't think the penalties will be too large, depending on what the other measures show!

Thanks for the tip about HPGLIMMIX, I will definitely look into that.

Thanks again.
