BookmarkSubscribeRSS Feed
keckk
Fluorite | Level 6
Hi GLMM freaks !

Seems that I have quite a complicated data structure:
there are cows of different genotypes (= the effect I am mainly interested) on different farms (random) measured repeatedly over time (every 4 weeks throughout a grazing period), so equally spaced timepoints): the outcome is a score with 3 levels 0 - 1 - 2 (already combined scores out of 1-2-3-4-5 scores).
I considered to use proc glimmix, which should be able to account for the farms as random factor and hopefully at the same time for the repeated measurements as I would be interested to see how time affects the different genotypes.

This is what I thought could be the SAS code:

proc glimmix data=lameness.scores_lamecbd;
class farm genotype animal datenum;
model scoreWWcbd = genotype datenum origin*genotype /
dist=multinomial link=cumlogit solution;
random farm;
random datenum / subject=animal(farm) residual;
run;
SAS tells me that R-side effects are not supported for the multinomial distribution.

What could be the distributional approximation for that kind of dependent variable ?
Is there a possibility to model this kind of date in proc glimmix ?
If not, what else could be an option ?

Thanks very much for any help and advice ....hopefully there is someone knowing how to handle this !
K
9 REPLIES 9
SteveDenham
Jade | Level 19
Hopefully Dale will drop in, as I strongly suspect that there is an NLMIXED approach that might be helpful here. And I really would like to know if there is, as I have run into the same sort of data, and wanted to approach it in the same way. As near as I could get in GLIMMIX is something like:

proc glimmix data=multinom method=laplace ;
class animal day time group;
t=time;
nloptions maxiter=500;
model score=group/dist=mult oddsratio(diff=all) solution;
random int day/subject=animal s cl;
random time/ subject=animal*day type=sp(pow)(t) s cl;
covtest glm;
run;

This was for a dataset with repeated Draize scores. There is no equivalent to farms in this analysis--I was just trying to get something close to a repeated measures analysis for multinomial data.

Good luck.

Steve
keckk
Fluorite | Level 6
Hi Steve,
so I am not the only one with this kind of problem. I think NLMIXED is far too sophisticated for me. I am not even sure if the glimmix code provided is usable (apart from the fact that R-side effects are not supported for multinomial dist).

It seems as if you would have a nested or double repeated structure (day and time ?), but I miss the residual option in the random statement(s), thought this would be necessary as there is no repeated statement in glimmix. And I don't know the meaning of "covtest glm" and the "method=laplace" ? and why random intercepts for the variable day, but not for the variable time ?
Did you get a useful output ?

I hope there will be a solution to this kind of data...
Ka
SteveDenham
Jade | Level 19
Some pointers at what I was attmepting:

proc glimmix data=multinom method=laplace ;

I selected the method=laplace so that the residual variance would be included in the optimization. I can't give a good reason other than for small samples it is supposed to be better than the pseudo-likelihood methods.

class animal day time group;
t=time;
nloptions maxiter=500;
model score=group/dist=mult oddsratio(diff=all) solution;


All this seems standard so far.

random int day/subject=animal s cl;

This fits random effects for animal and animal by day, in the most efficient manner. Be sure the data are properly sorted.

random time/ subject=animal*day type=sp(pow)(t) s cl;

This is my rather feeble attempt at handling the repeated nature of the data. I know I can't use _residual_ with a multinomial, so I tried to construct something on the G side that would mimic the true R side design. Looking at it, it shows some sort of relationship within animal-days, but assumes independence of the time points. That would be a best case scenario, and it is as close to the design as I could get.

covtest glm;

This is a handy thing that is currently only available in GLIMMIX, and provides a test of the model against a null model of full independence. I look at the ratio of the chi-squared value presented to the number of parameters excluded in the test as a possible indicator of model rankings for goodness of fit.

WARNING WARNING WARNING: I have no real theoretical references for this. WARNING WARNING WARNING.

run;

As far as the output, it made sense to me. The odds ratios look about right, and the tests meet the eyeball tests of the plots.

Steve
MLaVenia
Calcite | Level 5

Hi Steve - I believe that I have a similar quandary. Could you please help me generalize your suggestion above to my context? In brief: This comes from a cluster randomized design with school as the unit of assignment (16 Treat; 15 Control). My data consist of teacher observations using a multi-item instrument; each item having a 5 ordered categorical response scale. One hundred twenty-two teachers were observed: 75 by a single a observer; 47 by a pair of observers (observing concurrently); 169 observations total. There were 18 observers, who conducted on average 9 observations each (SD=4, Range = 1-16). I wish to analyze for treatment effects and account for the cross-classification and nesting of the data. The below syntax is what I have so far. I'm following a G-theory framework for modeling the facets of measurement, where all facets are random except Item. Perhaps your suggestion above provides a work-around for my context--but I am coming up short making the one-to-one comparison. It occurred to me that I could create a variable where single observations were coded 1 and paired observations were randomly coded 1/2 or 2/1, which could serve as an index for time; though that seemed flawed given the paired data are contemporaneous. I appreciate your input.

proc glimmix data=WORK.Obsv_v7;

class Treat Schl_ID Obsv_ID Tchr_ID Item;

model Scale = Treat Item / dist=multinomial link=cumlogit s;

random intercept /sub=Schl_ID;

random intercept /sub=Obsv_ID;

random intercept /sub=Schl_ID*Obsv_ID;

random intercept /sub=Tchr_ID(Schl_ID);

random intercept /sub=Tchr_ID*Obsv_ID(Schl_ID);

random intercept /sub=Schl_ID*Item;

random intercept /sub=Obsv_ID*Item;

random intercept /sub=Schl_ID*Obsv_ID*Item;

run;

Furthermore, I would be grateful for any additional guidance that may occur to you. For example, given that Item is fixed, perhaps the third and last three variance components should not be modeled; that is, I could see just modeling the main effects of School, Observer, and Teacher(School) [var comps 1, 2, and 4] as conceptually adequate, but for modeling purposes perhaps the interactions are necessary. Also, I'm think method=laplace and type=chol; do you think otherwise? Lastly, you mentioned being mindful of how the data are sorted; would you agree that the proper order of sorting is School>Observer>Teacher>Item?

My apologies for piling on so many questions at once. Thank you in advance.

Best regards,

Mark

SteveDenham
Jade | Level 19

Hi Mark,

I'll borrow a page from , and say that your model is probably overspecified for the amount of data that you have.  With 169 observations in 31 schools with 122 teachers, it is going to be very difficult to get decent estimates for all of the variance components specified.  I would (maybe) consider dropping the obsv_id terms, and letting them fall to error.  The correlated observations/pseudo-replication of teachers due to concurrent observations makes this a tough problem.  What about an exploratory analysis of just the 47 teachers observed concurrently, and looking at whether there is a variance component due to observer.  If yes, then curse for about an hour, and reconcile to analyze separately by "observational status".  If no, then do something like analyze maximum score across raters on an item (or minimum score, depending on research objectives).  Start with a simplified model, with maybe just school and school*item as variance components, and add in others, so long as the estimates and algorithm remain stable.  And I would definitely try exploring method=laplace for optimizing.  I wouldn't go anywhere near type=chol as there really isn't any good reason to getting the covariances between subjects as yet.  Stick with just getting the variances to get started.

Steve Denham

MLaVenia
Calcite | Level 5

Hi Steve - Thanks for the reply. I'm afraid I'm cursing already. Nevertheless, I appreciate your honest assessment. I'll pursue this track and see where it gets me. Gratefully yours, Mark

MLaVenia
Calcite | Level 5

Hi Steve - Per your suggestion, I fit a model on just the paired observations and looked for whether there was a variance component due to observer, using the following code:

proc glimmix data=WORK.Obsv method=laplace;

where Pair='Y';

class Item Obsv_ID;

model Value = Item / dist=multinomial link=cumlogit s cl ;

random intercept / sub=Obsv_ID;

run;

and got:

Covariance Parameter Estimates

Cov Parm Subject Estimate Standard Error

Intercept Obsv_ID 0.4977 0.2063

With a 1.96 critical value, this is significant at <.05. Do I interpret this as having a variance component due to observer? This might sound a little weaselly, but if so, it's not particularly huge. When I fit a proc mixed model with VCs estimated for School|Grade|Observer|Teacher|Item, the Observer main effect only accounts for 6% of the total variance and the calculated inter-rater reliability is pretty good (Generalizability coefficient of Eρ2=.80 and Index of dependability of Φ=.73).

Please share your thoughts.

If it helps in providing your response, I have a broader concern that I keep stumbling on: It strikes me that failure to account for the pseudo-replication would lead the program to "think" that my sample is larger than it really is, at the peril of my statistical inference. This might be what you meant by "reconcile to analyze separately by 'observational status'," but I could randomly select one of the paired observations along with the single observations and run it on those 122 observations, then conduct a sensitivity analysis by swapping out the sample with the other paired observations (sort of like a split-sample cross-validation).

A final option I've considered is estimate factor scores in Mplus, using their new type=crossclassified analysis (cluster=Obsvr_ID Tchr_ID). This would generate a single fscore by teacher, which I could then plug into a proc mixed model and only have to deal with the variance due to School and Teacher(School).

I appreciate any guidance you have to offer.

Gratefully yours,

Mark

SteveDenham
Jade | Level 19

I think factor scores are a very good idea under this design.

As far as the sensitivity analysis, I think it addresses the issue, but I would try a moderate number of random samples (say 5 or 6), rather than "all of A" vs "all of B", for the observer effect might not be "separable."  Still, it is a good start, and should address the pertinent tests.

Steve Denham

MLaVenia
Calcite | Level 5

Wonderful! Thanks Steve. I appreciate your input. I'm self-taught with SAS and only a couple years in; so, I never really know if my roadblocks are due to lack of facility with the program or a mismatch between what I want to do and the data I have. My mind is now at rest that I can move on. Thanks, Mark

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 3780 views
  • 3 likes
  • 3 in conversation