kristyn
Calcite | Level 5

Hello, I am hoping someone in this community has come across and found a workaround for this problem. Broadly, I have an analysis that yields different p- and F-values (sometimes the F-values are 0) depending on how the variables are arranged in the CLASS and MODEL statements.

 

 

My syntax is:

PROC GLIMMIX DATA=SASUSER.dataset METHOD=laplace IC=Q;
class PID Task Age_Group Condition;
model TaskPerformance = Task Age_Group Condition Age_Group*Task*Condition / s dist=multinomial link=cumlogit;
random int / subject=PID g;
covtest / wald;
run;

 

The variables: 

  • PID (participant ID): there are almost 400 participants in the dataset
  • Task:  a 2-level within-subjects variable
  • Age_group: a 7-level between-subjects variable
  • Condition: a 3-level between-subjects variable
  • TaskPerformance: a multinomial DV with 4 outcome levels.
  • Random intercept of PID is used and dataset is long form - rows correspond to level of TASK variable. 

 

My issue is: 

  • 0.00 f-values for 2/3 main effects in the type III fixed effects 
  • unstable f and p values for 1/3 main effects in type III fixed effects and for the 3-way interaction, dependent on the order of variables in the class statement. 

 

Troubleshooting so far: 

  • When analysing the main effects model alone, all p-values are <.001 and the f-values remain stable irrespective of the order of variables in class statement. 
  • Reproduced the entire dataset, ensured no missing values, erroneous entries, etc. and reimported into SAS - continued to find same issue.  

 

Has anyone encountered a similar issue and found a resolution? 

 

Thanks! 

Kristyn 

 

 

SAS Version and platform info:

Build date: 14 Sep 2017 4:12:50 AM
SAS Mid-tier release: 23 Aug 2017 7:00:00 PM
Java Version: 1.7.0_151
SAS release: 9.04.01M5P09132017
SAS platform: Linux LIN X64 2.6.32-642.15.1.el6.x86_64

 

sld
Rhodochrosite | Level 12

Do you have at least one observation in each combination of Age_Group x Task x Condition (42 cross-tabulated cells)?

kristyn
Calcite | Level 5

Yes, there is a minimum of 16 observations per cell. 

 

Data was structured as following: participants were assigned to 1 condition out of 3, 1 age group out of 7 and did both Tasks (1 & 2). 

Each age group had a minimum of 48 participants with each condition (within age groups) having at least 16 participants. 

PaigeMiller
Diamond | Level 26

If no three-way interaction gives stable and non-zero F-values, and presence of three-way interaction gives unstable and sometimes zero F-values, then the three-way interaction is the problem: your study design doesn't support estimating a three-way interaction.

--
Paige Miller
sld
Rhodochrosite | Level 12

I was thinking along the same lines as @PaigeMiller, particularly if there were zero cells in the factorial. You say there aren't, but overfitting might still be the problem.

 

Some thoughts and things to try:

 

1. Generally, if term ABC is in the model, we also include A, B, C, AB, AC, and BC to maintain the interaction hierarchy. You are omitting the two-way interactions. See Section 8.11 in Oehlert: A First Course in Design and Analysis of Experiments. More details on the impact of omitting lower-level interactions are here: http://users.stat.umn.edu/~gary/classes/5303/lectures/Factorials.pdf (search for the Hierarchy section). Add the two-way interactions.

 

2. Instead of

 

random int / subject=PID g;

try

 

random int / subject=PID(condition age_group) g;

 

or

 

random task / subject=pid(condition age_group) type=<whatever> ;

 

where <whatever> is cs or un.

 

3. Start simply and build up: Pick one outcome and use a binomial model. Drop the RANDOM statement. Sequentially increase the complexity of your fixed effects (intercept only; main effects only; main effects and 2-way interactions; full 3-way factorial). Add the RANDOM statement back, and step through fixed effects models again. Add the multinomial response, and repeat. See whether that helps locate where/when the model fails.

 

4. Try method=quad() rather than laplace.

 

5. If results depend upon "the order of variables in class statement" (by which you mean that you keep the same variables but list them in a different order?), then that's a problem to take up with Technical Support. It could, though, be just another symptom of a dysfunctional model and so not a thing to worry about yet (you have to have a functional model first). If you put different variables in class, then differences in results would be expected.
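
A minimal sketch of suggestions 1 and 3, untested against the actual data. It reuses the data set and variable names from the original post; `work.binary` and `TopLevel` are hypothetical names, and the coding of TaskPerformance as 1 to 4 is an assumption.

```sas
/* Suggestion 1: full factorial with the interaction hierarchy intact.   */
/* The bar operator expands to all main effects, all two-way             */
/* interactions, and the three-way interaction.                          */
proc glimmix data=SASUSER.dataset method=laplace;
   class PID Task Age_Group Condition;
   model TaskPerformance = Task|Age_Group|Condition
         / s dist=multinomial link=cumlogit;
   random int / subject=PID(Condition Age_Group);
   covtest / wald;
run;

/* Suggestion 3: start simply. Collapse the response to one binary       */
/* outcome (ASSUMPTION: level 4 is the outcome of interest) and fit      */
/* main effects only, with no RANDOM statement.                          */
data work.binary;                      /* hypothetical data set name     */
   set SASUSER.dataset;
   TopLevel = (TaskPerformance = 4);   /* hypothetical 0/1 indicator     */
run;

proc glimmix data=work.binary;
   class Task Age_Group Condition;
   model TopLevel(event='1') = Task Age_Group Condition
         / s dist=binary link=logit;
run;
```

From the binary main-effects fit, the fixed effects can be built up one step at a time (two-way interactions, then `Task|Age_Group|Condition`), followed by the RANDOM statement and finally the multinomial response, noting the first step at which the F-values become zero or unstable.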

 

Let us know what you discover.

 

kristyn
Calcite | Level 5

Great suggestions - I have attempted to work through them. 

 

1. I have included all two-way interactions into the model in addition to the three-way. Unfortunately, two of the main effects and one of the two ways still yield a zero f-value. 

 

2a. This did not alter the model results, still yielding zero f-values

 

2b. This looked quite promising. Upon first inspection this fixed the zero f-value issue. However in an attempt to replicate the instability of my earlier models, this one was also shown to be unstable. 

 

The syntax worked in this case: 

PROC GLIMMIX DATA=SASUSER.onscreenimitation_extras METHOD=laplace IC=Q;
class PID Task Age_Group Condition;
model TaskPerformance = Task Age_Group Condition Age_Group*Task*Condition / s dist=multinomial link=cumlogit;
random task / subject=PID(Condition Age_Group) type=un;
covtest / wald;
run;

 

But when age_group is moved before task in the model statement it fails, syntax: 

PROC GLIMMIX DATA=SASUSER.onscreenimitation_extras METHOD=laplace IC=Q;
class PID Task Age_Group Condition;
model TaskPerformance = Age_Group Task Condition Age_Group*Task*Condition / s dist=multinomial link=cumlogit;
random task / subject=PID(Condition Age_Group) type=un;
covtest / wald;
run;

3. I am still working through this suggestion - but attempted stepping down rather than stepping up through the model. I removed the three-way interaction (and included all 2-way interactions) and the model completely stabilises. Unfortunately our a priori hypotheses, study design and underlying theory necessitate a three-way interaction. Interestingly, when using a continuous Age measure, rather than a categorical age measure, the entire model including the three-way interaction is stable, suggesting the problem may arise from the Age_group variable. 

 

4. This method will not run. I receive the error "quanew optimization could not be completed". 

 

5. Yes I refer to using the same variables in different orders in the class and/or model statement (changes to either or both lines result in the same issues). I do suspect that it is a symptom of a dysfunctional model given that the Age_Group variable yields instability whilst the Continuous_Age variable yields stability in identical models. 

 

Thank you for all of your suggestions thus far, it seems we are getting closer to a solution! 

PaigeMiller
Diamond | Level 26

@PaigeMiller wrote:

If no three-way interaction gives stable and non-zero F-values, and presence of three-way interaction gives unstable and sometimes zero F-values, then the three-way interaction is the problem: your study design doesn't support estimating a three-way interaction.


Adding to my comment above ... I have seen cases where the design "supports" a three-way interaction, in the sense that there are sufficient degrees of freedom; but the X'X matrix is sooooooo close to singular that effectively some parts of the model cannot be estimated or are estimated in a way that the estimates are very non-stable.

 

It may be worthwhile to obtain the design matrix X from PROC GLMMOD and then see whether the determinant of the X'X matrix is extremely close to zero; or to find the inverse of the X'X matrix (which I believe can be done in PROC REG using the output of PROC GLMMOD).

 

Also, you mention that there are 16 observations in each cell, but what about missing values? If any of the observations have missing values, then you don't really have 16 observations in that cell; you could have a lot fewer, leading to either an empty cell or extremely unstable estimates. In fact, I don't really know how GLIMMIX handles a multinomial response (never used it for multinomials) but it could be that there isn't enough data in each level of the multinomial response in each cell.

--
Paige Miller
kristyn
Calcite | Level 5

Hi Paige, 

 

I am quite new to SAS so many of your suggestions might take me some time to tackle as I am unfamiliar with PROC GLMMOD and PROC REG (I actually only migrated from SPSS to SAS for GLIMMIX so am very fresh). Do you have any exemplar syntax you may be able to share to help me follow your advice? Are there also resources on how to interpret design matrices? (Apologies for the potentially ignorant questions!)

 

One thing I am certain of is that there are no missing values anywhere in the dataset. I have triple checked the data, reconstructed that dataset from scratch and done everything as cleanly as possible. However, your suggestion that there may not be enough data in each level could be quite correct. As noted in my response to sld, when I changed the "Age_group" categorical variable to a continuous age variable, the whole model stabilised.

 

thanks!  

PaigeMiller
Diamond | Level 26

PROC GLMMOD examples

http://documentation.sas.com/?docsetId=statug&docsetTarget=statug_glmmod_toc.htm&docsetVersion=14.2&...

 

PROC REG

http://documentation.sas.com/?docsetId=statug&docsetTarget=statug_reg_toc.htm&docsetVersion=14.2&loc...

 

Even if you have no missing values, you may have some of the multinomial levels at zero for some of your design. I don't know exactly how GLIMMIX handles this, but I suspect this could be (part of) the problem. 
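
A rough sketch of the PROC GLMMOD and PROC REG check, untested and resting on several assumptions: the data set and variable names are those of the original post, TaskPerformance is stored as a numeric variable, `design` and `parms` are hypothetical output data set names, and the design columns carry the default `Col` prefix. The response is used here only so that PROC REG will run; the point of interest is the X matrix, not the fitted values.

```sas
/* Build the fixed-effects design matrix for the full factorial.         */
/* OUTPARM= maps each design column (Col1, Col2, ...) to its effect.     */
proc glmmod data=SASUSER.dataset outdesign=design outparm=parms noprint;
   class Task Age_Group Condition;
   model TaskPerformance = Task|Age_Group|Condition;
run;

/* Regress on the design columns and request collinearity diagnostics.  */
/* NOINT is used because Col1 is already the intercept column.           */
proc reg data=design;
   model TaskPerformance = col: / noint collin;
run;
quit;
```

PROC GLMMOD uses the same less-than-full-rank coding as PROC GLM, so X'X for a full factorial is singular by construction and PROC REG should be expected to report that the model is not full rank. What is worth looking for is anything beyond those built-in dependencies, such as very large condition indices among the columns that remain.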

--
Paige Miller
sld
Rhodochrosite | Level 12

Kristyn,

 

If I were you, I would always maintain the interaction hierarchy, i.e., I would include all lower order interactions that were components of a higher order interaction. That might mean that the 3-way interaction cannot be included, and although it represents your research hypothesis, there is scant point in including it if the model is dysfunctional. 

 

Re (3): The facts (a) that the ANOVA-like model works when the 3-way interaction is dropped and (b) that regressing on Age_Group works both support @PaigeMiller's suspicion that the data structure is inadequate for a 3-way ANOVA-like fixed effects structure. Regression on Age_Group could be a viable alternative, provided that regression makes sense (i.e., that the distances between the levels of Age_Group are sensible in the sense of "one unit change in Age_Group causes some number of units change in logit(Y)" and that the relationship with the logit is linear). You would need to be careful about how you specify RANDOM statements for the model incorporating regression.

 

Paige's point that "you may have some of the multinomial levels at zero for some of your design" makes sense to me. Try various cross-tabulations of TaskPerformance, Age_Group, Condition, and Task. You may find marginal zeros (absence of observations for certain combinations of TaskPerformance and one or more predictor variables), which would, I think, generate the problems you are having. The topic of "sampling zeros" is addressed in the log-linear model literature; the logit model is derived from the log-linear model, so that literature is pertinent. 
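
One way to run those cross-tabulations, as a sketch using the data set and variable names from the original post; `cellcounts` is a hypothetical output data set name.

```sas
/* Two-way tables of the response against each predictor, then the full */
/* four-way listing. SPARSE keeps combinations with a zero count, so    */
/* empty response levels within a design cell show up explicitly.       */
proc freq data=SASUSER.dataset;
   tables (Task Age_Group Condition)*TaskPerformance
          / nopercent norow nocol;
   tables Age_Group*Condition*Task*TaskPerformance
          / list sparse out=cellcounts;
run;

/* List only the combinations that have no observations. */
proc print data=cellcounts;
   where count = 0;
run;
```

With 42 design cells and 4 response levels there are 168 combinations to check; at a minimum of 16 observations per cell, it would not be surprising for some of them to be empty.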

 

Re (5): I have never seen results change as a consequence of order, and I have fitted a lot of models to a lot of data. But, that said, I have fitted very few multinomial mixed models. So it might be a "feature" of that sort of model. That's an issue for Tech Support, but the best they might be able to do is explain why you get that sort of model flakiness; I increasingly suspect the fact is that your data fail to support a model with the form of complexity that you want.

 
