DanielJohansson
Calcite | Level 5

Hello,

 

I am still stomping around at the beginner level with SAS (and, really, with statistics), so I would be truly grateful if you oversimplified any answers you give me.

 

I have a population of eye-tracking participants, who have viewed faces under the influence of a particular treatment or placebo. This is then analyzed through a mixed model.

 

In short, the study includes around 80 subjects (male and female) who have been given treatment or placebo. They then viewed pictures that vary on two attributes: category (there are 5, more or less related) and view (the angle the picture was taken at; there are 3). Each subject views up to 38 pictures, but some view fewer (ranging from 1 to 38). The outcome variable is the proportion of each stimulus's presentation time that is spent looking at each of 4 areas of interest (AOI). The data are in long format, with every subject appearing on multiple lines; one line represents 1) the subject, 2) the specific picture (of the 38), and 3) a specific AOI for that particular picture. Each subject thus has somewhere between 4 and 148 lines, where 4 lines means that only one stimulus picture was ever viewed by that particular subject (4 lines because there is an individual data point for each AOI).

 

I didn't do the first mixed model (performed in SPSS with GENLINMIXED), but here's the code I am trying to use to replicate their analysis:

 

proc mixed data=test;
  class AOI Treatment View Sex PictureCategory Participant;
  model RatioOfTotalDwellTime = Treatment AOI Sex View PictureCategory
        AOI*Treatment AOI*PictureCategory AOI*View
        AOI*Treatment*PictureCategory AOI*Treatment*View AOI*PictureCategory*View;
  /* random intercept for each participant */
  random int / subject=Participant type=un;
  /* Treatment is the variable named Spray in my later posts */
  lsmeans AOI*Treatment / diff;
run;

 

This model renders several significant results, including AOI*Treatment (subjects under treatment tend to focus on a specific part of the picture).

 

Now for my questions:

 

1) The data is, as it stands, not normally distributed, and has clumping at zero. Firstly - is PROC MIXED ok to use with that distribution? I remember reading that it's the distribution of residuals that matter - how can I check that? Secondly, wouldn't the number of data points (around 8000) basically mean that the non-normal data isn't that much of a problem (given the central limit theorem)?

 

2) A random intercept with SUBJECT=Participant has been added. Two questions regarding this: Firstly, does this add Participant as a random effect (just like RANDOM Participant would)? Secondly, the way I understand it, adding Participant as a random effect would account for the correlation within the participants. Is this correct?

 

3) A more intricate problem is this: While the correlations within each participant can to some extent be accounted for (with the random statement above), there are (as I see it) probably a LOT of other correlated data in this dataset: because, for instance, a participant that tends to focus on one AOI in one picture, will probably focus on the same AOI in the other pictures. The same goes for View. This means even when I tell the program that there are only 80-ish participants (as in the code above), every single participant will be represented with up to 38 different datapoints for each of the AOIs, and those datapoints (within the subject) should to an extent be correlated - even though (if I understand this correctly) I am "pretending" that they are independent observations. If anyone has anything wise to say on that matter (regarding if I'm overthinking it, or if there is another way of writing the code or treating the data, etc) I would be very grateful. As a side note on this, if I run the analysis by including only measurements for the AOI I am interested in, and for only the most salient view angle, the results are significant until I include the random statement defining the participants as subjects - then the significances disappear.

 

I know that was a lot of questions and information, but I am crossing my fingers. Thank you for your patience.

 

Kind regards,

 

Daniel

DanielJohansson
Calcite | Level 5

 

 

I am replying to my own message to narrow down the problem somewhat.

 

What I am most concerned about is the correlation between different AOIs, specifically the fact that the AOIs (there are 4) within a single trial will always be negatively correlated and will always sum to 1. This may or may not make Participant as a random effect useless, since I have been told that this might make the model think that the random participant variation has no effect, and thus attribute too much effect to the fixed effects (for instance Treatment*AOI, which is the most interesting). The covariance parameter for Participant is also 0, and the G matrix is not positive definite. I have since tried to include this instead of a random statement:

 

repeated AOI / subject=Participant*Trial;

 

This yields the same p-values as the random statement in the above post, without error messages about the G matrix, and with covariance parameters above 0, but I am not at all sure it's correct. Trying another way to get at the AOI correlations, I tried:

 

repeated trial / subject=Participant group=AOI;

 

This yielded a covariance estimate for Participant greater than 0, no error messages appear, and the results are more, shall we say, trustworthy (the p-values are not just 1 or really really really small). However, this is without specifying a covariance structure (hence, I assume it uses the default, TYPE=VC). Changing the structure to CS kills the p-values, and I'm afraid that the model is not correct when I just use the default structure, despite the fact that there are no error messages.

 

Anyone?

SteveDenham
Jade | Level 19

Great questions throughout this, Daniel.

 

I'll try what I can to answer.

 

1) As far as residuals go, add the RESIDUAL option to your MODEL statement, together with an OUTP= option naming an output data set. The residuals will then show up in that data set, and you can then use UNIVARIATE to examine them.

 

2) The RANDOM int / subject=Participant form is a good thing, as it should speed computation up. It fits the same random Participant effect that RANDOM Participant would.

 

3) In my opinion normality is not nearly the problem that lack of independence may present, and you are doing what you can with the available data to model that.
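For point 1), one way to get the residuals into a data set is the OUTP= option on the MODEL statement. The following is only a minimal sketch, assuming the data set and variable names from the original post; the output data set name resid_out is made up for illustration.

```sas
/* Sketch: save conditional residuals from the mixed model.           */
/* resid_out is a hypothetical data set name.                         */
/* Note: random int / subject=Participant fits the same variance      */
/* component as "random Participant;" but is processed by subject.    */
proc mixed data=test;
  class AOI Treatment View Sex PictureCategory Participant;
  model RatioOfTotalDwellTime = Treatment AOI Sex View PictureCategory
        AOI*Treatment AOI*PictureCategory AOI*View
        AOI*Treatment*PictureCategory AOI*Treatment*View AOI*PictureCategory*View
        / outp=resid_out residual;
  random int / subject=Participant;
run;

/* Inspect the distribution of the residuals (variable Resid in OUTP=). */
proc univariate data=resid_out normal;
  var Resid;
  histogram Resid / normal;
  qqplot Resid / normal(mu=est sigma=est);
run;
```

With clumping at zero in the response, the histogram and Q-Q plot are likely to be more informative than the formal normality tests, which will reject almost anything at around 8000 observations.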

 

I like the R side approach, exemplified by repeated AOI / subject=Participant*Trial;, as it actually addresses this.  Have you tried this with type=cs?  I suspect you may run into the same problem as with the inverted R side approach you list later, but it does avoid the unequal replication problem of number of trials per participant.  And from type=cs (if it works), I would then try type=csh.

 

Hope this helps.

 

Steve Denham

DanielJohansson
Calcite | Level 5

Please forgive my belated thank you - personal and work circumstances and the birth of a daughter got in between. But thank you very much for your answer - you are the first one to offer me some hope with regards to this, for which I am very grateful.

 

I do have some follow-up questions, mostly because I essentially need to debunk my colleague, who has judged the whole project as improbable. The reason our discussion started was basically this: if I ran the full model, with all AOIs (and it does seem to me that the AOIs summing up to 1 for every single trial is the biggest problem), the significances would be very strong, but the G matrix would not be positive definite. Now, there are 4 AOIs, with proportion of looking time to each AOI as the measure, and they all sum up to 1 for every trial (=photo shown). I could solve the G matrix problem by simply including only the two most important AOIs, and then the model would converge, but of course those two AOIs are still highly correlated (such that if one increases, the other decreases in proportion of looking time). The overall effects of AOI*Treatment would be significant, and so would the pairwise comparisons under "Differences of Least Squares Means", for the effect of spray for both of the AOIs (but in opposite directions, of course, hence the interaction effect). The problem is that when I run the model including only ONE AOI, it's no longer significant. This led my colleague to point out that when more than one AOI is included, the covariance parameter estimates look like this:

 

Covariance Parameter Estimates
Cov Parm     Subject          Estimate
Intercept    Subject          0.001360
Residual                      0.06204

 

When instead only one AOI is put into the model, they look like this:

 

Covariance Parameter Estimates
Cov Parm     Subject          Estimate
Intercept    Subject          0.04355
Residual                      0.03510

 

He pointed out that the intercept estimate is much smaller when two or more AOIs are included, which seems to tell him that since the AOIs are so correlated, and in opposite directions, the estimate from subjects is negligible, and all the other effects are inflated. Do you have any input on this?
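The sum-to-one constraint behind this argument can be written out explicitly. This is only a sketch with notation introduced for illustration (it is not part of the SAS output): $Y_{stj}$ is the dwell-time ratio for participant $s$, trial $t$ and AOI $j$, $T_s$ is the number of trials for participant $s$, and it assumes all four AOI rows are present for every trial.

```latex
% Constraint within each trial
\[
  \sum_{j=1}^{4} Y_{stj} = 1 .
\]
% A constant has zero variance, so the variances and covariances
% within a trial must cancel:
\[
  \operatorname{Var}\Bigl(\sum_{j=1}^{4} Y_{stj}\Bigr)
  = \sum_{j=1}^{4} \operatorname{Var}(Y_{stj})
  + \sum_{j \neq k} \operatorname{Cov}(Y_{stj}, Y_{stk}) = 0 .
\]
% Overall mean of participant s across all trials and all four AOIs
\[
  \bar{Y}_{s}
  = \frac{1}{4\,T_s} \sum_{t=1}^{T_s} \sum_{j=1}^{4} Y_{stj}
  = \frac{1}{4} .
\]
```

Under these assumptions the covariances between AOIs within a trial are negative on balance, and with all four AOIs in the model every participant has exactly the same overall mean, so there is no between-participant variation in overall level left for a random intercept to pick up. That would be consistent with an intercept variance estimated at or near zero whenever several AOIs are analysed together.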

 

Running the model with the R sided approach as we previously discussed, I can get the model to converge without complaining with as many as 3 of the AOIs, but the covariance estimates remain very low:

 

With 2 AOIs:

 

Covariance Parameter Estimates
Cov Parm     Subject          Estimate
Intercept    Subject          0.002686
CS           Subject*Trial    -0.04779
Residual                      0.1088

 

With 3 AOIs:

 

Covariance Parameter Estimates
Cov Parm     Subject          Estimate
Intercept    Subject          0.000020
CS           Subject*Trial    -0.02596
Residual                      0.07900

 

You stated that the R sided approach might actually address the problem. Since I can get the model to converge with that (csh did not work as well, however), my main question now is: is my colleague correct? Is this a doomed project, given the above, or is there some explanation for why this approach would be valid, even though focusing on only one AOI yields no significant results?

 

I hope I have provided enough information. Happy to clarify or elaborate.

 

Kind regards,

 

Daniel

DanielJohansson
Calcite | Level 5

Incidentally, this is the code I ran. Spray = Treatment.

 

proc mixed data=test;
class AOI Spray View Sex StimulusIdentity Subject Trial;
model RatioOfTotalDwellTime=AOI Spray View AOI*Spray AOI*Spray*View Sex StimulusIdentity;
random int / subject=subject;
repeated AOI / subject=Subject*Trial type=cs;
lsmeans AOI*Spray / diff;
where aoi < 4;
run;

 

Regards,

 

Daniel

SteveDenham
Jade | Level 19

Well, I may have led you down something of a blind alley here.  When fitting an R side covariance structure, some structures "build in" a random subject effect.  Three prominent members of this group are CS, CSH and UN.

 

Try fitting:

proc mixed data=test;
class AOI Spray View Sex StimulusIdentity Subject Trial;
model RatioOfTotalDwellTime=AOI Spray View AOI*Spray AOI*Spray*View Sex StimulusIdentity;
repeated AOI / subject=Subject*Trial type=cs;
lsmeans AOI*Spray / diff;
where aoi < 4;
run;

and see if the results are more easily interpreted.  The point is that in a repeated measurement context, correlations within subject and between subject are accommodated under the compound symmetry structure (exchangeable).
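To make the compound symmetry point concrete, the block below sketches the covariance matrix that TYPE=CS implies for the three AOIs within one Subject*Trial combination. The symbols are introduced here for illustration only: $\sigma^2$ corresponds to the "Residual" row and $\sigma_1$ to the "CS" row of the covariance parameter table.

```latex
\[
  R_{st} \;=\; \sigma^2 I_3 + \sigma_1 J_3
  \;=\;
  \begin{pmatrix}
    \sigma^2 + \sigma_1 & \sigma_1 & \sigma_1 \\
    \sigma_1 & \sigma^2 + \sigma_1 & \sigma_1 \\
    \sigma_1 & \sigma_1 & \sigma^2 + \sigma_1
  \end{pmatrix},
  \qquad
  \operatorname{Corr}(Y_{stj}, Y_{stk}) = \frac{\sigma_1}{\sigma^2 + \sigma_1}
  \quad (j \neq k).
\]
```

A random intercept at the Subject*Trial level with variance $\sigma_b^2 \ge 0$ gives a marginal covariance of the same form with $\sigma_1 = \sigma_b^2$, which is the sense in which CS builds in a random effect for the REPEATED subject. The R-side version is more flexible in one respect: $\sigma_1$ may be negative, as it is in the estimates reported earlier in the thread, and that is what one would expect when the AOIs within a trial trade off against each other.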

 

Steve Denham

 

 

 

 

 

 
