Random effects and non-normal distributions

02-19-2016 10:12 AM

Hello,

I am still rather stomping around on the beginner level with SAS (and really statistics), so I am truly grateful for any possible oversimplification of any answers you would give me.

I have a population of eye-tracking participants, who have viewed faces under the influence of a particular treatment or placebo. This is then analyzed through a mixed model.

In short, the study includes around 80 subjects (male and female) who were given treatment or placebo. They then viewed pictures that vary on two different attributes: either category (there are 5, more or less related), or view (what angle the picture was taken at - there are 3). Each subject views up to 38 pictures, but some view fewer (ranging from 1 to 38). The outcome variable is the proportion of each stimulus presentation that is spent looking at each of 4 areas of interest (AOI). This is all presented in the long format, with every subject appearing on multiple lines - one line representing 1) the subject, 2) the specific picture (of the 38), 3) a specific AOI for that particular picture. Each subject thus has somewhere between 4 and 148 lines, where 4 lines means that only one stimulus picture was ever viewed by that particular subject (4 lines because there is an individual data point for each AOI).

I didn't do the first mixed model (performed in SPSS with GENLINMIXED), but here's the code I am trying to use to replicate their analysis:

```
proc mixed data=test;
class AOI Treatment View Sex PictureCategory Participant;
model RatioOfTotalDwellTime=Treatment AOI Sex View PictureCategory AOI*Treatment AOI*PictureCategory AOI*View AOI*Treatment*PictureCategory AOI*Treatment*View AOI*PictureCategory*View;
random int / subject=Participant type=un;
lsmeans AOI*Spray / diff;
run;
```

This model renders several significant results, including AOI*Treatment (subjects under treatment tend to focus on a specific part of the picture).

Now for my questions:

1) The data is, as it stands, not normally distributed, and has clumping at zero. Firstly - is PROC MIXED ok to use with that distribution? I remember reading that it's the distribution of residuals that matter - how can I check that? Secondly, wouldn't the number of data points (around 8000) basically mean that the non-normal data isn't that much of a problem (given the central limit theorem)?

2) A random intercept with SUBJECT=Participant has been added. Two questions regarding this: Firstly, does this add Participant as a random effect (just like RANDOM Participant would)? Secondly, the way I understand it, adding Participant as a random effect would account for the correlation within the participants. Is this correct?

3) A more intricate problem is this: While the correlations within each participant can to some extent be accounted for (with the random statement above), there are (as I see it) probably a LOT of other correlated data in this dataset: because, for instance, a participant that tends to focus on one AOI in one picture, will probably focus on the same AOI in the other pictures. The same goes for View. This means even when I tell the program that there are only 80-ish participants (as in the code above), every single participant will be represented with up to 38 different datapoints for each of the AOIs, and those datapoints (within the subject) should to an extent be correlated - even though (if I understand this correctly) I am "pretending" that they are independent observations. If anyone has anything wise to say on that matter (regarding if I'm overthinking it, or if there is another way of writing the code or treating the data, etc) I would be very grateful. As a side note on this, if I run the analysis by including only measurements for the AOI I am interested in, and for only the most salient view angle, the results are significant until I include the random statement defining the participants as subjects - then the significances disappear.

I know that was a lot of questions and information, but I am crossing my fingers. Thank you for your patience.

Kind regards,

Daniel

02-26-2016 01:28 PM

I am replying to my own message to narrow down the problem somewhat.

What I am most concerned about is the correlation between different AOIs, specifically the fact that the AOIs (there are 4) within a single trial will always be negatively correlated and will always sum to 1. This may or may not make Participant as a random effect useless, since I have been told that this might make the model think that the random participant variation has no effect, and thus attribute too much effect to the fixed effects (for instance Treatment*AOI, which is the most interesting). The covariance parameter for Participant is also 0, and the G matrix is not positive definite. I have since tried to include this instead of a random statement:

repeated AOI / subject=Participant*Trial;

This yields the same p-values as the random statement in the above post, without error messages about the G matrix, and with covariance parameters above 0, but I am not at all sure it's correct. Trying another way to get at the AOI correlations, I tried:

repeated trial / subject=Participant group=AOI;

This yielded covariance estimate of Participant greater than 0, no error messages appear, and the results are more, shall we say, trustworthy (the p-values are not just 1 or really really really small). However, this is without specifying a covariance matrix (hence, I assume it uses the default vc). Changing the matrix to CS kills the p-values, and I'm afraid that the model is not correct when I just use the default matrix - despite the fact that there are no error messages.
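The sum-to-one constraint described above can be checked directly in the data. The following is a minimal sketch, assuming the data set `test` has one row per Participant, Trial and AOI, with the variable names used earlier in this thread; `trial_totals`, `n_aoi` and `total_ratio` are hypothetical names introduced only for illustration.

```
proc sql;
  /* One row per participant-by-trial unit: how many AOIs it has
     and what their dwell-time ratios add up to */
  create table trial_totals as
  select Participant, Trial,
         count(*) as n_aoi,
         sum(RatioOfTotalDwellTime) as total_ratio
  from test
  group by Participant, Trial;

  /* If every trial sums to 1, min_total and max_total should both be
     (approximately) 1 */
  select min(total_ratio) as min_total,
         max(total_ratio) as max_total,
         min(n_aoi) as min_aoi,
         max(n_aoi) as max_aoi
  from trial_totals;
quit;
```

If the totals are all 1, the four AOI values within a trial carry only three independent pieces of information, which is the source of the negative correlation.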

Anyone?

03-30-2016 08:30 AM

Great questions throughout this, Daniel.

I'll try what I can to answer.

1) As far as residuals go, add the RESIDUAL option to your MODEL statement, together with the OUTP= option to name an output data set. The residuals will then show up in that data set, and you can use PROC UNIVARIATE to examine them.

2) The RANDOM intercept / subject=participant form is a good thing: it specifies the same random effect as RANDOM Participant, and it should speed computation up.

3) In my opinion, non-normality is not nearly the problem that lack of independence may present, and you are doing what you can with the available data to model that.
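For the residual check in point 1, here is a minimal sketch, assuming the fixed effects from the first post and assuming the residuals are written out with the OUTP= option on the MODEL statement. The data set name `resid_out` is hypothetical; `Resid` is the residual variable that PROC MIXED writes to an OUTP= data set.

```
ods graphics on;   /* with RESIDUAL, this also requests diagnostic panels */

proc mixed data=test;
  class AOI Treatment View Sex PictureCategory Participant;
  model RatioOfTotalDwellTime = Treatment AOI Sex View PictureCategory
        AOI*Treatment AOI*PictureCategory AOI*View
        AOI*Treatment*PictureCategory AOI*Treatment*View
        AOI*PictureCategory*View
        / residual outp=resid_out;   /* resid_out: hypothetical name */
  random int / subject=Participant;
run;

/* Examine the conditional residuals: normality tests, histogram, Q-Q plot */
proc univariate data=resid_out normal;
  var Resid;
  histogram Resid / normal;
  qqplot Resid / normal(mu=est sigma=est);
run;
```

With clumping at zero in the response, the histogram and Q-Q plot are likely to be more informative than the formal tests, which tend to reject with several thousand observations.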

I like the R side approach, exemplified by repeated AOI / subject=Participant*Trial;, as it actually addresses this. Have you tried this with type=cs? I suspect you may run into the same problem as with the inverted R side approach you list later, but it does avoid the unequal replication problem of number of trials per participant. And from type=cs (if it works), I would then try type=csh.
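One way to compare the two structures mentioned above is to fit both and put their fit statistics side by side. This is a sketch under assumptions, not a definitive recipe: it reuses the fixed effects from the first post, assumes `Trial` is a variable in `test`, and introduces the hypothetical names `fixed`, `fit_cs`, `fit_csh` and `fit_compare`. Because both fits use the same fixed effects, their REML-based information criteria are comparable.

```
/* Fixed effects copied from the first post, stored once to avoid repetition */
%let fixed = Treatment AOI Sex View PictureCategory
             AOI*Treatment AOI*PictureCategory AOI*View
             AOI*Treatment*PictureCategory AOI*Treatment*View
             AOI*PictureCategory*View;

/* Fit 1: compound symmetry across AOIs within each participant-by-trial unit */
ods output FitStatistics=fit_cs;
proc mixed data=test;
  class AOI Treatment View Sex PictureCategory Participant Trial;
  model RatioOfTotalDwellTime = &fixed;
  repeated AOI / subject=Participant*Trial type=cs;
run;

/* Fit 2: heterogeneous compound symmetry (a separate variance per AOI) */
ods output FitStatistics=fit_csh;
proc mixed data=test;
  class AOI Treatment View Sex PictureCategory Participant Trial;
  model RatioOfTotalDwellTime = &fixed;
  repeated AOI / subject=Participant*Trial type=csh;
run;

/* Both tables list the same statistics in the same order,
   so a one-to-one merge lines them up */
data fit_compare;
  merge fit_cs(rename=(Value=CS)) fit_csh(rename=(Value=CSH));
run;

proc print data=fit_compare noobs;
run;
```

Smaller AIC or BIC values favor a structure, but only if both models converge without warnings in the log.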

Hope this helps.

Steve Denham

07-06-2016 05:12 AM

Please forgive my belated thank you - personal and work circumstances and the birth of a daughter got in between. But thank you very much for your answer - you are the first one to offer me some hope with regards to this, for which I am very grateful.

I do have some follow-up questions, mostly because I essentially need to debunk my colleague, who has judged the whole project as improbable. The reason our discussion started was basically this: if I ran the full model, with all AOIs (and it does seem to me that the AOIs summing up to 1 for every single trial is the biggest problem), the significances would be very strong, but the G matrix would not be positive definite. Now, there are 4 AOIs, with proportion of looking time to each AOI as the measure, and they all sum up to 1 for every trial (=photo shown). I could solve the G matrix problem by simply including only the two most important AOIs, and then the model would converge, but of course those two AOIs are still highly correlated (such that if one increases, the other decreases in proportion of looking time). The overall effects of AOI*Treatment would be significant, and so would the pairwise comparisons under "Differences of Least Squares Means", for the effect of spray for both of the AOIs (but in opposite directions, of course - hence the interaction effect). The problem is that when I run the model including only ONE AOI, it's no longer significant. This led my colleague to point out that when more than one AOI is included, the covariance parameter estimates look like this:

Covariance Parameter Estimates

| Cov Parm | Subject | Estimate |
|---|---|---|
| Intercept | Subject | 0.001360 |
| Residual | | 0.06204 |

When instead only one AOI is put into the model, they look like this:

Covariance Parameter Estimates

| Cov Parm | Subject | Estimate |
|---|---|---|
| Intercept | Subject | 0.04355 |
| Residual | | 0.03510 |

He pointed out that the intercept estimate is much smaller when two or more AOIs are included, which seems to tell him that since the AOIs are so correlated, and in opposite directions, the estimate from subjects is negligible, and all the other effects are inflated. Do you have any input on this?

Running the model with the R-side approach we previously discussed, I can get the model to converge without complaints with as many as 3 of the AOIs, but the covariance estimates remain very low:

With 2 AOIs:

Covariance Parameter Estimates

| Cov Parm | Subject | Estimate |
|---|---|---|
| Intercept | Subject | 0.002686 |
| CS | Subject*Trial | -0.04779 |
| Residual | | 0.1088 |

With 3 AOIs:

Covariance Parameter Estimates

| Cov Parm | Subject | Estimate |
|---|---|---|
| Intercept | Subject | 0.000020 |
| CS | Subject*Trial | -0.02596 |
| Residual | | 0.07900 |

You stated that the R sided approach might actually address the problem. Since I can get the model to converge with that (csh did not work as well, however), my main question now is: is my colleague correct? Is this a doomed project, given the above, or is there some explanation for why this approach would be valid, even though focusing on only one AOI yields no significant results?

I hope I have provided enough information. Happy to clarify or elaborate.

Kind regards,

Daniel

07-06-2016 05:13 AM

Incidentally, this is the code I ran. Spray = Treatment.

```
proc mixed data=test;
class AOI Spray View Sex StimulusIdentity Subject Trial;
model RatioOfTotalDwellTime=AOI Spray View AOI*Spray AOI*Spray*View Sex StimulusIdentity;
random int / subject=subject;
repeated AOI / subject=Subject*Trial type=cs;
lsmeans AOI*Spray / diff;
where aoi < 4;
run;
```

Regards,

Daniel

07-11-2016 01:54 PM

Well, I may have led you down something of a blind alley here. When fitting an R side covariance structure, some structures "build in" a random subject effect. Three prominent members of this group are CS, CSH and UN.

Try fitting:

```
proc mixed data=test;
class AOI Spray View Sex StimulusIdentity Subject Trial;
model RatioOfTotalDwellTime=AOI Spray View AOI*Spray AOI*Spray*View Sex StimulusIdentity;
repeated AOI / subject=Subject*Trial type=cs;
lsmeans AOI*Spray / diff;
where aoi < 4;
run;
```

and see if the results are more easily interpreted. The point is that in a repeated measurement context, correlations within subject and between subject are accommodated under the compound symmetry structure (exchangeable).
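To make the "built in" subject effect concrete, here is a short worked derivation, offered as a sketch rather than a definitive account. The symbols $\sigma_b^2$, $\sigma_1$ and $\sigma^2$ are notation introduced here, with $i$ indexing a Subject*Trial block and $j, k$ indexing AOIs within it; reading $\sigma_1$ as the "CS" row and $\sigma^2$ as the "Residual" row assumes the usual PROC MIXED parameterization of TYPE=CS.

$$
\begin{aligned}
\text{Random intercept:}\quad & y_{ij} = \mu_{ij} + b_i + e_{ij}, \qquad b_i \sim N(0, \sigma_b^2), \quad e_{ij} \sim N(0, \sigma^2) \\
& \operatorname{Var}(y_{ij}) = \sigma_b^2 + \sigma^2, \qquad \operatorname{Cov}(y_{ij}, y_{ik}) = \sigma_b^2 \ge 0 \quad (j \ne k) \\[6pt]
\text{R-side CS:}\quad & \operatorname{Var}(y_{ij}) = \sigma_1 + \sigma^2, \qquad \operatorname{Cov}(y_{ij}, y_{ik}) = \sigma_1 \quad (j \ne k) \\[6pt]
\text{Implied correlation:}\quad & \rho = \frac{\sigma_1}{\sigma_1 + \sigma^2}
\end{aligned}
$$

The two structures coincide when $\sigma_1 = \sigma_b^2 \ge 0$, but the CS form also allows $\sigma_1 < 0$, which a variance component cannot represent. Taking the two-AOI estimates posted earlier in this thread (CS = -0.04779, Residual = 0.1088) and ignoring the small subject intercept, $\rho \approx -0.04779 / (0.1088 - 0.04779) \approx -0.78$. A strong negative within-trial correlation of this kind is what proportions that sum to 1 would be expected to produce, and it may be why a random intercept alone is estimated at or near zero.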

Steve Denham