SAS_User
Calcite | Level 5
Hi,

I have a nested design that I was planning to analyze with Proc Mixed (since I have both fixed and random effects). But the dependent variable has a bimodal distribution. The residuals also have a bimodal distribution (I also checked the residuals from the mixed model itself). As robust as Proc Mixed may be, are my results valid? If not, what other kind of model can I run, and what procedure should I use? I also could not find any appropriate transformation that will make the data normal. Any thoughts and advice will be very helpful.

Thanks
plf515
Lapis Lazuli | Level 10
There is no sensible transformation that will make a bimodal distribution unimodal, since such a transformation would have to be non-monotonic.

If you did not have both random and fixed effects, I would suggest quantile regression, where you could do regression on (say) the 25th and 75th percentiles instead of the mean.

It may be possible to write something like quantreg for mixed models, using NLMIXED; I don't see it, but that doesn't mean it isn't there. NLMIXED is pretty flexible, esp. when a master like Dale is working it.

But I'd like to ask: what is your DV? Why is it bimodal? Can you separate it into two unimodal distributions and then do two regressions?

HTH

Peter
SteveDenham
Jade | Level 19
I'll add on to Peter's comments. I think there is another variable that may not be obvious (and possibly not in the dataset) that separates the two modes. I doubt it is as simple as sex, but failure to recognize some factor like that will lead directly to the situation you have encountered: bimodal variables with bimodal residuals. I like Peter's suggestion of subpopulation analyses as a way to attack this, at least for a first pass.

SteveDenham
SAS_User
Calcite | Level 5
Hi Peter and Steve,

Thanks a lot for your responses. I have not been able to identify a variable that creates the bimodality in the data, but I am going to look again.

As for the results from the Proc mixed, are they completely invalid, or is it ok to use those results with a mention of the violation of assumptions?

Thanks
plf515
Lapis Lazuli | Level 10
Hi again

The results are invalid; one way to look at it is to say that you are looking at predictors of the mean, but in a bimodal distribution, the mean is (pardon the pun) not very meaningful.

On average, Switzerland is at sea level - just throw the mountains into the lakes.

Or, if you were modelling height, and had a sample made up of basketball players and jockeys, would you want to model the mean?


Peter
SAS_User
Calcite | Level 5
Thanks a lot Peter. Very helpful explanation.
Dale
Pyrite | Level 9
Perhaps you could elaborate on how the data were collected and what the random effects structure is like for your data. A couple of things which I would be interested in are as follows:

1) Are there multiple levels of random effects, or can the random effects be modeled using a single subject specification? To answer this question well requires elaboration to some degree on the experimental design.

2) Is the residual bimodality related to between-subject differences? If so, what characterizes subjects? If you can identify a subject-level variable which is related to the bimodality, then you can include this in your modeling efforts and should be able to eliminate the problem of bimodality. Of course, you could end up with a situation in which you determine that there are between-subject differences, but you cannot immediately determine any variable which is related to these differences.

If you cannot determine any reason for the bimodality and if random effects can be modeled through a single subject specification, then it may be possible to write code employing the NLMIXED procedure which accounts for the bimodality. There are two different types of model which might be constructed depending on whether the bimodality is attributable to between-subject differences or whether the bimodality is attributable to within-subject differences.

If the bimodality is attributable to between-subject differences, then we could employ a model of the form

    P1*f(y,x,beta,b1) + (1-P1)*f(y,x,beta,b2)

where b1 and b2 are random effects with means mu1 and mu2, respectively. The fixed effects are assumed to be the same for the two different sets of subjects.

If the bimodality is attributable to within-subject differences, then we could employ a model of the form

    P1*f(y,x,beta1,b) + (1-P1)*f(y,x,beta2,b)

The assumption of this model is that there are different sets within subjects. Typically, one might assume only intercept differences between the two within-subject sets. However, one could extend the differences to differential effects of predictor variables.
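As a rough illustration of the two-component form `P1*f(...) + (1-P1)*f(...)` above, here is a minimal Python sketch. It is not NLMIXED code, and it makes strong simplifying assumptions: `f` is taken to be a normal density, the predictors and random effects are folded into two component means, and both components share one standard deviation. All function and parameter names are hypothetical.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and sd."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(y, p1, mean1, mean2, sd):
    """Two-component mixture: p1*f(y; mean1) + (1 - p1)*f(y; mean2).

    Assumption: both components are normal with a common sd.
    """
    return p1 * normal_pdf(y, mean1, sd) + (1.0 - p1) * normal_pdf(y, mean2, sd)

def mixture_loglik(ys, p1, mean1, mean2, sd):
    """Log-likelihood of independent observations under the mixture."""
    return sum(math.log(mixture_pdf(y, p1, mean1, mean2, sd)) for y in ys)
```

A likelihood-based fit would search for the values of `p1`, `mean1`, `mean2` and `sd` that maximize `mixture_loglik`. When the two means are several standard deviations apart, `mixture_pdf` has a dip between two peaks, which is the bimodal shape under discussion.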

Mixture distributions are really quite intriguing. They offer the opportunity to identify - or at least speculate on - some as yet unknown source of significant variation.

In order to provide more specific assistance, it would help to know more about the problem.
SAS_User
Calcite | Level 5
Dale,

Thanks for your elaborate response. I am just seeing your post from Friday.

The experiment design is as follows. I have 12 samples. Each sample is collected in two containers. Each of these has 24 replicates. These 24 replicates are processed using two types of chemicals. So, there are 48 replicates per sample and 12 in each of the 4 categories (container 1/chemical 1, container 1/chemical 2, container 2/chemical 1, container 2/chemical 2). Then they are processed on 4 devices by 3 people, and the 4 devices are randomized across the 3 people in the same way for each sample. Then, samples 1-6 are run on the same day and samples 7-12 are run on the same day. There are 6 runs on one day and 6 runs on another, with 8 replicates of each sample going into a single run. That's 8 reps x 6 samples = 48 specimens per run. There are a total of 12 runs like that: 6 for samples 1-6 and 6 for samples 7-12. The goal is to estimate variability due to the chemical, device, people and samples. I hope this will help answer question 1.
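The counts in this design can be cross-checked with a few lines of arithmetic. The Python sketch below only restates the numbers given in the description; the variable names are made up for illustration.

```python
# Counts taken directly from the design description.
samples = 12
containers_per_sample = 2
replicates_per_container = 24
chemicals = 2

# 24 replicates per container are split across the two chemicals.
replicates_per_cell = replicates_per_container // chemicals
replicates_per_sample = containers_per_sample * replicates_per_container
total_specimens = samples * replicates_per_sample

# Run layout: 8 replicates of each of 6 samples per run, 6 runs per day, 2 days.
specimens_per_run = 8 * 6
total_runs = 6 * 2
total_from_runs = total_runs * specimens_per_run

print(replicates_per_cell, replicates_per_sample, total_specimens, total_from_runs)
```

Both ways of counting give the same total number of specimens, so the description is internally consistent.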

There does not seem to be a reason for the bimodality. One argument I heard is that it is the nature of these samples, and also since there are only 12 samples. (there are 48 replicates in each sample but I believe replicates are extremely similar to each other and hence the sample size is a little low to expect normality. I am not sure I am convinced about that). I also looked at histograms of the replicates within each sample and some of those also look bimodal, although not all of them.

I look forward to hearing your thoughts.

Thanks
Dale
Pyrite | Level 9
So, the answer to my first question is that there are multiple levels of random effects. A single subject effect cannot be assumed. That means that fitting a mixture model and accounting for all of the sources of variance cannot be accomplished using the NLMIXED procedure.

Given the description of the experimental design, I presume that the primary interest is to determine whether there is a differential effect of the two chemicals which are used to process the samples. One possible approach to analyzing these data would be to construct a permutation distribution of the chemical effect estimates. Randomly attribute half of the observations to chemical 1 and the other half to chemical 2. Fit your statistical model to these permuted data and obtain an effect estimate. Generate one thousand or ten thousand such random assignments in your data and obtain the chemical effect estimate for each experiment permutation.

Now, compare the effect estimate obtained for the observed data with the distribution of effect estimates in the randomized data. If the effect estimate in the observed data is in the tail area of the permutation distribution (lower 2.5 percentile or upper 2.5 percentile), then you have evidence of a differential chemical effect.

It would be best to perform the randomizations within the experimental design. That is, where there are two different chemicals from the same sample, same container, same day, same everything except for the chemical which was used to process the sample, then randomly assign one of those two to chemical 1 and the other to chemical 2. I haven't looked at your specification of the experimental design closely enough to determine whether this is or is not possible. However, I suspect that it is possible.
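A minimal Python sketch of the within-pair randomization described above, under a loud simplification: instead of refitting the full mixed model to each permuted data set, it uses the mean paired difference (chemical 1 minus chemical 2) as the effect estimate. The function name and arguments are hypothetical, and the inputs are assumed to be matched pairs that differ only in the chemical used.

```python
import random
import statistics

def paired_permutation_test(chem1, chem2, n_perm=10000, seed=0):
    """Two-sided permutation test for a chemical effect using matched pairs.

    chem1[i] and chem2[i] are assumed to share sample, container, day, etc.
    Swapping the chemical labels within a pair flips the sign of its difference.
    """
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(chem1, chem2)]
    observed = statistics.fmean(diffs)
    extreme = 0
    for _ in range(n_perm):
        permuted = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.fmean(permuted)) >= abs(observed):
            extreme += 1
    # Count the observed labelling as one of the permutations.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value
```

A two-sided p-value below 0.05 corresponds roughly to the observed estimate falling in the lower or upper 2.5 percent tail of the permutation distribution. In the real analysis, the effect estimate inside the loop would come from refitting the chosen model.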

HTH
SAS_User
Calcite | Level 5
Dale,

Yes, the original purpose of this experiment was to look at the variability contributed by the chemical, people and device. From more inspection, I found that the distribution of replicates within each sample is normal (actually the residual plots). So, I am thinking of running a mixed model for each sample. Would you see any issues with that? That way I can still present the variance component estimates of the people, devices, etc.
The permutation also sounds like a great idea and I am going to look at it more closely. Since the people who I will be presenting the results to are most familiar with the mixed model and expected to see that, I am trying to make that work.

Thanks again
Dale
Pyrite | Level 9
So the bimodality is due to sample differences? This could be exceedingly good news for you. Do you need to characterize the between sample variability in addition to the variability due to chemical, people, and device? If not, then you could treat sample as a fixed effect in the analysis and then look at the amount of variability attributable to the three sources you have indicated.

If what you are trying to do is to partition variance components, then the permutation test may not be what you want. The permutation test would be a way to perform a semi-parametric test to assess whether there is a significant fixed effect of chemical.
SAS_User
Calcite | Level 5
I do need to estimate the variability due to the samples. But even if I didn't, and put in sample as a fixed effect, isn't it true that I can't run a mixed model because the sample measurements are bimodal? Maybe I am not understanding your idea correctly.

While I was reading your post, I also realized that I could divide the data into two groups where the samples are closer together and then that data will be normally distributed. So, I can do a subgroup analysis and the basis of the two subgroups will be samples grouped together to give a normal distribution.
SteveDenham
Jade | Level 19
I don't think the latter idea is going to give you what you need. The fixed effect of sample is probably the best thing that could happen. This should eliminate the bimodal distribution in the residuals, which is problematical. We use mixed models all the time on samples that are bimodal--just consider body weights in a mixed gender population. The males have a different mode/mean than the females, while the distribution around the means is about the same. This is not a problem, if we include gender as a fixed effect in the model. The estimate of the gender effect (males - females) is the difference between the modes/means.
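The body-weight example can be simulated in a few lines. The Python sketch below uses made-up numbers (means of 80 and 60 with a common standard deviation of 5) and uses kurtosis only as a crude indicator of shape, not as a formal test of bimodality: a symmetric two-peaked distribution has kurtosis well below 3, while a normal distribution has kurtosis close to 3.

```python
import random
import statistics

def kurtosis(values):
    """Fourth standardized moment (about 3 for a normal distribution)."""
    m = statistics.fmean(values)
    var = statistics.fmean([(v - m) ** 2 for v in values])
    m4 = statistics.fmean([(v - m) ** 4 for v in values])
    return m4 / var ** 2

rng = random.Random(2024)
# Hypothetical body weights: same spread, different means by gender.
males = [rng.gauss(80.0, 5.0) for _ in range(2000)]
females = [rng.gauss(60.0, 5.0) for _ in range(2000)]

# Model without gender: every residual is taken from the grand mean.
grand_mean = statistics.fmean(males + females)
resid_no_gender = [w - grand_mean for w in males + females]

# Model with gender as a fixed effect: residuals from each group's own mean.
male_mean = statistics.fmean(males)
female_mean = statistics.fmean(females)
resid_with_gender = [w - male_mean for w in males] + [w - female_mean for w in females]

gender_effect = male_mean - female_mean
kurt_no_gender = kurtosis(resid_no_gender)
kurt_with_gender = kurtosis(resid_with_gender)
print(gender_effect, kurt_no_gender, kurt_with_gender)
```

Without the gender term the residuals keep the two-peaked shape of the raw data. With it, the residuals look approximately normal and the estimated effect recovers the gap between the two group means.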

Plus, it confirms (somewhat) my suspicions from the beginning--that there was an unidentified factor separating the measurements into two populations.

SteveDenham
SAS_User
Calcite | Level 5
Thanks Steve! Putting in sample ID as a fixed effect worked great, leading to a normal residual plot. Thanks for your explanations; that's very helpful.
Now everything is good with my model, except that I don't have a way to measure the % variability due to the sample (or the inherent biological variability in the samples). Can I say that the residual variance is the variability due to samples?

Just as additional information, my random effects are "chemical(sample ID)", "people(sample ID)" and "device(sample ID)". I am measuring all of these within the sample, as measurements coming from the same sample are correlated. The two days on which the experiment was done is also a fixed effect, as the 6 samples run on day 1 are shifted in summary measures from the 6 samples run on day 2. However, day is not the separating factor for the bimodality (samples are still bimodal within each day), as some of the samples run on day 1 group together and some of the samples run on day 2 group together, driving the distribution of the samples by day.
SteveDenham
Jade | Level 19
Shoot. Now we're into analytical chemistry, and that's one of the reasons I changed my major back when mammoths roamed the earth.

I'm going to go out on a very thin limb, and guess that the grouping of samples into discrete populations is a matter of chance, and this whole thing might be solved with a larger experiment. That provides absolutely no help in analyzing the data at hand, though. An investigation into lab procedures is about all I could hope to offer, at this point. Perhaps there is some systematic difference in sample prep that leads to the separation.

Maybe someone who has more experience in gage methods will drop by and have something good to offer.

SteveDenham
