I want to estimate a univariate Gaussian mixture model with two components using PROC FMM in SAS 9.3. I tried to give the procedure some hints through the PARTIAL= option, following this post: http://blogs.sas.com/content/iml/2011/10/21/the-power-of-finite-mixture-models.html. In my case, however, SAS seems to ignore the hint. I assigned two extreme observations to different groups, but after running the procedure they end up in the same cluster according to the created variable "Predicted Component".
My code is the following:
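* Giving a hint: Note stays missing except for the two labelled observations;
data lib.ex;
   set ex;
   Note = . ;
   if ... then Note=2;
   if ... then Note=1;
run;
* Implementing the hint: pass Note as the partial classification variable;
proc fmm data=lib.ex partial=Note;
   model lny = / k=2;
   output out=lib.ex_n class=comp;
run;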
I don't get any error messages in my log file.
Thanks in advance.
Try adding a CLASS statement for the hint variable:
proc fmm data=lib.ex partial=Note;
CLASS Note;
model lny = / k=2;
run;
As shown in the blog post, small data sets might lead to this problem, especially if the component centers are close to each other. How many observations do you think you have in each component? How far apart are the centers of the components in terms of the largest standard deviation?
You can also try specifying initial guesses for the parameters. For example, if you think the first component is N(0,1) and the second is N(2, 3), then you can add the following option to the MODEL statement (after the slash):
PARMS(0 1, 2 3)
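For context, PARMS is an option of the MODEL statement, so in the call from the question it would sit after the slash. Here is a minimal sketch that reuses the data set and variable names from the question. The starting values are only the placeholders from the example above, not guesses for this data. I am also assuming that each pair is read as mean followed by variance, which is worth confirming in the PROC FMM documentation.

```sas
/* Sketch only: replace the placeholder starting values with rough guesses */
proc fmm data=lib.ex partial=Note;
   /* assumed order within each component: mean, then variance */
   model lny = / k=2 parms(0 1, 2 3);
   output out=lib.ex_n class=comp;
run;
```

Rough values read off a histogram of lny are probably good enough as starting points.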
Thanks for your answer.
I have a really huge data set and therefore enough observations in each component. The differences between the centers are quite large, I think around 5 times the largest standard deviation.
The PARTIAL= option mostly serves to label mixture components, because the procedure cannot assume that the assigned observations are representative of their component.
Try the MODEL statement options kmin=2 equate=scale. Together with the PARMS() option, these should bring you close to the desired components.
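One way to see how the labelled observations were classified is to cross-tabulate the hint variable against the predicted component. This is a sketch under a few assumptions: it keeps the OUTPUT statement from the question (which stores the predicted component in comp), it uses k=2 from the question rather than kmin=2, and it assumes Note is carried over into the output data set along with the other input variables.

```sas
/* Fit with a common scale across components, as suggested above */
proc fmm data=lib.ex partial=Note;
   model lny = / k=2 equate=scale;
   output out=lib.ex_n class=comp;
run;

/* Cross-tabulate only the observations that carried a hint */
proc freq data=lib.ex_n;
   where Note is not missing;
   tables Note*comp / norow nocol nopercent;
run;
```

If both hinted observations fall in the same comp column, the table reproduces the problem described in the question. If they fall in different columns, the hint separated them.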
But how can I label mixture components if the PARTIAL= option is apparently ignored and the predicted components do not match the component membership I supplied?
Thanks for your help.