Re: Simulation for testing proc mixed

BlueNose · Posted 03-10-2014 11:02 AM

Hello everyone,

I need your advice regarding a little simulation I am trying to do. Recently I had a data set which I analyzed using a mixed model. Now I have simulated some data to test if my model was the right one.

Ignoring what was in the past, now I have a data I simulated, which is a bivariate gamma distribution with a correlation of 0.8 (see attached photos).

The data suppose to simulate a time variable with gamma distribution, and two measurements per each subject. I have 1000 subjects in this particular set. Each subject "gave" 2 samples with a correlation of 0.8 between them.

I chose a shape parameter of 0.7 and scale of 1. I wanted a mean of 0.7. The s.d is 0.85, I am not so why, because to my calculation that yields a scale of 1.1, but doesn't matter, maybe it's due to the inaccuracies of the simulation.

Now I wanted to estimate the mean and s.d of my data, taking into account the correlation between measurements within a subject, I tried two codes:

proc mixed data = Long;

model Value = D / s;

random intercept / subject = ID;

run;

and

proc mixed data = Long;

model Value = D / s;

random ID;

run;

where D is a vector of 1's.

The first code gave me a mean value of 0.697 with s.e of 0.025 while the first code gave me 0.647 with s.e of 0.03447. The first code was closer to the truth, and I wanted to ask why. There is no explanatory variable in this model, so how come a random intercept is the right model ? In addition, shouldn't the correlation affect only the variance, and not the mean value ?

I am quite confused here, I thought that using the intercept command, I make like a separate regression line per subject, but here I do not have an "X" variable, only "Y".

Thank you !

Simulated Bivariate Gamma Values n=1000_g1=0.7_g2=1_rho=0.8.png

SteveDenham · Posted 03-10-2014 02:43 PM

First off, PROC MIXED assumes that the data are Gaussian, so anything you get out with a gamma distributed variable is going to be questionable. I would recommend moving to PROC GLIMMIX, where you can fit a gamma distribution. The means shouldn't be too different but the s.e.'s will be substantially different After that, what do you get? (BTW, you refer to both models as the first, so that I get confused as to which actually fits the mean better.)

Also, be sure the dataset is sorted by ID, since it is not included in the class statement. All I can think of for the difference between the random intercept model and random effect of ID is that some values are excluded.

Steve Denham

BlueNose · Posted 03-11-2014 03:15 AM

Dear Steve,

You have found my mistake...I did forget the class statement. I added it to my code, and now both

random ID

and

random intercept / subject=ID

gives the same identical results. They both give a mean estimate of 0.6974, which is close enough, with a standard error of 0.02571, which is slightly higher than the real value of the parameter, but taking into account the correlation, it makes sense I guess. Unless, I am not calculating the standard deviation correctly, I see that the degrees of freedom is 999, should I use it as my sample size ? Because I was taking 1999....maybe I did wrong.

The one last thing I can't find in my output is the correlation within a subject, which suppose to be 0.8. How can I ask SAS to estimate it ? The covariance parameter estimate of the intercept is 0.5903 and the residual is 0.1409. Should I use the equation COV/s.d(M1)*s.d(m2) ? If I should, then my previous question regarding obtaining the standard deviations is of even more importance.

Regarding your comment about Gamma and Gaussian, well, this is one of the main things I want to test with this procedure. People use a mixed model without too much care for the distribution, as long as it is continuous, and I wanted to see what the outcome will be. Running a simulation of 1000 samples is the beginning, once I figure out that it's working, I will run a different data set I have simulated with 100 samples, and then another one with only 30, and then we'll see how robust the model really is. I will also try GLIMMIX.

SteveDenham · Posted 03-11-2014 08:32 AM

The 999 is correct, as it refers to the number of sampling units.

To get correlations, try adding the RCORR option to the model statement. This will print the correlation matrix for the first subject, and in this case, all subjects should have the same matrix.

I really like your approach to the simulation comparison now. You should publish/present this at SGF next year, or at one of the regional users groups meetings, at the least. I look forward to seeing what you come up with.

Steve Denham

BlueNose · Posted 03-17-2014 06:24 AM

Steve, thank you for your feedback. I will resume with the simulation and when I'll have some results I will share it.

Thanks again !