BookmarkSubscribeRSS Feed
magnusm
Calcite | Level 5

Dear all,

I have a problem where I have repeated measurements per subject and would like to fit a mixture model with random effects in proc nlmixed. The data is organized in long format (each subject has multiple rows for multiple observations). Different subjects have a different number of time points. The structural model is a Weibull model and the mixture model I am trying to implement has two components that are both normally distributed. Below I give some example rows of my data and the current (simplified version of) the model I am trying to fit.

* example datalines;

data data;

  input subjectID time yvar;

  datalines;

1 14 32

1 42 30

1 77 32

1 105 31

1 140 28

1 189 30

1 259 27

1 315 27

2 11 31

2 39 34

2 74 32

2 102 31

2 130 30

2 186 24

2 313 17

3 6 36

3 34 37

3 69 37

3 97 32

3 126 33

3 189 32

3 252 33

3 315 31

4 7 31

4 64 27

4 134 27

4 197 26

4 253 21

4 323 23

5 0.001 28

5 13 28

5 49 27

5 71 29

5 104 27

5 139 27

5 204 28

5 258 26

5 328 24

6 0.001 35

6 28 33

6 56 31

6 84 30

6 111 27

6 182 28

6 245 27

6 301 26

7 0.001 32

7 27 32

7 62 34

7 84 31

7 119 31

7 181 30

7 244 28

7 307 28

7 364 25

8 0.001 31

8 18 30

8 41 29

8 69 29

8 97 28

8 133 28

;

* model to fit;

proc nlmixed data=data gconv=0 maxit=100 maxfunc=2000;

  parms A1=33 A2=29 B=1.1 td_pop=1140 s2epsilon=2 s2tau=1 prob=0.2;

  bounds s2epsilon s2tau prob > 0;

  bounds prob < 1;

  pi=constant('pi');

  td=td_pop*exp(tau);

  mean1=A1*exp(-(time/td)**B);

  mean1=A2*exp(-(time/td)**B);

  p1=1/((2*pi*s2epsilon)**0.5) * exp(-0.5*(((yvar - mean1)**2) / s2epsilon));

  p2=1/((2*pi*s2epsilon)**0.5) * exp(-0.5*(((yvar - mean2)**2) / s2epsilon));

  lh=p1*prob + p2*(1 - prob);

  ll=log(lh);

  model yvar ~ general(ll);

  random tau ~ normal(0,s2tau);

run;

The problem I have is that (at least I think) SAS evaluates the likelihood contributions per data row. This means that my subject specific likelihood contribution (without the integration) is:

p_i=prod_j(p1*prob + p2*(1 - prob)), where the prod_j is a product over the j time points per subject.

This subject specific likelihood contribution implies that different measurement on a subject can be from different mixture components, which is obviously not what I need. What I need is something like:

p_i=prod_j(p1)*prob + prod_j(p2)*(1 - prob).


In this case the two likelihoods for the two mixture components are calculated separately and later added under the mixture probabilities, or in other words, the likelihood is calculated as a mixture of two likelihoods where the whole subject is from one component in the first and the whole subject is from the other component in the second.


I hope it sort of makes sense what I'm writing here. I have been googling for almost two days and although this seems like a fairly common problem to me, I didn't find much information on how to do this in SAS' proc nlmixed. I hope you will be able to give me some sensible help or otherwise directions in which to look!


Thanks in advance,


Magnus

2 REPLIES 2
magnusm
Calcite | Level 5

I found something of a solution I think. It is a bit messy and I'm not sure if this is actually helping me, but it would be nice if someone might have a look over it and give some comments on whether the code does what I think it does.

So first of all I changed my data format to a wide format, meaning that there is just one row per subject now. In this row the dependent variable for different time points is given as different variables and the associated times are in separate variables. I also included a variable that contains the number of observations for a patient, which is useful for a loop later on. Example data for 20 patients is given below:

* example data;

data data;

infile datalines delimiter=',';

input subjectID n yvar1-yvar19 time1-time19;

datalines;

2956,8,32,30,32,31,28,30,27,27,.,.,.,.,.,.,.,.,.,.,.,14,42,77,105,140,189,259,315,.,.,.,.,.,.,.,.,.,.,.

3551,7,31,34,32,31,30,24,17,.,.,.,.,.,.,.,.,.,.,.,.,11,39,74,102,130,186,313,.,.,.,.,.,.,.,.,.,.,.,.

3870,8,36,37,37,32,33,32,33,31,.,.,.,.,.,.,.,.,.,.,.,6,34,69,97,126,189,252,315,.,.,.,.,.,.,.,.,.,.,.

4752,6,31,27,27,26,21,23,.,.,.,.,.,.,.,.,.,.,.,.,.,7,64,134,197,253,323,.,.,.,.,.,.,.,.,.,.,.,.,.

5155,9,28,28,27,29,27,27,28,26,24,.,.,.,.,.,.,.,.,.,.,0.001,13,49,71,104,139,204,258,328,.,.,.,.,.,.,.,.,.,.

7399,8,35,33,31,30,27,28,27,26,.,.,.,.,.,.,.,.,.,.,.,0.001,28,56,84,111,182,245,301,.,.,.,.,.,.,.,.,.,.,.

8480,9,32,32,34,31,31,30,28,28,25,.,.,.,.,.,.,.,.,.,.,0.001,27,62,84,119,181,244,307,364,.,.,.,.,.,.,.,.,.,.

9683,6,31,30,29,29,28,28,.,.,.,.,.,.,.,.,.,.,.,.,.,0.001,18,41,69,97,133,.,.,.,.,.,.,.,.,.,.,.,.,.

10995,7,29,30,29,29,27,27,26,.,.,.,.,.,.,.,.,.,.,.,.,0.001,28,63,93,126,154,182,.,.,.,.,.,.,.,.,.,.,.,.

11041,8,24,19,20,19,19,19,19,16,.,.,.,.,.,.,.,.,.,.,.,40,94,108,137,164,221,290,350,.,.,.,.,.,.,.,.,.,.,.

13105,9,29,29,28,28,31,33,26,20,14,.,.,.,.,.,.,.,.,.,.,0.001,7,35,63,92,120,190,244,314,.,.,.,.,.,.,.,.,.,.

13613,5,32,28,23,25,24,.,.,.,.,.,.,.,.,.,.,.,.,.,.,28,56,84,112,143,.,.,.,.,.,.,.,.,.,.,.,.,.,.

15472,8,22,25,26,24,24,23,20,22,.,.,.,.,.,.,.,.,.,.,.,14,42,70,98,126,183,252,308,.,.,.,.,.,.,.,.,.,.,.

16375,8,21,22,14,18,13,13,8,6,.,.,.,.,.,.,.,.,.,.,.,0.001,7,48,74,95,132,200,326,.,.,.,.,.,.,.,.,.,.,.

18157,8,28,22,21,20,20,23,14,10,.,.,.,.,.,.,.,.,.,.,.,0.001,29,59,92,127,183,253,316,.,.,.,.,.,.,.,.,.,.,.

18747,8,33,34,32,30,28,27,26,24,.,.,.,.,.,.,.,.,.,.,.,0.001,24,52,80,108,136,199,273,.,.,.,.,.,.,.,.,.,.,.

19256,7,34,31,32,34,30,32,33,.,.,.,.,.,.,.,.,.,.,.,.,0.001,35,63,98,126,154,182,.,.,.,.,.,.,.,.,.,.,.,.

19764,6,35,34,33,33,33,31,.,.,.,.,.,.,.,.,.,.,.,.,.,7,70,133,189,252,316,.,.,.,.,.,.,.,.,.,.,.,.,.

19871,6,24,18,16,15,15,14,.,.,.,.,.,.,.,.,.,.,.,.,.,10,70,135,198,247,329,.,.,.,.,.,.,.,.,.,.,.,.,.

20121,4,30,24,29,29,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.,5,40,63,106,.,.,.,.,.,.,.,.,.,.,.,.,.,.,.

;

*The model now looks like this:

proc nlmixed data=data gconv=0 maxit=100 maxfunc=2000;

  array yvars{19} yvar1-yvar19;

  array times{19} time1-time19;

  array means{38} mean1-mean38;

  parms td_pop1=536.32 td_pop2=29795 a_pop=33.2965 b_pop=0.5079

   prob=0.3710 s2epsilon=20.9501;

  bounds s2epsilon prob > 0;

  bounds prob < 1;

  pi=constant('pi');

  td1=td_pop1;

  td2=td_pop2;

  B=b_pop;

  A=a_pop;

  lh1=1;

  lh2=1;

  do i=1 to n;

  means=A*exp(-(times/td1)**B);

  means[i+19]=A*exp(-(times/td2)**B);

  lh1=lh1*(1/((2*pi*s2epsilon)**0.5)) * exp(-0.5*(((yvars - means)**2) / s2epsilon));

  lh2=lh2*(1/((2*pi*s2epsilon)**0.5)) * exp(-0.5*(((yvars - means[i+19])**2) / s2epsilon));

  end;

  lh=prob*lh1 + (1 - prob)*lh2;

  ll=log(lh);

  dum=1;

  model dum ~ general(ll);

run;

So what I think (and certainly hope) this code does is:

1) It does every calculation per data set row (so subject in this case). This is what I suspect that nlmixed does in general although I couldn't really find any     information on how nlmixed works internally.

2) This means it calculates an expectation per subject, time point and mixture component combination inside the loop.

3a) This mean is put into the normal density together with the corresponding dependent variable measurement to give a subject-time-mixture component specific     likelihood contribution.

3b) These likelihood contributions are multiplied per mixture component which gives a subject-mixture component specific likelihood contribution.

4) After the loop is finished, the two subject-mixture component specific likelihood contributions are added under the mixture probability to give a subject specific likelihood contribution.

5) The log of this likelihood contribution is modeled using a dummy variable, since the whole likelihood has been specified already using all the dependent variable outcomes, so it is not dependent on that anymore.

6) nlmixed multiplies all the subject specific likelihood contributions in the background to give the total likelihood for one iteration.

I'm really hoping people have some insides or opinions about this, since I'm really uncertain if the code is correct. Especially the dummy part I'm not sure about, since I just took this from something I found on the internet; this paper uses it in the code:http://www.education.umd.edu/EDMS/fac/Harring/MIsc/WUSS-Paper.pdf.

Also, I'm having troubles figuring out how to get predictions from this model and how to include random subject effects. So thoughts on that are also more than welcome.

Magnus

SteveDenham
Jade | Level 19

Looking at the original code, it should work, except that mean2 is not defined in this example.  That may be simply due to a typo, but it would explain the results that you obtained.

Now as far as the last question about including random subject effects--the only way I have seen that done is with data in the long format, sorted by subject.  There are a couple of example in the documentation on NLMIXED that hit on this.

Steve Denham

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 2 replies
  • 1537 views
  • 0 likes
  • 2 in conversation