Solved: Repeated measures in regression

Cornelis · Posted 10-01-2017 02:19 AM

Dear SAS users and experts,

I have received a mixture data set with repeated measurements. The mixture contains 3 different ingredients whose mass fractions always sum to 1. At laboratory scale, every sample is measured 3 times for foam properties by the same person (so there is no person-to-person variation).

The relation between foam and ingredient ratios can be modeled using PROC GLM with no intercept (NOINT).

Y1,1 = ax1 + bx2 + cx3

Y1,2 = ax1 + bx2 + cx3

Y1,3 = ax1 + bx2 + cx3

Y2,1 = ax1 + bx2 + cx3

Y2,2 = ax1 + bx2 + cx3

Y2,3 = ax1 + bx2 + cx3

Etc.

x1, x2, x3 are the mass fractions of the ingredients.

Y1,1 … Y1,2 … Y1,3 are the responses of mixture 1, measured 3 times.

Y2,1 … Y2,2 … Y2,3 are the responses of mixture 2, measured 3 times.

And so on.

The problem is that we have a very limited data set, and the idea is to use PROC GLM with all measured Y values. Alternatively, I can use the mean value of Y for every mixture, but the drawback is that the model loses power, especially with a limited data set. Using the repeated measures of Y in the model will expand the data set 3 times, and the model will improve a lot. But the possible drawback is that the model quality looks too optimistic because of a very high correlation coefficient. I have the impression that this kind of model is not very realistic.

My question: what is the most appropriate SAS PROC to handle this kind of situation? PROC GLIMMIX with a RANDOM statement (levels 1, 2, 3 for every mixture, where 1 = first measurement, 2 = second measurement and 3 = third measurement)?

Thanks in advance,

Cornelis

PaigeMiller · Posted 10-02-2017 07:59 AM

@Cornelis wrote:

The problem is that we have a very limited data set, and the idea is to use PROC GLM with all measured Y values. Alternatively, I can use the mean value of Y for every mixture, but the drawback is that the model loses power, especially with a limited data set.

Using the means is not only required by your design, but it is the only right thing to do.

Using the repeated measures of Y in the model will expand the data set 3 times, and the model will improve a lot.

But it is the wrong thing to do, and it confounds the replication variability with the experimental variability. In other words, the replicates tell you nothing about the experimental variability in your mixtures; they only tell you about the variability of repeat testing. You are going to have to live with the small number of mixtures that you have.
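To make the distinction concrete, here is a minimal Python sketch (Python rather than SAS, purely for illustration). The five mixtures and their foam values are made up, not taken from the thread. It pools the spread of the three repeat measurements around each mixture's own mean, which estimates repeat-testing variability only.

```python
# Hypothetical foam readings: 5 mixtures, each measured 3 times (made-up numbers).
replicates = {
    "mix1": [10, 11, 12],
    "mix2": [20, 19, 21],
    "mix3": [30, 31, 29],
    "mix4": [16, 17, 15],
    "mix5": [22, 21, 20],
}


def pooled_replicate_variance(groups):
    """Pool the within-mixture sums of squares over all mixtures."""
    ss = 0.0
    df = 0
    for values in groups.values():
        mean = sum(values) / len(values)
        ss += sum((v - mean) ** 2 for v in values)
        df += len(values) - 1  # each mixture contributes (replicates - 1) df
    return ss / df, df


variance, df = pooled_replicate_variance(replicates)
print(f"replicate variance = {variance}, degrees of freedom = {df}")
```

With 5 mixtures and 3 readings each, the 10 degrees of freedom here come entirely from repeat testing. None of them says how well a blending model fits across different mixtures.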

My question: what is the most appropriate SAS PROC to handle this kind of situation?

I would use PROC GLM, where the model is specified so that you can compute both the experimental variability and the replicate variability. In other words, the replicates are nested within your mixtures and are random. This is equivalent to using the means for the model of mixture-to-mixture differences, and the replicates for the repeat testing variability. PROC GLIMMIX would also work in this situation.
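A small numeric check of the "equivalent to using the means" point, as a hedged Python sketch rather than SAS. The mixtures and foam values are made up, and `solve` and `fit_no_intercept` are helper names invented for this example. It assumes a balanced design (every mixture measured the same number of times, as described in the question). Under that assumption, ordinary least squares on all 15 rows and on the 5 mixture means gives the same coefficients; only the residual degrees of freedom differ.

```python
from fractions import Fraction as F

# Hypothetical design: mass fractions (x1, x2, x3) summing to 1, each mixture
# measured 3 times. All numbers are made up for illustration.
mixtures = [
    ((F(1), F(0), F(0)), [10, 11, 12]),
    ((F(0), F(1), F(0)), [20, 19, 21]),
    ((F(0), F(0), F(1)), [30, 31, 29]),
    ((F(1, 2), F(1, 2), F(0)), [16, 17, 15]),
    ((F(1, 3), F(1, 3), F(1, 3)), [22, 21, 20]),
]


def solve(A, b):
    """Solve the square linear system A x = b by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def fit_no_intercept(X, y):
    """Least-squares coefficients for y = a*x1 + b*x2 + c*x3 (no intercept)."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


# Fit 1: every replicate as its own row (15 rows).
X_all = [x for x, reps in mixtures for _ in reps]
y_all = [F(v) for _, reps in mixtures for v in reps]

# Fit 2: one row per mixture, using the mean of its replicates (5 rows).
X_mean = [x for x, _ in mixtures]
y_mean = [F(sum(reps), len(reps)) for _, reps in mixtures]

coef_all = fit_no_intercept(X_all, y_all)
coef_mean = fit_no_intercept(X_mean, y_mean)
df_all = len(y_all) - 3    # residual df when replicates are treated as independent
df_mean = len(y_mean) - 3  # residual df when fitting the mixture means

print("coefficients (all rows):", [float(c) for c in coef_all])
print("coefficients (means):   ", [float(c) for c in coef_mean])
print("residual df:", df_all, "vs", df_mean)
```

The extra rows do not change the fitted a, b, c in this balanced case. What they change is the apparent residual degrees of freedom (12 instead of 2), which is where the overly optimistic model quality comes from when replicates are treated as independent experiments.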

--
Paige Miller

Cornelis · Posted 10-02-2017 08:37 AM

Thank you for your kind support and advice.

You are right: confounding is what would give false-positive results.

Using the replicate as a random effect is the best option for building the model.

Best regards,
