Imagine a ring trial with 10 samples, each tested by every lab. There are 30 laboratories, each using one of 3 test kits, so the head of the data set might look like the attachment.
Analysing the main effect of test kit can be done in PROC GLM via:
proc glm data=dataset;
   class sample test lab;
   model y = sample test lab(test) sample*test;
   /* see whether TEST differs, assuming test kits are specific to the labs
      (lab(test) as the error term) */
   test h=test e=lab(test);
   lsmeans sample;
   lsmeans test / e=lab(test) tdiff pdiff;
run;
How can I check for an overall effect of the test kits AND an effect of test kit under a different error structure in PROC MIXED?
At the moment I need two separate runs, the first for the overall effect:
proc mixed data=dataset;
   class sample test lab;
   model y = sample test lab(test) sample*test;
run;
and the second for redefining the random component / error structure:
proc mixed data=dataset covtest;
   class sample test lab;
   model y = sample test sample*test / ddfm=BETWITHIN;
   random lab lab*test;
run;
In the second run I have to specify DDFM=, otherwise no denominator DF can be calculated. In the COVTEST results, the lab*test variance component is estimated as 0 (?); specifying NOBOUND does not alter the results, and METHOD=TYPE3 does not produce any output.
Am I on the right path, and can I combine the two analyses into one PROC MIXED run to show that there is a significant lab*test interaction but no overall difference between test kits?
I am reading SAS for Linear Models (Littell), but it seems like I am missing something. Working in SAS 9.4 on Windows.
Are there any errors or warnings in the SAS log for the second PROC MIXED you ran? Since lab 2 only did test B, that will make the analysis of LAB and TEST more difficult.
Sorry, I didn't want to inflate the spreadsheet too much and only posted the head of the data as an example; please see the full structure attached. There are 39 laboratories x 10 samples, i.e. 390 observations. Of the labs, 4 use test A, 13 test B, and 22 test C. y is continuous, ranging from approx. -10 to +200.
The only note in the log (for the second PROC MIXED) is: Estimated G matrix is not positive definite.
Thanks for the additional information. The message that the G matrix is not positive definite is an indication that there is a problem with the random effects you are trying to fit. Your RANDOM statement has LAB*TEST. However, each of your labs only saw one level of test. It will be difficult to estimate that interaction when your design did not account for the interaction effect. You would ideally need each lab to perform all 3 tests if you wanted to measure the LAB*TEST effect. The best you can do now is to drop the LAB*TEST effect from the RANDOM statement. Since you want to use DDFM=BW, I would also change your RANDOM statement to
random int / subject=lab;
so that MIXED knows how to split the between- and within-subject effects. Without a SUBJECT= effect on a RANDOM statement, DDFM=BW will assign all effects the residual DF.
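Putting that suggestion together, a minimal sketch of the revised call might look like the one below (the data set and variable names are carried over from the code above; keeping SAMPLE and SAMPLE*TEST on the MODEL statement is an assumption from the original model, pending the question about samples that follows):
proc mixed data=dataset covtest;
   class sample test lab;
   /* lab*test dropped from the RANDOM statement: each lab ran only one kit */
   model y = sample test sample*test / ddfm=bw;
   /* random lab intercept; SUBJECT=LAB lets DDFM=BW separate
      between-lab and within-lab degrees of freedom */
   random int / subject=lab;
run;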
Are the samples the same across the labs? For example, is sample 1 in lab 1 the same as sample 1 in lab 2? Or are the samples just reps in your experiment? If the samples just represent replications within a lab (i.e., sample 1 in lab 1 is not the same sample as sample 1 in lab 2), then you do not want SAMPLE as an effect in your model. TEST would be the only effect on the MODEL statement in that case.
If each lab ran more than one of the tests, then you could model lab*test as a random effect. If not, then you can only model lab as random. The data simply will not support a more complicated variance structure.
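If the samples turn out to be just replicates within a lab, a sketch under that assumption would drop SAMPLE from the MODEL statement and keep only LAB as random:
proc mixed data=dataset covtest;
   class test lab;
   /* TEST is the only fixed effect when samples are merely reps within a lab */
   model y = test / ddfm=bw;
   /* labs as random blocks; lab*test cannot be estimated here
      because each lab used only one test kit */
   random int / subject=lab;
run;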