Dear all,
I have a dataset from 23 subjects in a 2x2 crossover study. The data are shown below. The ID variable is the subject ID. The UID variable is the same ID prefixed with the SITE name (some subjects came from site ABC and some from site DEF), so ID and UID identify the same subjects. When I run PROC MIXED with ID and then with UID, the standard errors and p-values differ between the two models. I'm using SAS 9.4. Could someone please help me understand why?
Below is the code:
data tmp;
length uid $12;
input id $ uid $ outcome sequence $ period arm;
datalines;
1002 ABC-1002 20 AB 1 1
1002 ABC-1002 30 AB 2 2
6002 ABC-6002 20 BA 1 2
6002 ABC-6002 11 BA 2 1
0001 ABC-0001 20 AB 1 1
0001 ABC-0001 31 AB 2 2
0002 ABC-0002 22 BA 1 2
0002 ABC-0002 22 BA 2 1
0005 ABC-0005 12 AB 1 1
0005 ABC-0005 41 AB 2 2
0008 ABC-0008 33 AB 1 1
0008 ABC-0008 17 AB 2 2
1006 DEF-1006 13 BA 1 2
1006 DEF-1006 12 BA 2 1
1007 DEF-1007 12 AB 1 1
1007 DEF-1007 30 AB 2 2
5011 DEF-5011 22 AB 1 1
5011 DEF-5011 40 AB 2 2
4001 DEF-4001 35 BA 1 2
4001 DEF-4001 27 BA 2 1
4004 DEF-4004 25 AB 1 1
4004 DEF-4004 31 AB 2 2
4006 DEF-4006 23 BA 1 2
4006 DEF-4006 52 BA 2 1
6005 DEF-6005 22 AB 1 1
6005 DEF-6005 33 AB 2 2
6006 DEF-6006 12 BA 1 2
6006 DEF-6006 12 BA 2 1
6007 DEF-6007 15 AB 1 1
6007 DEF-6007 13 AB 2 2
8001 DEF-8001 35 AB 1 1
8001 DEF-8001 23 AB 2 2
0003 DEF-0003 22 AB 1 1
0003 DEF-0003 32 AB 2 2
0004 DEF-0004 54 BA 1 2
0004 DEF-0004 23 BA 2 1
0006 DEF-0006 21 BA 1 2
0006 DEF-0006 32 BA 2 1
0009 DEF-0009 45 BA 1 2
0009 DEF-0009 24 BA 2 1
3002 DEF-3002 11 BA 1 2
3002 DEF-3002 21 BA 2 1
3003 DEF-3003 31 BA 1 2
3003 DEF-3003 22 BA 2 1
5001 DEF-5001 28 AB 1 1
5001 DEF-5001 25 AB 2 2
;
run;
%macro mix (id = );
proc sort data = tmp;
by &id;
run;
proc mixed data = tmp;
class &id sequence period arm;
model outcome = sequence period arm/ ddfm = kr;
random &id(sequence)/type = ar(1);
estimate 'treatment: 1 vs 2' arm 1 -1/cl;
run;
%mend;
%mix (id = id); *This gives a SE of treatment difference of 3.0158, p-value = .1372;
Estimates
Label              Estimate  Standard Error  DF    t Value  Pr > |t|  Alpha  Lower     Upper
treatment: 1 vs 2  -4.6515   3.0158          22    -1.54    0.1372    0.05   -10.9052  1.6022
%mix (id = uid); *This gives a SE of treatment difference of 2.9918, p-value= .1326;
Estimates
Label              Estimate  Standard Error  DF    t Value  Pr > |t|  Alpha  Lower     Upper
treatment: 1 vs 2  -4.6515   2.9918          24.9  -1.55    0.1326    0.05   -10.8146  1.5116
Edited 2017-10-20: Oops. It should be "repeated" instead of "random". And if you add the R option, the R matrix will be reported.
repeated period / subject= &id(sequence) type = ar(1) r;
Original response:
Use
random period / subject=&id(sequence) type = ar(1);
which will apply an AR(1) covariance structure to the 2x2 matrix defined by two levels of PERIOD, rather than the 23x23 matrix defined by the 23 levels of id(sequence) or uid(sequence). The 23 levels are not ordered the same for id(sequence) and uid(sequence) and so the covariance parameter estimates are different.
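For reference, here is a sketch of how the original macro might look with the REPEATED statement from the edit note in place of the RANDOM statement. The data set tmp and all variable names come from the original post; the macro name mix_rep is made up here so it does not overwrite the original %mix. This is an illustration of the suggestion, not a tested analysis.

```sas
/* Hypothetical macro name; data set TMP and variables are from the original post. */
%macro mix_rep(id=);
  /* Sort by subject and period so the records within each subject are in period order. */
  proc sort data=tmp;
    by &id period;
  run;

  proc mixed data=tmp;
    class &id sequence period arm;
    model outcome = sequence period arm / ddfm=kr;
    /* AR(1) is applied to the 2x2 within-subject block defined by PERIOD. */
    /* The R option requests a display of the estimated R block.           */
    repeated period / subject=&id(sequence) type=ar(1) r;
    estimate 'treatment: 1 vs 2' arm 1 -1 / cl;
  run;
%mend mix_rep;

%mix_rep(id=id);
%mix_rep(id=uid);
```

Because the covariance structure is now defined within each subject, the order in which the 23 subjects happen to sort should no longer enter the model, so the two calls would be expected to agree.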
Yes: your original RANDOM statement is wrong. Don't use it.
Because each level of your ID variable matches one and only one level of your UID variable and vice versa (i.e., ID and UID are essentially identical factors), results using ID should not differ from results using UID.
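If you want to confirm that ID and UID really are one-to-one in your data, a quick check along these lines may help. It is a minimal sketch that assumes the data set tmp from the original post; the column aliases n_id, n_uid, and n_pairs are made up for illustration.

```sas
/* Count distinct IDs, distinct UIDs, and distinct (ID, UID) pairs in TMP. */
proc sql;
  select count(distinct id)                   as n_id,
         count(distinct uid)                  as n_uid,
         count(distinct catx('|', id, uid))   as n_pairs
  from tmp;
quit;
```

If the three counts are equal, every ID maps to exactly one UID and vice versa, and the two variables define the same subject factor.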
The syntax of my suggested alternative is correct, although the AR(1) covariance structure may or may not provide the best fit to the data.
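One way to look at whether AR(1) is a reasonable choice is to fit an alternative structure and compare the Fit Statistics tables (for example AIC or BIC). The sketch below assumes the data set tmp and the ID variable from the original post, and uses TYPE=UN only as an example of an alternative; with two periods, UN simply lets the two period variances differ.

```sas
/* Same model as suggested above, but with an unstructured 2x2 within-subject block. */
/* Compare the Fit Statistics table with the one from the TYPE=AR(1) fit.            */
proc mixed data=tmp;
  class id sequence period arm;
  model outcome = sequence period arm / ddfm=kr;
  repeated period / subject=id(sequence) type=un r;
  estimate 'treatment: 1 vs 2' arm 1 -1 / cl;
run;
```

With only 23 subjects, differences in the information criteria may be small, so treat the comparison as a rough guide rather than a formal test.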
Thanks again.