I'm new to SAS, and I am a bit stuck. Roughly, here is the design:
60 participants each completed the same comprehension test twice (scored 0-6), once after reading a text in each of two Formats (Single and Control). So that participants didn't read the same topic twice, two Topics were used across the two Formats: one text was based on Animations and the other on People.
(Note: order was counterbalanced and is not a variable of interest.)
I want to look at the main effects of Format and Topic, AND the Format*Topic interaction.
The data are set up like this:
1 | 1 | 1 | 1 | 5 |
6 | 1 | 1 | 1 | 6 |
7 | 0 | 1 | 1 | 6 |
8 | 0 | 1 | 1 | 2 |
---
1 | 1 | 0 | 0 | 6 |
6 | 1 | 0 | 0 | 6 |
7 | 0 | 0 | 0 | 3 |
8 | 0 | 0 | 0 | 4 |
---
5 | 0 | 0 | 1 | 1 |
9 | 1 | 0 | 1 | 2 |
3 | 1 | 0 | 1 | 3 |
4 | 1 | 0 | 1 | 2 |
---
5 | 0 | 1 | 0 | 4 |
9 | 1 | 1 | 0 | 3 |
3 | 1 | 1 | 0 | 5 |
4 | 1 | 1 | 0 | 2 |
My code looks like this:
PROC MIXED DATA=x;
   MODEL score = Ani1|Single1 / S DDFM=kr;
   REPEATED / SUBJECT=ID TYPE=un;
RUN;
However, I am not confident it is correct, since I get the SAME results (below) with and without the REPEATED / SUBJECT line in the code. Am I doing something wrong?
Estimate | Std Error | DF | t Value | Pr > |t| |
3.8824 | 0.2745 | 114 | 14.14 | <.0001 |
1.0776 | 0.4217 | 114 | 2.56 | 0.0119 |
-0.8424 | 0.4217 | 114 | -2.00 | 0.0482 |
0.1471 | 0.5964 | 114 | 0.25 | 0.8057 |
I think the similarity is due to treating your independent factors as continuous variables (essentially a regression). As a result, the REPEATED statement doesn't really accomplish what you want it to. So perhaps this would help:
PROC MIXED DATA=x;
   CLASS Ani1 Single1 Order ID;
   MODEL score = Ani1|Single1|Order / S DDFM=kr;
   REPEATED Order / SUBJECT=ID TYPE=un;
   LSMEANS Ani1 Single1 Ani1*Single1;
RUN;
This assumes that Order indexes the order in which a subject gets either Control or Single. There should be exactly one record for each ID for each combination of Ani1, Single1, and Order. Order is left out of the calculation of the marginal means, so those means are averaged over order. I included Order and all of its interactions in the MODEL statement because it is a design element that should be accommodated, whether you are interested in the means by order or not. It may turn out that there is a significant order effect or order interaction, which may influence how you interpret the results.
SteveDenham
Thank you, @SteveDenham
The Class line makes sense. And I agree about Order.
I ran the code, and get this in my log file:
18 PROC MIXED DATA=xxx;
19 CLASS Ani1 Single1 Order1 ID;
20 MODEL score = Ani1|Single1|Order1 / S DDFM=kr;
21 REPEATED Order1/ SUBJECT = ID TYPE=un;
22 LSMEANS Ani1 Single1 Ani1*Single1;
23 run;
NOTE: An infinite likelihood is assumed in iteration 0 because of a nonpositive definite
estimated R matrix for ID 21.
NOTE: PROCEDURE MIXED used (Total process time):
real time 0.04 seconds
cpu time 0.01 seconds
The output looks like it should, except that it doesn't produce any results.
The SAS System

Model Information
Data Set                     xxx
Dependent Variable           Score
Covariance Structure         Unstructured
Subject Effect               ID
Estimation Method            REML
Residual Variance Method     None
Fixed Effects SE Method      Kenward-Roger
Degrees of Freedom Method    Kenward-Roger

Class Level Information
Class    | Levels | Values |
Ani1     | 2      | 0 1 |
Single1  | 2      | 0 1 |
Order1   | 2      | 0 1 |
ID       | 59     | (blinded but accurate) |

Dimensions
Covariance Parameters        3
Columns in X                 27
Columns in Z                 0
Subjects                     59
Max Obs per Subject          2

Number of Observations
Number of Observations Read      118
Number of Observations Used      118
Number of Observations Not Used  0
Also: ID 21 is actually the first participant in the data file, not one in the middle.
This error comes about when the subject is not specified correctly, such that there is more than one observation with identical X values for a given subject. In this case, I suspect that there is a set where the IDs are duplicated. This would also explain why the number of subjects is 59 when you said that 60 were given the test. So the first thing I would do is use PROC FREQ to get a full cross-tab of your data and check that there is the expected number of entries for every combination of your X variables. Since 118 is not evenly divisible by 8 (a 2 x 2 x 2 design), something is likely missing or miscoded.
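A minimal sketch of that check, assuming the dataset is named x and the variable names match the earlier code:

```sas
PROC FREQ DATA=x;
   /* Full cross-tab of the design cells; LIST prints one row per combination */
   TABLES Ani1*Single1*Order1 / LIST;
   /* Each ID should appear exactly twice (once per Format) */
   TABLES ID;
RUN;
```

Any cell with an unexpected count, or an ID with a frequency other than 2, points at the miscoded or duplicated records.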
Then, if the data all look correct, you could try changing the subject to SUBJECT=ID*Single1. From the sample data, this looks like it may remove the duplicate issue.
SteveDenham
@SteveDenham Thanks and thanks for helping me work through this.
First, apologies for the confusion about the sample size. I rounded in the first post for simplicity's sake; n = 59.
The data appear to be coded correctly. Order was not evenly balanced: this is an analysis of a subsample of participants from a larger study, so while order was balanced across the full study, it is not balanced within this sample of 59.
Ani1 | Frequency | Percent | Cum Freq | Cum Percent |
0 | 59 | 50.00 | 59 | 50.00 |
1 | 59 | 50.00 | 118 | 100.00 |

Single1 | Frequency | Percent | Cum Freq | Cum Percent |
0 | 59 | 50.00 | 59 | 50.00 |
1 | 59 | 50.00 | 118 | 100.00 |

Order1 | Frequency | Percent | Cum Freq | Cum Percent |
0 | 56 | 47.46 | 56 | 47.46 |
1 | 62 | 52.54 | 118 | 100.00 |
ID table omitted, but all IDs are present with a frequency of 2.
Try SUBJECT=ID*Order1, as that looks like the only place pseudo-duplicates could show up. Otherwise, I think you will have to make the assumption you mentioned before (that order has no effect) and remove it from the model. It may be as simple as adding a CLASS statement to your original PROC MIXED code.
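If order is dropped, adding a CLASS statement to the original call might look like this (a sketch, assuming the earlier dataset and variable names):

```sas
PROC MIXED DATA=x;
   /* CLASS makes the 0/1 predictors categorical rather than continuous */
   CLASS Ani1 Single1 ID;
   MODEL score = Ani1|Single1 / S DDFM=kr;
   /* With two observations per ID, this fits an unstructured 2x2 R matrix */
   REPEATED / SUBJECT=ID TYPE=un;
   LSMEANS Ani1 Single1 Ani1*Single1;
RUN;
```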
SteveDenham
I would now suggest using a different optimizer, which means moving over to PROC GLIMMIX.
PROC GLIMMIX DATA=x;
   NLOPTIONS maxiter=5000 tech=nmsimp;
   CLASS Ani1 Single1 Order ID;
   MODEL score = Ani1|Single1|Order / S DDFM=kr;
   RANDOM Order / SUBJECT=ID*Ani1 TYPE=un RESIDUAL;
   LSMEANS Ani1 Single1 Ani1*Single1;
RUN;
See how this behaves.
SteveDenham
(I will be looking in occasionally over the holiday, but I am clocking out now.)
Starting at the top:
The solution vector is what is used to create the least squares means. I find it useful if I need an estimate and standard error for a continuous covariate. Otherwise, the latter two (the Type 3 tests and the least squares means) are more useful.
The Type 3 F tests test whether at least one mean is different from all the others in that effect (main or interaction). This is the primary test of "significance" for an effect.
The LSMs (least squares means) tell you what the expected values are for each level of the effects. Using the DIFF option allows you to test whether one particular mean is "significantly" different from another.
I don't know what you mean by more powerful. Do you mean which had a greater effect on the mean? That is generally what Cohen's d is all about. However, mixed models don't really lend themselves to calculating effect sizes. If you really want to look at something like it, add the /DIFF option to the LSMEANS statement. The results table should present the t value for each comparison, which is a ratio of the difference to the standard error of the difference. Cohen's d is a ratio of the difference to the standard deviation of the reference group, so the two should be analogous in direction.
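For example, the LSMEANS line in the earlier GLIMMIX call could become the following (a sketch; the CL option for confidence limits is optional):

```sas
/* DIFF requests pairwise comparisons of the least squares means,
   with t values (difference / SE of difference) for each pair */
LSMEANS Ani1 Single1 Ani1*Single1 / DIFF CL;
```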
SteveDenham