I'm new to SAS, and I am a bit stuck.
Roughly here is the design:
60 participants each completed the same comprehension test twice (scored 0-6): once after reading a text in each of two Formats (Single and Control).
Across the two Formats, two different Topics were presented so that participants didn't read the same topic twice: one was based on Animations and the other on People.
(Note: order was counterbalanced and is not a variable of interest.)
I want to look at the main effects of Format and Topic, and the Format*Topic interaction.
Data is set up like this:
1 | 1 | 1 | 1 | 5 |
6 | 1 | 1 | 1 | 6 |
7 | 0 | 1 | 1 | 6 |
8 | 0 | 1 | 1 | 2 |
---
1 | 1 | 0 | 0 | 6 |
6 | 1 | 0 | 0 | 6 |
7 | 0 | 0 | 0 | 3 |
8 | 0 | 0 | 0 | 4 |
---
5 | 0 | 0 | 1 | 1 |
9 | 1 | 0 | 1 | 2 |
3 | 1 | 0 | 1 | 3 |
4 | 1 | 0 | 1 | 2 |
---
5 | 0 | 1 | 0 | 4 |
9 | 1 | 1 | 0 | 3 |
3 | 1 | 1 | 0 | 5 |
4 | 1 | 1 | 0 | 2 |
My code looks like this:
PROC MIXED DATA=x;
   MODEL score = Ani1|Single1 / S DDFM=kr;
   REPEATED / SUBJECT=ID TYPE=un;
run;
However, I am not confident it is correct, since I get the SAME results (below) with and without the REPEATED / SUBJECT line in the code. Am I doing something wrong?
Effect | Estimate | Standard Error | DF | t Value | Pr > |t|
Intercept | 3.8824 | 0.2745 | 114 | 14.14 | <.0001
Ani1 | 1.0776 | 0.4217 | 114 | 2.56 | 0.0119
Single1 | -0.8424 | 0.4217 | 114 | -2.00 | 0.0482
Ani1*Single1 | 0.1471 | 0.5964 | 114 | 0.25 | 0.8057
Moved to Statistics community.
I think the identical results come from treating your independent factors as continuous variables (essentially fitting a regression). As a result, the REPEATED statement doesn't really accomplish what you want. So perhaps this would help:
PROC MIXED DATA=x;
   CLASS Ani1 Single1 Order ID;
   MODEL score = Ani1|Single1|Order / S DDFM=kr;
   REPEATED Order / SUBJECT=ID TYPE=un;
   LSMEANS Ani1 Single1 Ani1*Single1;
run;
This assumes that Order indexes the order in which a subject gets either Control or Single. There should be exactly one record per ID for each combination of Ani1, Single1, and Order. Order is left out of the LSMEANS statement, so those marginal means are averages over the order. I included Order and all of its interactions in the MODEL statement because it is a design element that should be accommodated, whether you are interested in the means by order or not. It may turn out that there is a significant order effect or order interaction, which may influence how you interpret the results.
SteveDenham
Thank you, @SteveDenham
The CLASS statement makes sense, and I agree about Order.
I ran the code and got this in my log:
18 PROC MIXED DATA=xxx;
19 CLASS Ani1 Single1 Order1 ID;
20 MODEL score = Ani1|Single1|Order1 / S DDFM=kr;
21 REPEATED Order1/ SUBJECT = ID TYPE=un;
22 LSMEANS Ani1 Single1 Ani1*Single1;
23 run;
NOTE: An infinite likelihood is assumed in iteration 0 because of a nonpositive definite
estimated R matrix for ID 21.
NOTE: PROCEDURE MIXED used (Total process time):
real time 0.04 seconds
cpu time 0.01 seconds
The output tables read like they should, except that no results are produced.
The SAS System

Model Information:
Data Set | xxx
Dependent Variable | Score
Covariance Structure | Unstructured
Subject Effect | ID
Estimation Method | REML
Residual Variance Method | None
Fixed Effects SE Method | Kenward-Roger
Degrees of Freedom Method | Kenward-Roger

Class Level Information:
Ani1 | 2 | 0 1
Single1 | 2 | 0 1
Order1 | 2 | 0 1
ID | 59 | (blinded but accurate)

Dimensions:
Covariance Parameters | 3
Columns in X | 27
Columns in Z | 0
Subjects | 59
Max Obs per Subject | 2

Number of Observations:
Read | 118
Used | 118
Not Used | 0
Also -- ID 21 is actually the first participant in the datafile, not one in the middle.
This error comes about when the subject isn't specified correctly, such that there is more than one observation with identical X values for a given subject. In this case, I suspect there is a set where the IDs are duplicated, which would explain why the number of subjects is 59 when you said that 60 were given the test. So the first thing I would do is use PROC FREQ to get a full cross-tab of your data and check that there is the expected number of entries for every combination of your X variables. Since 118 is not evenly divisible by 8 (a 2 x 2 x 2 design), something is likely missing or miscoded.
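A minimal sketch of that check, assuming the dataset and variable names from your log (xxx, Ani1, Single1, Order1, ID):

```
/* Cross-tab all design cells; LIST MISSING makes empty or
   unexpected cells easy to spot */
PROC FREQ DATA=xxx;
   TABLES Ani1*Single1*Order1 / LIST MISSING;
RUN;

/* Count records per ID; any ID without exactly 2 records
   is a candidate for the duplicate/missing problem */
PROC FREQ DATA=xxx NOPRINT;
   TABLES ID / OUT=idcheck;
RUN;
PROC PRINT DATA=idcheck;
   WHERE count NE 2;
RUN;
```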
Then, if the data all look correct, you could try changing the subject to SUBJECT=ID*Single1. From the sample data, that looks like it may remove the duplicate issue.
SteveDenham
@SteveDenham Thanks, and thank you for helping me work through this.
First, apologies for the confusion about the sample size. I rounded in the first post for simplicity's sake; n = 59.
The data appear to be coded correctly. Order was not evenly balanced: this is an analysis of a subsample of participants from a larger study, so while order was balanced across the full study, it is not within this sample of 59.
Ani1 | Frequency | Percent | Cumulative Frequency | Cumulative Percent
0 | 59 | 50.00 | 59 | 50.00
1 | 59 | 50.00 | 118 | 100.00

Single1 | Frequency | Percent | Cumulative Frequency | Cumulative Percent
0 | 59 | 50.00 | 59 | 50.00
1 | 59 | 50.00 | 118 | 100.00

Order1 | Frequency | Percent | Cumulative Frequency | Cumulative Percent
0 | 56 | 47.46 | 56 | 47.46
1 | 62 | 52.54 | 118 | 100.00
ID table Omitted, but all IDs are present with frequency of 2
Try subject=ID*Order1, as that looks like the only place pseudo-duplicates could show up. Otherwise, I think you will have to make the assumption that you mentioned before - that order has no effect, and remove it from the model. It may be as simple as adding a CLASS statement to your original PROC MIXED code.
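A sketch of that simpler fallback (dropping Order from the model entirely), assuming the names from your original code; it just adds the CLASS statement to your first attempt:

```
/* Fallback if Order is assumed to have no effect: 2 x 2 within-subject
   model; with no repeated effect listed, MIXED uses the data order
   within each subject, so sort by ID first */
PROC MIXED DATA=x;
   CLASS Ani1 Single1 ID;
   MODEL score = Ani1|Single1 / S DDFM=kr;
   REPEATED / SUBJECT=ID TYPE=un;
   LSMEANS Ani1 Single1 Ani1*Single1;
RUN;
```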
SteveDenham
However ...
1) I got this warning: "Convergence criteria met but final Hessian is not positive definite."
2) The Solution for Fixed Effects table has many duplicate and blank rows.
Then, when I try to remove Order altogether with this code, it won't run:
PROC MIXED DATA=x;
CLASS Ani1 Single1 ID;
MODEL score = Ani1|Single1 / S DDFM=kr;
REPEATED Order1/ SUBJECT = ID TYPE=un;
LSMEANS Ani1 Single1 Ani1*Single1;
run;
I would now suggest using a different optimizer, which means moving over to PROC GLIMMIX.
PROC GLIMMIX DATA=x;
   NLOPTIONS maxiter=5000 tech=nmsimp;
   CLASS Ani1 Single1 Order ID;
   MODEL score = Ani1|Single1|Order / S DDFM=kr;
   RANDOM Order / SUBJECT=ID*Ani1 TYPE=un residual;
   LSMEANS Ani1 Single1 Ani1*Single1;
run;
See how this behaves.
SteveDenham
(I will be looking in occasionally over the holiday, but I am clocking out now.)
There are lots of blank rows in the Solution for Fixed Effects table, but I believe that is by design and that my interpretation should be based on the F values in the Type III Tests of Fixed Effects table.
I think that answers my questions for now. I want to keep this open while I dig through a bit more. Happy Thanksgiving. And thank you for your time!
A follow-up question: how should I interpret the following tables?
1) Solution for Fixed Effects.
2) Type III Tests of Fixed Effects.
3) LSMeans tables.
Any resource would be welcome.
Also, do you have any guidance on how to identify which was more powerful (Ani1 vs Single1) and how to report Cohen's D?
Starting at the top:
The solution vector is what is used to create the least squares means. I find it useful when I need an estimate and standard error for a continuous covariate. Otherwise, the latter two tables are more useful.
The Type 3 F tests test whether at least one mean differs from the others for that effect (main or interaction). This is the primary test of "significance" for an effect.
The LSMs (least squares means) tell you the expected values for each level of the effects. Using the DIFF option allows you to test whether one particular mean is "significantly" different from another.
I don't know what you mean by more powerful. Do you mean which had a greater effect on the mean? That is generally what Cohen's D is all about. However, mixed models don't really lend themselves to calculating effect sizes. If you really want to look at something like it, add the /DIFF option to the LSMEANS statement. The results table should present the t value for each comparison, which is the ratio of the difference to the standard error of the difference. Cohen's D is the ratio of the difference to the standard deviation of the reference group, so the two should be analogous in direction.
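Concretely, in either the MIXED or GLIMMIX code above, the LSMEANS statement would become something like this (CL is optional and just adds confidence limits):

```
/* DIFF requests all pairwise comparisons of the least squares means,
   with the t value for each difference; CL adds confidence limits */
LSMEANS Ani1 Single1 Ani1*Single1 / DIFF CL;
```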
SteveDenham