I'm new to SAS, and I am a bit stuck. Roughly, here is the design:
60 participants each completed the same comprehension test twice (scored 0-6), once after reading a text in each of two Formats (Single and Control). So that participants didn't read the same topic twice, two Topics were used across the two Formats: one text was based on Animations and the other on People.
(Note: order was counterbalanced and is not a variable of interest.)
I want to look at the main effects of Format and Topic, AND the Format*Topic interaction.
The data are set up like this:
1 | 1 | 1 | 1 | 5 |
6 | 1 | 1 | 1 | 6 |
7 | 0 | 1 | 1 | 6 |
8 | 0 | 1 | 1 | 2 |
---
1 | 1 | 0 | 0 | 6 |
6 | 1 | 0 | 0 | 6 |
7 | 0 | 0 | 0 | 3 |
8 | 0 | 0 | 0 | 4 |
---
5 | 0 | 0 | 1 | 1 |
9 | 1 | 0 | 1 | 2 |
3 | 1 | 0 | 1 | 3 |
4 | 1 | 0 | 1 | 2 |
---
5 | 0 | 1 | 0 | 4 |
9 | 1 | 1 | 0 | 3 |
3 | 1 | 1 | 0 | 5 |
4 | 1 | 1 | 0 | 2 |
My code looks like this:
PROC MIXED DATA=x;
   MODEL score = Ani1|Single1 / S DDFM=kr;
   REPEATED / SUBJECT=ID TYPE=un;
RUN;
However, I am not confident it is correct, since I get the SAME results (below) with and without the REPEATED / SUBJECT line in the code. Am I doing something wrong?
Estimate | Std Error | DF | t Value | Pr > |t| |
3.8824 | 0.2745 | 114 | 14.14 | <.0001 |
1.0776 | 0.4217 | 114 | 2.56 | 0.0119 |
-0.8424 | 0.4217 | 114 | -2.00 | 0.0482 |
0.1471 | 0.5964 | 114 | 0.25 | 0.8057 |
I think the similarity is due to treating your independent factors as continuous variables (essentially a regression). As a result, the REPEATED statement doesn't really accomplish what you want it to. So perhaps this would help:
PROC MIXED DATA=x;
   CLASS Ani1 Single1 Order ID;
   MODEL score = Ani1|Single1|Order / S DDFM=kr;
   REPEATED Order / SUBJECT=ID TYPE=un;
   LSMEANS Ani1 Single1 Ani1*Single1;
RUN;
This assumes that Order indexes the order in which a subject gets either Control or Single. There should be exactly one record for each ID for each combination of Ani1, Single1, and Order. Order is left out of the calculation of the marginal means, so those means are averaged over order. I included Order and all of its interactions in the MODEL statement because it is a design element that should be accommodated, whether you are interested in the means by order or not. It may turn out that there is a significant order effect or order interaction, which may influence how you interpret the results.
SteveDenham
Thank you, @SteveDenham
The Class line makes sense. And I agree about Order.
I ran the code, and get this in my log file:
18 PROC MIXED DATA=xxx;
19 CLASS Ani1 Single1 Order1 ID;
20 MODEL score = Ani1|Single1|Order1 / S DDFM=kr;
21 REPEATED Order1/ SUBJECT = ID TYPE=un;
22 LSMEANS Ani1 Single1 Ani1*Single1;
23 run;
NOTE: An infinite likelihood is assumed in iteration 0 because of a nonpositive definite
estimated R matrix for ID 21.
NOTE: PROCEDURE MIXED used (Total process time):
real time 0.04 seconds
cpu time 0.01 seconds
The output looks like it should, except that it doesn't produce any results.
The SAS System

Model Information
Data Set                     xxx
Dependent Variable           Score
Covariance Structure         Unstructured
Subject Effect               ID
Estimation Method            REML
Residual Variance Method     None
Fixed Effects SE Method      Kenward-Roger
Degrees of Freedom Method    Kenward-Roger

Class Level Information
Class    | Levels | Values |
Ani1     | 2      | 0 1 |
Single1  | 2      | 0 1 |
Order1   | 2      | 0 1 |
ID       | 59     | (blinded but accurate) |

Dimensions
Covariance Parameters        3
Columns in X                 27
Columns in Z                 0
Subjects                     59
Max Obs per Subject          2

Number of Observations
Number of Observations Read      118
Number of Observations Used      118
Number of Observations Not Used  0
Also: ID 21 is actually the first participant in the data file, not one in the middle.
This error comes about when the subject is not specified correctly, such that there is more than one observation with identical X values for a given subject. In this case, I suspect that there is a set where the IDs are duplicated. This would also explain why the number of subjects is 59 when you said that 60 were given the test. So the first thing I would do is use PROC FREQ to get a full cross-tab of your data and check that there is the expected number of entries for every combination of your X variables. Since 118 is not evenly divisible by 8 (a 2 x 2 x 2 design), something is likely missing or miscoded.
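A minimal sketch of that check, assuming the dataset is named x and the variable names match the earlier code:

```sas
PROC FREQ DATA=x;
   /* Full cross-tab of the design cells; LIST prints one row per combination */
   TABLES Ani1*Single1*Order1 / LIST;
   /* Each ID should appear exactly twice (once per Format) */
   TABLES ID;
RUN;
```

Any cell with an unexpected count, or an ID with a frequency other than 2, points at the miscoded or duplicated records.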
Then, if the data all look correct, you could try changing the subject to SUBJECT=ID*Single1. From the sample data, this looks like it may remove the duplicate issue.
SteveDenham
@SteveDenham Thanks and thanks for helping me work through this.
First, apologies for the confusion about the sample size. I rounded in the first post for simplicity's sake; n = 59.
The data appear to be coded correctly. Order was not evenly balanced: this is an analysis of a subsample of participants from a larger study, so while order was balanced across the full study, it is not balanced within this sample of 59.
Ani1 | Frequency | Percent | Cum Freq | Cum Percent |
0 | 59 | 50.00 | 59 | 50.00 |
1 | 59 | 50.00 | 118 | 100.00 |

Single1 | Frequency | Percent | Cum Freq | Cum Percent |
0 | 59 | 50.00 | 59 | 50.00 |
1 | 59 | 50.00 | 118 | 100.00 |

Order1 | Frequency | Percent | Cum Freq | Cum Percent |
0 | 56 | 47.46 | 56 | 47.46 |
1 | 62 | 52.54 | 118 | 100.00 |
ID table omitted, but all IDs are present with a frequency of 2.
Try SUBJECT=ID*Order1, as that looks like the only place pseudo-duplicates could show up. Otherwise, I think you will have to make the assumption you mentioned before (that order has no effect) and remove it from the model. It may be as simple as adding a CLASS statement to your original PROC MIXED code.
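If order is dropped, adding a CLASS statement to the original call might look like this (a sketch, assuming the earlier dataset and variable names):

```sas
PROC MIXED DATA=x;
   /* CLASS makes the 0/1 predictors categorical rather than continuous */
   CLASS Ani1 Single1 ID;
   MODEL score = Ani1|Single1 / S DDFM=kr;
   /* With two observations per ID, this fits an unstructured 2x2 R matrix */
   REPEATED / SUBJECT=ID TYPE=un;
   LSMEANS Ani1 Single1 Ani1*Single1;
RUN;
```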
SteveDenham
I would now suggest using a different optimizer, which means moving over to PROC GLIMMIX.
PROC GLIMMIX DATA=x;
   NLOPTIONS maxiter=5000 tech=nmsimp;
   CLASS Ani1 Single1 Order ID;
   MODEL score = Ani1|Single1|Order / S DDFM=kr;
   RANDOM Order / SUBJECT=ID*Ani1 TYPE=un RESIDUAL;
   LSMEANS Ani1 Single1 Ani1*Single1;
RUN;
See how this behaves.
SteveDenham
(I will be looking in occasionally over the holiday, but I am clocking out now.)
Starting at the top:
The solution vector is what is used to create the least squares means. I find it useful if I need an estimate and standard error for a continuous covariate. Otherwise, the latter two (the Type 3 tests and the least squares means) are more useful.
The Type 3 F tests test whether at least one mean is different from all the others in that effect (main or interaction). This is the primary test of "significance" for an effect.
The LSMs (least squares means) tell you what the expected values are for each level of the effects. Using the DIFF option allows you to test whether one particular mean is "significantly" different from another.
I don't know what you mean by more powerful. Do you mean which had a greater effect on the mean? That is generally what Cohen's d is all about. However, mixed models don't really lend themselves to calculating effect sizes. If you really want to look at something like it, add the /DIFF option to the LSMEANS statement. The results table should present the t value for each comparison, which is a ratio of the difference to the standard error of the difference. Cohen's d is a ratio of the difference to the standard deviation of the reference group, so the two should be analogous in direction.
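For example, the LSMEANS line in the earlier GLIMMIX call could become the following (a sketch; the CL option for confidence limits is optional):

```sas
/* DIFF requests pairwise comparisons of the least squares means,
   with t values (difference / SE of difference) for each pair */
LSMEANS Ani1 Single1 Ani1*Single1 / DIFF CL;
```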
SteveDenham