Laser_Taco_
Obsidian | Level 7

Hello,

In my field, score measures are often taken as part of an experiment, and the appropriateness of different analyses is invariably called into question at conferences or in peer review.
The score data are typically coded as 0, 1, 2, 3 or 0, 1, 2, 3, 4 or some variation, with the scores ordered from "absent" to "severe." The criticism of a given analysis usually boils down to whether a nonparametric analysis was done or whether the person simply included these scores as continuous variables and ran their typical ANOVA.

 

I have attached an example dataset to aid in the discussion. 
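
In case the attachment doesn't come through, the structure is along these lines (a minimal sketch -- the values below are made up for illustration and are not the actual attached data):

data data;
  input trt rep factora $ factorb $ score;  /* illustrative structure only */
  datalines;
1 1 A1 No  0
1 2 A1 No  1
2 1 A1 Yes 2
2 2 A1 Yes 3
3 1 A2 No  1
3 2 A2 No  0
4 1 A2 Yes 3
4 2 A2 Yes 3
;
run;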

Example 1: One-way ANOVA, Randomized Complete Block Design

The "typical ANOVA"  - for One-way parametric 

 

PROC GLM data=data;
  CLASS trt rep;                      /* treatment and block (rep) as class effects */
  model score = trt rep;              /* RCBD: score on treatment plus block */
  means trt / LSD;                    /* Fisher's LSD comparisons */
  lsmeans trt / pdiff stderr lines;   /* LS-means, pairwise tests, letter groupings */
run;
quit;

 

-- Here the usual step is to put the LS-means in a table and add superscripts based on the letter groupings.
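
(For what it's worth, that table-building step can at least be partly automated by capturing the LS-means as a dataset -- a sketch, assuming LSMeans is the relevant ODS table name in PROC GLM:)

ods output LSMeans=lsm_table;  /* capture the LS-means table as a dataset */
proc glm data=data;
  class trt rep;
  model score = trt rep;
  lsmeans trt / pdiff stderr;
run;
quit;

proc print data=lsm_table noobs;  /* format lsm_table however the report requires */
run;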

 

 

The "typical ANOVA"  - for One-way non-parametric 

 

PROC NPAR1WAY data=data WILCOXON DSCF;  /* Kruskal-Wallis test plus DSCF pairwise comparisons */
  CLASS trt;
  var score;
run;

 

-- Here the usual step is to put the frequency of each score (0, 1, 2, 3, or whatever the scale is) for a given treatment into a table and add superscripts based on the pairwise multiple-comparison (DSCF) output.
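
(That frequency table itself is easy enough to produce -- a sketch using the same variable names:)

proc freq data=data;
  tables trt*score / nocol nopercent;  /* counts and row percents of each score per treatment */
run;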

 


Example 2: 2 x 2 Factorial in a Randomized Complete Block Design

 

-- Here there is even more confusion about what the most appropriate analysis would be. Typically I see people ignore the factorial setup altogether, assign each factorial combination to a "trt" level, and analyze the data in much the same way as the two codes above.


I'm really looking for a better way to analyze these types of data -- one that is both appropriate and meaningful.

6 REPLIES
StatDave
SAS Super FREQ

From what you describe, your response variable is ordinal. Since it is integer valued, it is probably not reasonable to assume normality and use GLM. You can use PROC LOGISTIC to model an ordinal multinomial response like yours. Using the data you provided, these statements fit the model allowing for interaction in the factors. It simultaneously models the cumulative probabilities Pr(score=3), Pr(score=2 or 3), and Pr(score=1, 2, or 3); this ordering and grouping of levels is the result of the DESC response variable option. The ILINK option in the LSMEANS statement provides the estimated cumulative probabilities for each A-B combination (in the Mean column).

In the results you'll see that there is a significant interaction. As with any model, the interaction means that the effect of one factor changes with the level of the other factor, so it is necessary to assess the effect of each factor at each level of the other factor. The effect of REP is not significant and could possibly be dropped from the model, but it is left in for the following.

proc logistic data=x;
  class factora factorb / param=glm;
  model score(desc)=factora|factorb rep;        /* cumulative logits: A, B, A*B, plus block */
  lsmeans factora*factorb / ilink plots=none;   /* ILINK: estimated cumulative probabilities */
  store mod2;                                   /* save the fit for PROC PLM */
run;

You can visually plot the interaction effect using the fitted and stored model (in MOD2) and then using the EFFECTPLOT statement in PROC PLM to produce the plot. The following statements show the interaction in two different but equivalent ways. Note that there is a plot for each of the three cumulative probabilities.

proc plm source=mod2;
  effectplot interaction(x=factorb sliceby=factora);  /* B on the axis, sliced by A */
  effectplot interaction(x=factora sliceby=factorb);  /* A on the axis, sliced by B */
run;

To assess the interaction, it is reasonable to use the approach you mentioned, which is to include just the 4-level TRT variable in the model and then make suitable comparisons among the factor combinations. That is done by the following, but a better-labeled variable, AB, is used instead of TRT.

data x;
  set x;
  ab=cats(factora,factorb);  /* single 4-level treatment variable from the two factors */
run;

proc logistic data=x;
  class ab / param=glm;
  model score(desc)=ab rep;
  lsmeans ab / ilink plots=none;
  store mod;
run;

Note that the model is equivalent to the first one - same likelihood value, same estimated cumulative probabilities. Using the order of the factor combinations shown in the LSMEANS table, you can use LSMESTIMATE statements to easily test the effects within the interaction as described above. 

proc plm source=mod;
  effectplot interaction(x=ab sliceby=score);
  lsmestimate ab 'A1-A2 in Yes' 0 1 0 -1 / joint e;
  lsmestimate ab 'A1-A2 in No'  1 0 -1 0 / joint e;
  lsmestimate ab 'diff in diff' -1 1 1 -1 / joint e;  /* (A1-A2 in Yes) - (A1-A2 in No) */
run;

The EFFECTPLOT statement shows the cumulative probabilities for each of the 4 factor combinations. The first two LSMESTIMATE statements test the effect of A in each level of B. The last LSMESTIMATE statement tests the so-called "difference in differences" effect, which is the difference of the two differences in the first two statements. This is, in fact, the interaction effect, and you'll see that it has the same p-value as in the first model fit. The results from the first two statements show that the effect of A is significant in B=Yes but not in B=No. You can see this most easily in the plot, but also in the earlier interaction plots.

Laser_Taco_
Obsidian | Level 7

Hey there,

Thank you for the quick response.

Just to clarify, this data is mocked up as an example. I was curious about two different cases -- one for a true one-way ANOVA type experiment and another for a true 2x2 factorial experiment. I put it all in the same file just for convenience.

But if I am understanding you correctly, it actually doesn't matter so much whether it was a "true" one-way ANOVA or a "true" 2x2 factorial; in both cases I would run the data in the same manner?

To follow up with this, would you be able to provide a heuristic argument for why this would be the case? Or point me to a reference? 

One other thing to point out is that, using the model you propose, I notice the Score Test for the Proportional Odds Assumption is significant, so the assumption would be violated. In that case, I guess I wouldn't be able to interpret the effects of the predictors uniformly across the score thresholds, since the slope parameters differ depending on whether I model the odds of being above or below a given score.

I wouldn't really know what to do in that situation -- just interpret each combination of factors for each of the score thresholds?

StatDave
SAS Super FREQ

As I showed, the model using TRT (or my AB variable, which is the same thing), sometimes called the "mean model," is equivalent to the model with A, B, and A*B. The former simply combines the degrees of freedom (DF) into a single effect: there is 1 DF each for A and B and also 1 for the interaction, and TRT (or AB) just combines those 3 DF into a single effect. The proof that the models are equivalent, just reparameterized, is the identical likelihood values. The mean model is simply more convenient for making comparisons among the 4 combinations (the same could be done with the A B A*B model; it's just less obvious how and really messes people up).
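
If you want to verify the equivalence yourself, you can capture the fit statistics from both fits and compare the -2 Log L values (a quick sketch; FitStatistics is the ODS table name for the "Model Fit Statistics" table in PROC LOGISTIC):

ods output FitStatistics=fit_factorial;
proc logistic data=x;
  class factora factorb / param=glm;
  model score(desc)=factora|factorb rep;
run;

ods output FitStatistics=fit_mean;
proc logistic data=x;
  class ab / param=glm;
  model score(desc)=ab rep;
run;

/* the -2 Log L values in FIT_FACTORIAL and FIT_MEAN will match */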

 

Yes, for these data, the proportional odds test is significant. This test is known to be liberal, meaning that it shows significance more often than it should. But you can always investigate that assumption using the graphical and analytical methods described in this note. If you decide that the assumption really is not supported, then you can fit a fully or partially nonproportional odds model as shown in the note. Alternatively, you could ignore the ordinality of the response and fit a nominal multinomial model by adding the LINK=GLOGIT option in the MODEL statement. The nonproportional odds models and the nominal model just have more parameters to estimate, but the same general approach to your problem applies.
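
For illustration (a sketch, not the exact code from the note; the UNEQUALSLOPES option requires a recent SAS/STAT release), a fully nonproportional odds model can be requested directly in PROC LOGISTIC:

proc logistic data=x;
  class factora factorb / param=glm;
  /* UNEQUALSLOPES relaxes the proportional odds assumption for all effects;
     UNEQUALSLOPES=(factora) would relax it for factor A only */
  model score(desc)=factora|factorb rep / unequalslopes;
run;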

Laser_Taco_
Obsidian | Level 7

Yes, I understand that the models are equivalent, given that the second option you showed is just a reparameterization with the AB factors. However, my whole purpose for this question was to discuss what the best approach to analysis would be under different scenarios.

 

You've given the explanation for the factorial case (Example 2 from my original post), but I'm also interested in the non-factorial case. So use the mock dataset, but pretend that Factors A and B aren't there and don't exist.

Under the scenario that the data really are 4 independent treatments (Example 1 from my original post; no factors, just 4 genuine treatments), would the proportional odds model still be the best or ideal way to analyze the data?



Another consideration I should have brought up earlier is the interpretability of the results. In the scenario I was imagining when writing this post, the end takeaway would ideally be describing the proportion of each score that occurs for each treatment (Ex 1) or each factor/combination (Ex 2), being able to say something to the effect of:
Example 1: For treatment 1, x % of samples received score 1, x % of samples received score 2 with a similar occurrence (x %) as score 3. 
or 
Example 2: For Factor A, x % of samples received score 1 with a similar occurrence for score 2 (x %), while x % of samples received score 3. 
(Full transparency, I haven't had a chance to look at the link provided by the other commenter yet. So if I am able to interpret the results of this analysis in this way feel free to ignore the comment about interpretability.)

 

StatDave
SAS Super FREQ

I used the factorial example since it is the more complex scenario. The case of 4 distinct treatments is a simpler subcase where the model has only the treatment effect, just like the second model with AB. So, yes, that is still the model you could use for your ordinal response in the case of having 4 distinct, nonfactorial treatments. But since you want to estimate the proportions in the response levels, it is simplest to treat the response as nominal rather than ordinal by using the generalized logit model (LINK=GLOGIT option). Note, though, that the GLOGIT model (like the fully nonproportional odds model) requires estimating many additional parameters. When the data are sparse, as these data are (note the zero counts in the FREQ table), this can cause estimation problems, including "separation" errors that occur when some parameter estimates try to go to infinity, as happens in this case.
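
Returning to the simple one-way case for a moment, that fit is just the AB model with TRT in its place -- a minimal sketch, reusing the TRT and REP variables from your post:

proc logistic data=x;
  class trt rep / param=glm;
  model score(desc)=trt rep;            /* proportional odds model: treatment plus block */
  lsmeans trt / ilink diff plots=none;  /* cumulative probabilities and pairwise comparisons */
run;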

 

Nonetheless, suppose that the AB combinations are nonfactorial treatments. The following refits the AB model from before, but using the GLOGIT model. The LSMEANS statement now produces estimates of the individual response-level probabilities for each treatment. Despite the separation issues, you'll notice that the estimated proportions are mostly quite close to the observed proportions. So, for purposes of prediction, this works well, but I would caution against trusting any tests, since many parameters are very unstable, as evidenced by the quite large standard errors.

 

proc logistic data=x;
  class ab / param=glm;
  model score(desc)=ab rep / link=glogit;  /* nominal (generalized logit) model */
  lsmeans ab / ilink plots=none;           /* estimated probability of each score level */
run;

proc freq data=x;
  tables ab*score / nocol nopercent;       /* observed counts and row percents for comparison */
run;

 

 
