12-20-2015 09:16 AM
I'm currently reading the book "Categorical Data Analysis Using SAS®" and there are two examples using PROC GLM that are not clear to me. One of them is in chapter 7.5, "Analyzing Incomplete Data". Here is the description of the example: "Table 7.3 displays artificial data collected for the purpose of determining if pH level alters action potential characteristics following administration of a drug (Harrell 1989). The response variable of interest (Vmax) was measured at up to four pH levels for each of 25 patients. While at least two measurements were obtained from each patient, only three patients provided data at all four pH levels.
The following PROC GLM statements produce the means for each pH level as well as the average pairwise difference as the pH level increases."
model vmax = subject ph;
estimate 'direction' ph -1 0 0 1 / divisor=3;
What I don't understand: Why is the variable subject not in the CLASS statement? Isn't subject now modeled as a linear effect, since it isn't included in the CLASS statement? The ESTIMATE statement is also not clear to me. There are 6 possible pairwise comparisons among the four pH levels, so I would have expected the following code:
estimate 'direction' ph -3 -1 1 3 / divisor=6;
Can anyone who understands the example give me some advice? Thanks a lot in advance.
12-20-2015 04:31 PM
I agree with your first point.
In the second edition of the book, which I have in front of me (yours must be the 3rd ed.), the same example data are used at the end of chapter 6 ("Sets of s x r Tables", subsection 6.4.6). There, the authors only compute CMH statistics using PROC FREQ and do not apply PROC GLM.
So, now they have moved the example to chapter 7. Given its title, "Nonparametric Methods", I'm wondering why they apply PROC GLM to the original, untransformed data.
With SUBJECT being omitted from the CLASS statement, I think their code would be suitable for an ANCOVA model with the covariate SUBJECT. But SUBJECT is just a sequential number. Indeed, their results change if the subjects are numbered differently (e.g. the patient numbers 1 - 25 are randomly permuted). This can't be right.
The ESTIMATE statement seems less controversial to me. pH level has an interval scale. Therefore, if m1, m2, m3 and m4 are the parameter estimates corresponding to pH levels 6.5, 6.9, 7.4 and 7.9, it could make sense to calculate the "average pairwise difference as the pH level increases" as ((m4-m3)+(m3-m2)+(m2-m1))/3=(m4-m1)/3, which is what they do.
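For comparison, here are the two sets of coefficients written out side by side, using the m1 to m4 notation above (the d-bar labels are just names introduced for this check). The book's statement averages the three adjacent differences, while the -3 -1 1 3 / divisor=6 version from the question averages all six pairwise differences:

```latex
\begin{align*}
\bar{d}_{\text{adjacent}}
  &= \frac{(m_2 - m_1) + (m_3 - m_2) + (m_4 - m_3)}{3}
   = \frac{m_4 - m_1}{3}, \\
\bar{d}_{\text{all pairs}}
  &= \frac{1}{6} \sum_{1 \le i < j \le 4} (m_j - m_i)
   = \frac{-3 m_1 - m_2 + m_3 + 3 m_4}{6}.
\end{align*}
```

Both are legitimate contrasts (the coefficients sum to zero in each case), but they estimate different quantities, so which one is appropriate depends on how "average pairwise difference" is meant to be read.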
12-20-2015 08:11 PM
12-21-2015 12:08 PM
It is clearly a typo that subject was not put in the CLASS statement. The rest of the code looks OK. You can use GLM if all the factors are fixed effects (it is fine for subject to be a fixed effect).
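A minimal sketch of what the corrected call might look like, assuming the data are in long format (one row per subject and pH measurement). The dataset name vmax_long is hypothetical, and the LSMEANS statement is only my guess at how the per-level means are requested, since the full code from the book is not shown in this thread:

```sas
/* Sketch only: vmax_long is a hypothetical dataset name, and LSMEANS
   is an assumed way of requesting the means for each pH level. */
proc glm data=vmax_long;
   class subject ph;            /* subject as a categorical fixed effect */
   model vmax = subject ph;
   lsmeans ph;                  /* subject-adjusted mean at each pH level */
   estimate 'direction' ph -1 0 0 1 / divisor=3;
run;
quit;
```

With subject in the CLASS statement, the fit should no longer depend on how the patients happen to be numbered.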