I have a stats question that seems basic but I can't track down the answer.
In a nutshell, I'm trying to determine whether the mean compliance rate differs among three health status groups (health status is one variable with 3 levels). Originally I ran an ANOVA followed by pairwise comparisons. However, we later decided to stratify by plan for our sampled measures, so I moved to a regression model (using PROC SURVEYREG) with ESTIMATE statements for specified linear effects, which lets me weight, stratify, and still perform pairwise comparisons.
My question is this: when I do pairwise comparisons after an ANOVA, the overall model F test must be significant before I proceed to the pairwise comparisons. However, I'm not sure how this works with a regression model where I'm calculating linear estimates. Specifically, I have one measure where the model effect is not significant and the main effect is not significant, but when I look at the pairwise linear estimates, one pair is significant.
Here's the code:
(Note: cshcn_3c is a 3-level health status variable; NUM_WCC_BMI_3_17 is a 0/1 variable indicating whether the individual received the service.)
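proc surveyreg data=cshcn_final total=CSHCN_TOTAL;
  class cshcn_3c;
  model NUM_WCC_BMI_3_17 = cshcn_3c;
  /* Coefficients follow the class-level order in the output: Chronic, Minor, Not special needs */
  estimate 'Minor v Healthy' cshcn_3c 0 1 -1;
  estimate 'Chronic v Healthy' cshcn_3c 1 0 -1;
  estimate 'Minor v Chronic' cshcn_3c 1 -1 0;  /* as coded, this is Chronic minus Minor */
  where denom_wcc_3_17 = 1;
  weight PW_WCC;
  strata planid;
run;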
Here's the relevant output from SAS:
The SURVEYREG Procedure
Regression Analysis for Dependent Variable NUM_WCC_BMI_3_17
Class Level Information
Variable    Levels    Values
cshcn_3c    3         Chronic Special needs (crg >= 5)
                      Minor special needs (crg= 3 or 4)
                      Not special needs (crg <3)
NOTE: The denominator degrees of freedom for the F tests is 8278.
Analysis of Estimable Functions
Parameter            Estimate      Standard Error    t Value    Pr > |t|
Minor v Healthy      0.01428557    0.02787915        0.51       0.6084
Chronic v Healthy    0.04030494    0.01874512        2.15       0.0316
Minor v Chronic      0.02601937    0.03141314        0.83       0.4075
NOTE: The denominator degrees of freedom for the t tests is 8278.
So what I'm trying to figure out is whether the significance of a linear estimate stands alone, or whether it must be "protected" by the significance of one of the "larger" effects (the main effect of my cshcn_3c variable and/or the overall model effect). At this point I'm leaning towards letting the linear estimates stand alone, since I'm not interested in the overall predictive model here, just the pairwise comparisons. Also, when I calculate a (weighted/stratified) odds ratio for the chronic-healthy comparison it is significant, and I would expect these results to agree with that; but maybe that's naive. Any thoughts?
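For reference, here is a minimal sketch of one way to request all three pairwise comparisons with a multiplicity adjustment in the same model, rather than relying on an overall F test as a gate. It assumes a SAS/STAT release in which PROC SURVEYREG supports the LSMEANS statement; the data set and variable names are the ones from the code above, and the choice of Bonferroni is only an illustration, not a recommendation.

```sas
/* Sketch only: assumes PROC SURVEYREG in this SAS/STAT release
   supports the LSMEANS statement with DIFF and ADJUST= options. */
proc surveyreg data=cshcn_final total=CSHCN_TOTAL;
  where denom_wcc_3_17 = 1;
  strata planid;
  weight PW_WCC;
  class cshcn_3c;
  model NUM_WCC_BMI_3_17 = cshcn_3c;
  /* DIFF requests every pairwise difference between group means;
     ADJUST=BON reports Bonferroni-adjusted p-values next to the raw ones;
     CL adds confidence limits for each difference. */
  lsmeans cshcn_3c / diff adjust=bon cl;
run;
```

With three comparisons, a Bonferroni adjustment multiplies each raw p-value by 3 and caps it at 1, so the raw 0.0316 for the chronic-healthy comparison would become roughly 0.095 if the underlying t tests are the same as those shown above.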