## Accounting for regression to the mean

Apologies if this is a duplicate question, but I couldn't find anything similar using search.

I have a data set containing a continuous variable, isotope GFR (which should remain stable), measured at time 0 and at 4 months. Each subject is assigned one of three categorical outcome states at four months (stable, improved, deteriorated). I'm trying to assess whether the baseline variable differs between outcome groups.

Scatter and Galton plots suggest a degree of regression to the mean in the measured variable. My thought on accounting for this is to use ANCOVA as below. However, my stats background is limited, and I'd be very grateful for any comments on the appropriateness of this method and/or advice on how to handle this better.

Many thanks

Jime

PROC GLM DATA=Date;
CLASS RESPONSE;
MODEL GFR_0 = RESPONSE (GFR_4-GFR_0);
LSMEANS RESPONSE / ADJUST=TUKEY PDIFF;
RUN;

Accepted Solutions
05-01-2013 09:01 AM

## Re: Accounting for regression to the mean

Try changing the model statement to:

MODEL GFR_4=RESPONSE GFR_0;

This will give the four-month value adjusted for the time 0 value.

You should probably add a preliminary step to check the homogeneity of slopes across the RESPONSE categories (see Milliken and Johnson's Analysis of Messy Data III: Analysis of Covariance).

So first fit:

MODEL GFR_4=RESPONSE GFR_0 GFR_0*RESPONSE;

and check the significance of the interaction term. If it is non-significant, then the MODEL statement I gave at first is appropriate. If it is significant, the slopes differ between groups, so the group differences depend on the baseline value and need to be calculated at a minimum of three time 0 values (low, median, high) using multiple LSMEANS statements with the AT option (check the documentation on how to do this).
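As a rough sketch of how those two steps might look in full, assuming the dataset and variable names from the original post (`Date`, `RESPONSE`, `GFR_0`, `GFR_4`). The baseline values 40, 60 and 80 in the last step are made-up placeholders, not values from the data; substitute the low, median and high GFR_0 values from your own sample.

```sas
/* Step 1: homogeneity-of-slopes check.                          */
/* Inspect the Type III test for GFR_0*RESPONSE in the output.   */
PROC GLM DATA=Date;
    CLASS RESPONSE;
    MODEL GFR_4 = RESPONSE GFR_0 GFR_0*RESPONSE;
RUN;
QUIT;

/* Step 2a: interaction non-significant -> common-slope ANCOVA.  */
/* LSMEANS compares groups at the overall mean of GFR_0.         */
PROC GLM DATA=Date;
    CLASS RESPONSE;
    MODEL GFR_4 = RESPONSE GFR_0;
    LSMEANS RESPONSE / ADJUST=TUKEY PDIFF;
RUN;
QUIT;

/* Step 2b: interaction significant -> keep the interaction and  */
/* compare groups at several baseline values.                    */
/* 40, 60, 80 are HYPOTHETICAL placeholders for low/median/high. */
PROC GLM DATA=Date;
    CLASS RESPONSE;
    MODEL GFR_4 = RESPONSE GFR_0 GFR_0*RESPONSE;
    LSMEANS RESPONSE / AT GFR_0=40 ADJUST=TUKEY PDIFF;
    LSMEANS RESPONSE / AT GFR_0=60 ADJUST=TUKEY PDIFF;
    LSMEANS RESPONSE / AT GFR_0=80 ADJUST=TUKEY PDIFF;
RUN;
QUIT;
```

Only one of steps 2a and 2b would normally be run, depending on the outcome of step 1.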

Steve Denham
