Fluorite | Level 6

## What does "GCV R-Square" from PROC ADAPTIVEREG mean?

Proc Adaptivereg produces multiple fit statistics (see screenshot below). One statistic is ‘GCV R-Square’.

1. What is the formal interpretation of the GCV R-Square statistic?
2. What is the formula for GCV R-Square?
3. Any references for the GCV R-Square statistic?

Thanks.

SAS Super FREQ

## Re: What does "GCV R-Square" from PROC ADAPTIVEREG mean?

That's a good question. I asked a colleague who knows more about this are than I do. The following information is summarized from our discussions:

1. The interpretation is as a goodness-of-fit statistic, similar to the concept of the R-square statistic in OLS regression.

2. The GCV R-Square statistics is defined as 1-GCV(final model)/GCV(null model). The definition has a similar form to the traditional R-Square statistic. The difference here is that a nonparametric technique is used to solve regression problems, so the traditional R-Square statistic does not apply.

3. The ADAPTIVEREG algorithm is based on work by Friedman, who called his algorithm "MARS". Friedman modified the GCV function for MARS from its original version (Craven and Wahba) by manually setting the number of degrees of freedom per spline basis to a fixed number. So this version of the GCV is a specific criterion that does not otherwise appear in the literature. The criterion was mentioned in page 27 of ‘Estimating Functions of Mixed Ordinal and Categorical Variables Using Adaptive Splines’ by Friedman (1991), and in Chapter 5 of the MARS User Guide. The original Annals of Statistics paper by Friedman does not formally call the statistic "GCV R-Square", but the examples in that paper use similar ideas to quantify the goodness-of-fit for models.

I hope this gives you an overview, as well as specific references if you need more information.

Discussion stats