Re: Solving full rank issue

Alain38 · Posted 11-15-2017 05:16 AM

Dear all,

I noticed that when estimating the betas in a regression (without intercept) whose design matrix is not of full rank, SAS automatically sets one of the betas to 0 in order to find a least squares solution.

This is perfectly understandable, since X'X must be non-singular for the least squares solution to be unique, which is not the case when X is rank-deficient.

I was wondering how SAS chooses which variable's coefficient to set to 0, since in my case each variable is by construction a linear combination of the other variables.

I don't think SAS is doing this randomly, since running proc reg several times gives me the same results.

Doing the computations manually with proc IML, I thought of two possibilities to determine which variable to omit:

- removing the variable that is, on average, the most correlated with the others (i.e. the variable that carries the most redundant information)

- for n variables, running the regression n times, removing a different variable each time, and keeping the regression with the highest coefficient of multiple determination. That would mean I removed the variable with the lowest coefficient of partial determination (i.e. the variable that contributes the least to explaining the observed variation in Y)

Any assistance is greatly appreciated!

WarrenKuhfeld · Posted 11-15-2017 07:11 AM

Most procedures do sequential sweeps. https://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_intromod_a0...

Search for "Goodnight sweep" for other sources. So for a three-level classification variable coded as three binary variables, it will sweep the first two and skip the last one. Transreg is a notable exception; it uses rational pivoting, which for really weird data can produce more accurate results. Orthoreg also uses a specialized method. See the doc.
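To illustrate the sequential sweeping described above, here is a minimal Python sketch. It is not SAS's actual code: the function names, the toy data, and the relative tolerance `tol` are assumptions made for the example. It sweeps the augmented cross-product matrix column by column and skips any pivot that has become (numerically) zero, which leaves that coefficient at 0.

```python
def cross_products(X, y):
    """Build the augmented matrix [[X'X, X'y], [y'X, y'y]] (no intercept)."""
    cols = [list(c) for c in zip(*X)] + [list(y)]
    return [[sum(a * b for a, b in zip(u, v)) for v in cols] for u in cols]


def sweep(A, k):
    """Sweep the square matrix A in place on pivot k."""
    d = A[k][k]
    n = len(A)
    for i in range(n):
        if i == k:
            continue
        f = A[i][k] / d
        for j in range(n):
            if j != k:
                A[i][j] -= f * A[k][j]
        A[i][k] = -f
    for j in range(n):
        if j != k:
            A[k][j] /= d
    A[k][k] = 1.0 / d


def sequential_sweep_fit(X, y, tol=1e-8):
    """Sweep columns in order; skip a column whose pivot is ~0.

    tol is a hypothetical relative tolerance (an assumption of this sketch):
    a pivot is treated as zero when it has shrunk below tol times the
    column's original sum of squares.
    """
    A = cross_products(X, y)
    p = len(A) - 1
    original_diag = [A[k][k] for k in range(p)]
    skipped = []
    for k in range(p):
        if A[k][k] <= tol * original_diag[k]:
            # Column k is a linear combination of columns already swept:
            # zero its row and column so its coefficient is reported as 0.
            skipped.append(k)
            for j in range(p + 1):
                A[k][j] = 0.0
                A[j][k] = 0.0
        else:
            sweep(A, k)
    beta = [A[k][p] for k in range(p)]
    return beta, skipped


# Toy data (made up): x3 = x1 + x2 and y = 2*x1 + 3*x2.
x1, x2 = [1, 2, 3, 4], [1, 0, 1, 0]
x3 = [a + b for a, b in zip(x1, x2)]
y = [2 * a + 3 * b for a, b in zip(x1, x2)]

for names, cols in ((("x1", "x2", "x3"), (x1, x2, x3)),
                    (("x3", "x1", "x2"), (x3, x1, x2))):
    X = [list(row) for row in zip(*cols)]
    beta, skipped = sequential_sweep_fit(X, y)
    print(names, [round(b, 6) for b in beta],
          "zeroed:", [names[k] for k in skipped])
```

In this sketch the column that gets a zero coefficient is simply the last one in the order entered that depends on the columns before it, so reordering the regressors changes which beta is set to 0. That would also explain why repeated runs give identical results.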

Alain38 · Posted 11-15-2017 07:40 AM

Thank you for your help, I'm going to look into that.

For further details, I use proc reg or proc autoreg, and these variables are not binary. The variables can be divided into two categories and are computed as relative values within their category to induce stationarity, i.e. the variables in each category sum to 1.
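A small sketch of what this structure implies for the rank, assuming a made-up layout of two variables in the first category and three in the second (the data and the helper names `rank` and `drop_column` are hypothetical). Because both categories sum to 1, the sum of the first group minus the sum of the second is the zero vector, so there is exactly one linear dependency here.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix via exact Gaussian elimination on Fractions."""
    M = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(M[0]) if M else 0
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: it depends on earlier ones
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r


def drop_column(rows, j):
    """Return a copy of the matrix without column j."""
    return [row[:j] + row[j + 1:] for row in rows]


# Hypothetical shares in tenths: columns A1, A2 | B1, B2, B3.
# Each category sums to 10 tenths, i.e. to 1.
tenths = [
    [2, 8, 1, 3, 6],
    [5, 5, 4, 4, 2],
    [7, 3, 2, 5, 3],
    [1, 9, 6, 1, 3],
    [4, 6, 3, 3, 4],
    [9, 1, 5, 2, 3],
]
X = [[Fraction(v, 10) for v in row] for row in tenths]

print("rank of X:", rank(X), "out of", len(X[0]), "columns")
for j in range(len(X[0])):
    print("rank without column", j, ":", rank(drop_column(X, j)))
```

With this layout, removing any single column leaves the rank unchanged, meaning the remaining columns span the same space. Under that assumption the fitted values and R-square would be the same whichever variable is dropped, so comparing the n regressions would not discriminate between them.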

StatDave · Posted 11-15-2017 10:54 AM

Not sure if this is relevant or helpful to what you are doing, but see this note.

Rick_SAS · Posted 11-15-2017 10:26 AM

The parameter for the reference level of a classification effect is set to zero. For a GLM parameterization, that is the last level. You can use the REF= option to specify the reference level. See the section "Parameterization of Model Effects" in the SAS/STAT documentation. For applications and interpretation of different parameterizations, see Pasta (2005).