Alain38
Quartz | Level 8

Dear all,

 

I noticed that when estimating the betas of a regression (without intercept) whose design matrix is not of full rank, SAS automatically sets one beta equal to 0 in order to find a least squares solution.

 

This is perfectly understandable, since X'X must be non-singular to be inverted, which is not the case when the design matrix is rank-deficient.

 

I was wondering how SAS chooses which variable to set to 0 since, in my case, each variable is by construction a linear combination of the other variables.

 

I don't think SAS does this randomly, since running proc reg several times gives me the same results each time.

 

Doing the computations manually with proc IML, I thought of two possibilities to determine which variable to omit:

  - removing the variable that is most correlated with the others on average (i.e. the variable that carries the most redundant information)

  - for n variables, running the regression n times, removing a different variable each time, and keeping the regression with the highest coefficient of multiple determination, which would mean that I removed the variable with the lowest coefficient of partial determination (i.e. the variable that contributes the least to explaining the observed variation in Y)

 

Any assistance is greatly appreciated!

4 REPLIES
WarrenKuhfeld
Rhodochrosite | Level 12

Most procedures do sequential sweeps.  https://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_intromod_a0...

Search for "Goodnight sweep" for other sources.  So for a three-level classification variable coded as three binary variables, it will sweep the first two and skip the last one.  Transreg is a notable exception: it uses rational pivoting, which for really weird data can produce more accurate results.  Orthoreg also uses a specialized method.  See the doc.
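To make the "sweep the first two and skip the last one" behaviour concrete, here is a minimal pure-Python sketch of a sequential sweep on the augmented cross-product matrix. It is an illustration of the idea only, not SAS's actual implementation: the function names, the relative tolerance of 1e-8, the made-up data, and the intercept column (needed for three dummies to be collinear) are all assumptions.

```python
def sweep(a, k):
    """Sweep the square matrix `a` (list of lists) in place on pivot k."""
    d = a[k][k]
    n = len(a)
    a[k] = [v / d for v in a[k]]
    for i in range(n):
        if i == k:
            continue
        b = a[i][k]
        a[i] = [a[i][j] - b * a[k][j] for j in range(n)]
        a[i][k] = -b / d
    a[k][k] = 1.0 / d


def sequential_sweep_fit(x_rows, y, tol=1e-8):
    """Least squares via sequential sweeps; dependent columns get beta = 0.

    `tol` is an assumed relative singularity threshold, not a SAS default.
    """
    p = len(x_rows[0])
    cols = [[row[j] for row in x_rows] for j in range(p)] + [list(y)]
    # Augmented cross-product matrix [[X'X, X'y], [y'X, y'y]].
    a = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    orig_diag = [a[j][j] for j in range(p)]
    swept = []
    for k in range(p):  # columns are processed in model order
        # The current diagonal is what is left of column k after adjusting
        # for the columns already swept; (near) zero means dependence.
        if a[k][k] <= tol * orig_diag[k]:
            continue  # skip the pivot, leave beta at 0
        sweep(a, k)
        swept.append(k)
    beta = [0.0] * p
    for k in swept:
        beta[k] = a[k][p]  # after sweeping, the last column holds the betas
    return beta, swept


# Hypothetical data: intercept plus three dummies for a three-level factor.
x_rows = [[1, 1, 0, 0], [1, 1, 0, 0],
          [1, 0, 1, 0], [1, 0, 1, 0],
          [1, 0, 0, 1], [1, 0, 0, 1]]
y = [1, 2, 3, 5, 7, 9]
beta, swept = sequential_sweep_fit(x_rows, y)
print(swept)  # indices of the columns that were actually swept
print(beta)   # the skipped column keeps a zero coefficient
```

In this sketch the third dummy is the one that ends up at zero simply because it comes last; listing the columns in a different order would change which one is skipped.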

Alain38
Quartz | Level 8

Thank you for your help, I'm going to look into that.

 

To give more detail: I use proc reg or proc autoreg, and these variables are not binary. The variables fall into two categories and are expressed as shares within their category to induce stationarity, i.e. within each category they sum to 1.
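For the setup described here, a small sketch may help show what "sequential" implies: with two groups of shares that each sum to 1 and no intercept, there is a single linear dependency, and a procedure that works through the columns in order will flag whichever variable completes that dependency. The code below is only a stand-in for that ordering logic (a Gram-Schmidt check, not the sweep itself); the variable names A1..B3, the data values, and the tolerance are made up for illustration.

```python
def dependent_columns(columns, tol=1e-8):
    """Return indices of columns that are (numerically) linear combinations
    of the columns listed before them. `tol` is an assumed relative cutoff."""
    basis = []    # orthogonalised versions of the columns kept so far
    flagged = []
    for idx, col in enumerate(columns):
        resid = list(col)
        for q in basis:
            coef = sum(r * v for r, v in zip(resid, q)) / sum(v * v for v in q)
            resid = [r - coef * v for r, v in zip(resid, q)]
        # Nothing left after removing the earlier columns: dependent.
        if sum(r * r for r in resid) <= tol * sum(c * c for c in col):
            flagged.append(idx)
        else:
            basis.append(resid)
    return flagged


# Hypothetical shares: A1 + A2 = 1 and B1 + B2 + B3 = 1 in every row.
A1 = [0.25, 0.5, 0.75, 0.125, 0.625, 0.375]
A2 = [1 - a for a in A1]
B1 = [0.5, 0.25, 0.125, 0.25, 0.5, 0.125]
B2 = [0.25, 0.25, 0.5, 0.625, 0.125, 0.75]
B3 = [1 - b1 - b2 for b1, b2 in zip(B1, B2)]

order_1 = ["A1", "A2", "B1", "B2", "B3"]
order_2 = ["B3", "A1", "A2", "B1", "B2"]
data = {"A1": A1, "A2": A2, "B1": B1, "B2": B2, "B3": B3}

for order in (order_1, order_2):
    flagged = dependent_columns([data[name] for name in order])
    print(order, "->", [order[i] for i in flagged])
```

Under this logic the flagged variable depends only on the order in which the variables are listed, not on correlations or on partial R-square, so reordering the variables would be expected to change which beta is set to 0.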

StatDave
SAS Super FREQ

Not sure if this is relevant or helpful to what you are doing, but see this note

Rick_SAS
SAS Super FREQ

The parameters for the reference levels of the classification effects are set to zero. For the GLM parameterization, the reference level is the last level. You can use the REF= option to specify the reference level. See the section "Parameterization of Model Effects" in the SAS/STAT documentation. For applications and interpretation of different parameterizations, see Pasta (2005).
