In PROC REG, how does SAS determine its linear combinations when the "model is not of full rank"?
See the SAS documentation:
SAS® 9.4 and SAS® Viya® 3.5 Programming Documentation | SAS 9.4 / Viya 3.5
SAS/STAT User's Guide
The REG Procedure
Details --> Models of Less Than Full Rank
https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/statug/statug_reg_details23.htm
Koen
I had already read the suggested material, and I went back and reread it. My impression is that the multicollinear subsets come from either the GINV or the SWEEP function available in IML. My simple test example is
in which I include r0 for the intercept term when using SWEEP. Setting r1 to Y and r2-r7 to X, PROC REG reports
because the model is not of full rank. I have been unable to replicate this output in IML.
When collinearities exist, PROC REG sets certain parameter estimates to zero. Note that which parameter estimate is set to zero (ignored) depends on the order in which the variables enter the model.
When X is not of full rank, one or more of the eigenvalues of the X'X matrix are zero, so the corresponding eigenvectors determine the linear combinations reported.
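The eigenvalue remark can be checked on a small made-up example. The sketch below uses hypothetical data (the columns x1, x2, and x3 = x1 + 2*x2 are assumptions for illustration, not the poster's data) and shows that the coefficient vector of the dependency is an eigenvector of X'X with eigenvalue zero.

```python
# Hypothetical data: x3 is built as an exact linear combination of x1 and x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
x3 = [a + 2.0 * b for a, b in zip(x1, x2)]
cols = [x1, x2, x3]

# Cross-product matrix X'X (no intercept here, to keep the example small).
xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]

# The dependency x1 + 2*x2 - x3 = 0 written as a coefficient vector.
v = [1.0, 2.0, -1.0]

# (X'X) v is the zero vector, so v is an eigenvector with eigenvalue 0.
xtx_v = [sum(row[j] * v[j] for j in range(len(v))) for row in xtx]
print(xtx_v)
```

Any nonzero multiple of v is also in the null space, so an eigenvector routine would typically return a rescaled version of it rather than a form that expresses one variable in terms of the earlier ones.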
PROC REG internally uses the SWEEP operator to compute the regression coefficients, the inverse of X'X, and the error sum of squares. PROC REG performs a sequential sweep. That is, it always begins by sweeping the second column of (y X)'(y X), which is the column corresponding to the first X column. Then it sweeps the next column if the pivot is not zero (that is, if that column is not a linear function of the preceding column). Then it sweeps the next column if the pivot is not zero (if that column is not a linear combination of the preceding columns), and so on.
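The sequential sweep described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not PROC REG's actual implementation: the data are hypothetical, the response is stored as the last column rather than the first for convenience, and the relative tolerance `rel_tol` is an illustrative choice. When a pivot is (numerically) zero, the column is skipped, its parameter stays at zero, and the entries of that column in the already-swept rows give the coefficients of the linear combination.

```python
def sweep(a, k):
    """Sweep the square matrix `a` (list of lists) in place on pivot k."""
    d = a[k][k]
    n = len(a)
    a[k] = [v / d for v in a[k]]
    for i in range(n):
        if i != k:
            b = a[i][k]
            a[i] = [a[i][j] - b * a[k][j] for j in range(n)]
            a[i][k] = -b / d
    a[k][k] = 1.0 / d


def sequential_sweep(names, x_cols, y, rel_tol=1e-8):
    """Sweep the X columns in order, skipping columns whose pivot is ~0."""
    cols = x_cols + [y]  # assumption: response kept as the last column
    a = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    diag0 = [a[i][i] for i in range(len(names))]
    swept, deps = [], {}
    for j, name in enumerate(names):
        if abs(a[j][j]) <= rel_tol * diag0[j]:
            # Column j is a linear combination of the columns swept so far.
            deps[name] = {names[k]: a[k][j] for k in swept}
        else:
            sweep(a, j)
            swept.append(j)
    beta = {name: 0.0 for name in names}  # skipped parameters stay at zero
    for k in swept:
        beta[names[k]] = a[k][len(names)]
    return beta, deps


# Hypothetical data: x3 = x1 + 2*x2, and y = 1 + x1 + x2 exactly.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
x3 = [a + 2.0 * b for a, b in zip(x1, x2)]
y = [1.0 + a + b for a, b in zip(x1, x2)]
ones = [1.0] * len(x1)

beta, deps = sequential_sweep(["Intercept", "x1", "x2", "x3"],
                              [ones, x1, x2, x3], y)
print(beta)
print(deps)
```

Because the columns are swept in order, moving x3 ahead of x2 in the list would make x2 the skipped column instead, which matches the remark that the parameter set to zero depends on the order of entry.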
In contrast, a procedure like TRANSREG does nonsequential sweeps, finding the best column to sweep first, then the second best, and so on. The TRANSREG approach tends to be more accurate, but in linear models, you might prefer sequential sweeps to get Type I sums of squares.
This solution is closest to my thinking. Nevertheless, I have not yet been able to replicate the reported results. My suspicion is SAS uses more than just SWEEP to determine its results. Nevertheless, thanks to all who responded and offered suggestions. I already had tried other approaches, such as the suggested eigenfunction one.