From SAS output I got two tables for collinearity diagnostics: "Collinearity Diagnostics" and "Collinearity Diagnostics with intercept adjusted". Which table should I interpret for the eigenvalues, eigenvectors, and condition index?
Could anyone help with that?
Thanks
Typically, when you do something like Principal Components (which I think is what is happening here), you want to have the variables centered to have a mean of zero (and optionally a variance of 1). However, the SAS documentation here doesn't really speak my language. It says: "If you specify the COLLINOINT option, the intercept variable is adjusted out first." But I don't really know what "intercept variable is adjusted out" means; they are words I cannot decipher. I think it might mean that it centers the original variables to have a mean of zero, but it doesn't say that, so I am not sure.
Similarly, it goes on to say "For each variable, PROC REG produces the proportion of the variance of the estimate accounted for by each principal component", but it doesn't say WHICH estimate.
However, there is good news: you can read all of the details in the two papers/books that are referenced. The bad news is that I don't have those papers/books.
@Rick_SAS, can you shed some light on this?
Show us your code?
Yes, that is my interpretation as well: "the intercept variable is adjusted out first" means "center the data."
I interpret the phrase "the proportion of the variance of the estimate accounted for by each principal component" to refer to using the principal components as the variables in a PC regression. It tells you what proportion of (estimate of) the total variance is accounted for by each PC. As you know, the PCs are orthogonal, so the total variance in the data can be decomposed in terms of the variances of the PCs. This estimate is given by the eigenvalues of the scaled (and perhaps centered) X`X matrix.
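To make the two tables concrete, here is a minimal NumPy sketch of how I understand the computation, assuming PROC REG follows the Belsley, Kuh, and Welsch approach (which I believe is among the references the documentation cites). The "with intercept" version keeps a column of ones in X and scales the uncentered X`X to a unit diagonal; the "intercept adjusted" version centers each column first, so the scaled matrix is just the correlation matrix of the regressors. The function name `collinearity_diagnostics` and the simulated data are hypothetical, and I have not checked the numbers against SAS output digit for digit. In SAS itself both tables come from MODEL statement options, e.g. `model y = x1 x2 / collin collinoint;`.

```python
import numpy as np

def collinearity_diagnostics(X, intercept_adjusted=False):
    """Belsley-Kuh-Welsch style diagnostics for an n-by-p regressor matrix X.
    X must NOT already contain a column of ones (hypothetical helper).

    intercept_adjusted=False: prepend a column of ones (the idea behind COLLIN).
    intercept_adjusted=True:  center each column instead (the idea behind COLLINOINT).
    """
    X = np.asarray(X, dtype=float)
    if intercept_adjusted:
        Z = X - X.mean(axis=0)                      # "adjust out" the intercept = center
    else:
        Z = np.column_stack([np.ones(len(X)), X])   # keep the intercept as a regressor
    Z = Z / np.linalg.norm(Z, axis=0)               # scale columns so Z'Z has a unit diagonal
    eigvals, eigvecs = np.linalg.eigh(Z.T @ Z)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]               # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cond_index = np.sqrt(eigvals[0] / eigvals)
    # phi[j, k] = v_jk^2 / lambda_k; Var(b_j) is proportional to sum_k phi[j, k]
    phi = eigvecs**2 / eigvals
    # proportions[k, j]: share of Var(b_j) associated with component k
    proportions = (phi / phi.sum(axis=1, keepdims=True)).T
    return eigvals, cond_index, proportions

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)   # x2 nearly duplicates x1
X = np.column_stack([x1, x2])

for adjusted in (False, True):
    ev, ci, prop = collinearity_diagnostics(X, intercept_adjusted=adjusted)
    print("intercept adjusted" if adjusted else "with intercept")
    print("  eigenvalues:    ", np.round(ev, 4))
    print("  condition index:", np.round(ci, 2))
    print("  proportions:\n", np.round(prop, 3))
```

Because the scaled matrix has a unit diagonal, the eigenvalues sum to the number of columns (3 with the intercept, 2 without), and each column of the proportions table sums to 1. A large condition index paired with large proportions for two or more variables on the same row is the usual signal of a collinearity involving those variables.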
Okay, thanks. So my understanding is that I would use the COLLINOINT option rather than the COLLIN option, as I have never really discovered a meaningful use for principal components where the data were not centered. (Unless collinearity diagnostics is such a case; I will have to think about this.)
In PCA, centered data are usually used, which is equivalent to using the correlation (scaled) or covariance (unscaled) matrix. There are applications (Jackson, 1991, A User's Guide to Principal Components, pp. 72-74) in which the uncentered crossproduct matrix is used. Jackson cites "the field of chemistry" and gives an example for "the absorbance curves for ... samples measured at seven wavelengths."
The PRINCOMP procedure in SAS provides the NOINT option for when you want to perform an "uncorrected" (= uncentered) analysis.
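A small NumPy sketch of the centered versus uncentered distinction, using hypothetical simulated data: centering and then scaling the cross-product matrix gives the correlation matrix, centering without scaling gives the covariance matrix (after dividing by n - 1), and skipping the centering gives an uncentered crossproduct matrix. I scale the uncentered matrix to a unit diagonal only so it can be compared with the correlation matrix; whether PRINCOMP with NOINT scales it the same way is an assumption I have not checked.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
X = rng.normal(loc=10.0, scale=2.0, size=(n, 3))   # hypothetical data, far from the origin

Xc = X - X.mean(axis=0)                    # centered
cov = Xc.T @ Xc / (n - 1)                  # centered, unscaled -> covariance matrix
Zc = Xc / np.linalg.norm(Xc, axis=0)
corr = Zc.T @ Zc                           # centered, scaled -> correlation matrix

Zu = X / np.linalg.norm(X, axis=0)         # uncentered, scaled to unit length
uncentered = Zu.T @ Zu                     # uncentered crossproduct matrix

print("largest eigenvalue, correlation matrix:", np.linalg.eigvalsh(corr).max().round(3))
print("largest eigenvalue, uncentered matrix: ", np.linalg.eigvalsh(uncentered).max().round(3))
```

The three simulated columns are independent, so the correlation matrix is close to the identity. In this example the uncentered matrix still has one dominant eigenvalue, because every column points roughly in the direction of the column of ones; that component reflects the means, not any relationship among the variables.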
I haven't been following this discussion previously, but I think we should use the COLLIN option unless you want to ignore collinearities with the intercept. The parameter estimates are based on solving the (uncentered) normal equations (X`X)b = (X`*Y) for the estimates b. The precision of the estimates will be suspect if X`X is nearly rank deficient. Collinearities result in large standard errors and correlated estimates. The COLLIN option tells you when your data might be plagued by these issues, and I think you would want to know whether one of your explanatory variables is nearly constant.
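The point about a nearly constant explanatory variable can be illustrated with a short NumPy sketch on hypothetical data. Here `x1` barely varies around 100, so once it is scaled it is almost parallel to the intercept column. The diagnostics that keep the intercept (the idea behind COLLIN) flag this with a very large condition index, while the intercept-adjusted version (the idea behind COLLINOINT) centers `x1` first, so it only sees the small fluctuations and reports nothing unusual. The helper `max_condition_index` is hypothetical, and this only mimics the idea of the two SAS options rather than reproducing their exact output.

```python
import numpy as np

def max_condition_index(Z):
    """Largest condition index of Z after scaling its columns to unit length."""
    Z = Z / np.linalg.norm(Z, axis=0)
    eig = np.linalg.eigvalsh(Z.T @ Z)
    return np.sqrt(eig.max() / eig.min())

rng = np.random.default_rng(1)
n = 100
x1 = 100.0 + 0.1 * rng.normal(size=n)   # nearly constant regressor
x2 = rng.normal(size=n)
X = np.column_stack([x1, x2])

# COLLIN-style: intercept kept as a column of ones
ci_with_intercept = max_condition_index(np.column_stack([np.ones(n), X]))
# COLLINOINT-style: intercept adjusted out by centering
ci_intercept_adjusted = max_condition_index(X - X.mean(axis=0))

print(f"with intercept:     {ci_with_intercept:.1f}")
print(f"intercept adjusted: {ci_intercept_adjusted:.2f}")
```

Here the intercept and `x1` are nearly collinear, which is exactly the problem the intercept-adjusted table cannot show.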
Okay, good point, you do want to check for collinearity with the intercept, I wasn't thinking of that. Thanks.
If you for some reason want to fit a model that has no intercept, then you would use COLLINOINT.
Thank you for marking my answer correct, but you ought to un-mark it as correct. I have not answered the question, I have asked additional questions.