Solved: Re: collinearity diagnostics

shahd · Posted 11-27-2018 03:11 PM

From the SAS output I got two tables for collinearity diagnostics: Collinearity Diagnostics and Collinearity Diagnostics with intercept adjusted. Which table should I interpret for the eigenvalues, eigenvectors, and condition index?

Could anyone help with that?

Thanks

PaigeMiller · Posted 11-27-2018 04:29 PM

Typically, when you do something like Principal Components (which I think is what is happening here), you want to have the variables centered to have a mean of zero (and optionally scaled to have a variance of 1). However, the SAS documentation here doesn't really speak my language. It says: "If you specify the COLLINOINT option, the intercept variable is adjusted out first." But I don't really know what "intercept variable is adjusted out" means; they are words I cannot decipher. I think it might mean that it centers the original variables to have a mean of zero, but it doesn't say that, so I am not sure.

Similarly, it goes on to say "For each variable, PROC REG produces the proportion of the variance of the estimate accounted for by each principal component", but it doesn't say WHICH estimate.

However, there is good news: you can read all of the details in the two papers/books that the documentation references. The bad news is that I don't have those papers/books.

@Rick_SAS, can you shed some light on this?

--
Paige Miller


PeterClemmensen · Posted 11-27-2018 04:12 PM

Show us your code?

The DATA to DATA Step Macro
Blog: SASnrd

Rick_SAS · Posted 11-28-2018 08:56 AM

Yes, that is my interpretation as well: "the intercept variable is adjusted out first" means "center the data."

I interpret the phrase "the proportion of the variance of the estimate accounted for by each principal component" to refer to using the principal components as the variables in a PC regression. It tells you what proportion of (estimate of) the total variance is accounted for by each PC. As you know, the PCs are orthogonal, so the total variance in the data can be decomposed in terms of the variances of the PCs. This estimate is given by the eigenvalues of the scaled (and perhaps centered) X`X matrix.
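For anyone who wants to see the arithmetic outside of SAS, here is a minimal NumPy sketch of the two tables under the interpretation above. It assumes that COLLIN scales the columns of X (intercept column included) to unit length without centering, that COLLINOINT centers the columns first and drops the intercept, and that the condition index is the square root of the largest eigenvalue divided by each eigenvalue. The function names and the simulated data are hypothetical, made up for illustration; this is not how PROC REG is implemented internally.

```python
import numpy as np

def scaled_crossproduct(X, adjust_intercept):
    """Return X`X rescaled so that its diagonal is all ones.

    adjust_intercept=False: prepend a column of ones, no centering
                            (assumed to mimic COLLIN).
    adjust_intercept=True:  center every column, no intercept column
                            (assumed to mimic COLLINOINT).
    """
    X = np.asarray(X, dtype=float)
    if adjust_intercept:
        Z = X - X.mean(axis=0)
    else:
        Z = np.column_stack([np.ones(len(X)), X])
    Z = Z / np.linalg.norm(Z, axis=0)  # unit-length columns
    return Z.T @ Z

def collinearity_table(X, adjust_intercept=False):
    """Eigenvalues (largest first) and the matching condition indices."""
    eigvals = np.linalg.eigvalsh(scaled_crossproduct(X, adjust_intercept))[::-1]
    cond_index = np.sqrt(eigvals[0] / eigvals)
    return eigvals, cond_index

# Simulated regressors (illustrative only): x2 is nearly a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(50, 5, size=200)
x2 = x1 + rng.normal(0, 0.5, size=200)
x3 = rng.normal(10, 2, size=200)
X = np.column_stack([x1, x2, x3])

for flag in (False, True):
    eigvals, cond_index = collinearity_table(X, adjust_intercept=flag)
    print("intercept adjusted:", flag)
    print("  eigenvalues      :", np.round(eigvals, 6))
    print("  condition indices:", np.round(cond_index, 2))
```

With centering, the scaled crossproduct matrix is just the correlation matrix of the regressors, so the intercept-adjusted table amounts to an ordinary correlation-based PCA. In both cases the eigenvalues sum to the number of columns, which is the "total variance" being decomposed.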

PaigeMiller · Posted 11-28-2018 09:08 AM

Okay, thanks. So my understanding is that I would use the COLLINOINT option rather than the COLLIN option, as I have never really discovered a meaningful use for principal components where the data were not centered (unless collinearity diagnostics is such a case; I will have to think about this).

--
Paige Miller

Rick_SAS · Posted 11-28-2018 09:42 AM

PCA usually uses centered data, which is equivalent to using the correlation (scaled) or covariance (unscaled) matrix. There are applications (Jackson, 1991, A User's Guide to Principal Components, pp. 72-74) in which the uncentered crossproduct matrix is used. Jackson cites "the field of chemistry" and gives an example for "the absorbance curves for ... samples measured at seven wavelengths."

The PRINCOMP procedure in SAS provides the NOINT option for when you want to perform an "uncorrected" (=uncentered) analysis.

I haven't been following this discussion previously, but I think we should use the COLLIN option unless you want to ignore collinearities with the intercept. The parameter estimates are obtained by solving the (uncentered) normal equations (X`X)b = X`Y for the estimates b. The precision of the estimates will be suspect if X`X is nearly rank deficient. Collinearities result in large standard errors and correlated estimates. The COLLIN option tells you when your data might be plagued by these issues, and I think you would want to know whether one of your explanatory variables is nearly constant.
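To make the "nearly constant" case concrete, here is a small NumPy sketch on simulated data (the data and the helper name are hypothetical). It assumes the same scaling conventions as described above: unit-length columns with the intercept kept for the COLLIN-style number, and centered columns without the intercept for the COLLINOINT-style number. It also computes ordinary least-squares standard errors to show the effect on precision.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(0.0, 1.0, size=n)
x2 = 100.0 + rng.normal(0.0, 0.01, size=n)  # nearly constant regressor
y = 2.0 + 1.5 * x1 + rng.normal(0.0, 1.0, size=n)

def max_condition_index(Z):
    """Largest condition index of Z`Z after scaling columns to unit length."""
    Z = Z / np.linalg.norm(Z, axis=0)
    eigvals = np.linalg.eigvalsh(Z.T @ Z)  # ascending order
    return float(np.sqrt(eigvals[-1] / eigvals[0]))

# COLLIN-style: keep the intercept column, do not center.
X = np.column_stack([np.ones(n), x1, x2])
with_intercept = max_condition_index(X)

# COLLINOINT-style: center the regressors, drop the intercept.
centered = X[:, 1:] - X[:, 1:].mean(axis=0)
intercept_adjusted = max_condition_index(centered)

# OLS standard errors. diag((X`X)^-1) is taken from the QR factor R,
# since (X`X)^-1 = R^-1 R^-T; this avoids inverting X`X directly.
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
s2 = resid @ resid / (n - X.shape[1])
R_inv = np.linalg.inv(np.linalg.qr(X, mode="r"))
se = np.sqrt(s2 * (R_inv ** 2).sum(axis=1))

print("max condition index, intercept kept    :", round(with_intercept, 1))
print("max condition index, intercept adjusted:", round(intercept_adjusted, 2))
print("standard errors (intercept, x1, x2)    :", np.round(se, 3))
```

In this sketch the uncentered diagnostic flags x2 as almost indistinguishable from the intercept, while the intercept-adjusted diagnostic sees nothing unusual, because centering removes exactly the part of x2 that is collinear with the column of ones.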

PaigeMiller · Posted 11-28-2018 09:50 AM

Okay, good point. You do want to check for collinearity with the intercept; I wasn't thinking of that. Thanks.

If you for some reason want to fit a model that has no intercept, then you would use COLLINOINT.

--
Paige Miller

PaigeMiller · Posted 11-28-2018 08:19 AM

@shahd

Thank you for marking my answer correct, but you ought to un-mark it. I have not answered the question; I have only asked additional questions.

--
Paige Miller
