Combining several highly correlated principal comp...

02-24-2015 12:22 PM

Hi Everyone,

I would be very grateful if someone could help me.

My data are organized in three groups (20 samples per group), and for each sample I have measured two electrical characteristics (impedance and phase angle) at 100 different frequencies (picture in attachment).

The impedance measurements across all 100 frequencies are strongly correlated (r = 0.99), and the same holds for the phase angle measurements.

I conducted PCA (three times) in order to isolate the three intervals (low, medium and high frequencies) for each property.

Now I have six principal components (three for impedance and three for phase angle, for the specified intervals).

My goal is to determine which interval (for each property) serves as the best discriminator of groups, and then I want to combine those (best) intervals into one variable for regression analysis.

The problem is that the principal components are strongly correlated (r = 0.99) within each characteristic, and I don't know how to combine those PCs into one variable.

Thanks.

02-24-2015 01:04 PM

"My goal is to determine which interval (for each property) serves as the best discriminator of groups"

PCA doesn't seem like the right method for determining discrimination between groups.

I would do a discriminant analysis. By construction, it has greater capability to discriminate between groups than anything PCA will do: discriminant analysis optimizes the linear combinations of your data to find those with maximum discriminating ability, whereas PCA optimizes a different criterion (explained variance).
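To illustrate the difference, here is a minimal Python/NumPy sketch on made-up data (not the poster's measurements, and not SAS). It assumes three groups of 20 samples and two hypothetical variables: one with large variance but no group differences, and one with small variance that separates the groups. The first principal component follows the high-variance variable, while the first Fisher discriminant direction follows the one that separates the groups.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 3 groups x 20 samples, 2 variables.
# Variable 0 has large variance but identical means across groups;
# variable 1 has small variance and group-specific means.
n_per_group = 20
group_means = np.array([[0.0, -1.0], [0.0, 0.0], [0.0, 1.0]])
X = np.vstack([
    m + rng.normal(scale=[5.0, 0.3], size=(n_per_group, 2))
    for m in group_means
])
y = np.repeat(np.arange(3), n_per_group)

# PCA: leading eigenvector of the total covariance matrix.
Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc1 = eigvecs[:, -1]  # eigh returns eigenvalues in ascending order

# Fisher discriminant: leading eigenvector of inv(Sw) @ Sb,
# where Sw is the within-group and Sb the between-group scatter matrix.
Sw = np.zeros((2, 2))
Sb = np.zeros((2, 2))
for g in range(3):
    Xg = X[y == g]
    d = Xg - Xg.mean(axis=0)
    Sw += d.T @ d
    m = (Xg.mean(axis=0) - X.mean(axis=0))[:, None]
    Sb += len(Xg) * (m @ m.T)
w_vals, w_vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
ld1 = np.real(w_vecs[:, np.argmax(np.real(w_vals))])
ld1 /= np.linalg.norm(ld1)

def between_share(scores):
    """Fraction of a score's total variation that lies between group means."""
    grand = scores.mean()
    ss_total = ((scores - grand) ** 2).sum()
    ss_between = sum(
        (y == g).sum() * (scores[y == g].mean() - grand) ** 2 for g in range(3)
    )
    return ss_between / ss_total

pc1_share = between_share(Xc @ pc1)
ld1_share = between_share(Xc @ ld1)
print(f"PC1 direction {pc1.round(2)}, between-group share {pc1_share:.2f}")
print(f"LD1 direction {ld1.round(2)}, between-group share {ld1_share:.2f}")
```

The sign of each direction is arbitrary; only the axis it points along matters.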

"The problem is that the principal components are strongly correlated (r = 0.99) within each characteristic, and I don't know how to combine those PCs into one variable."

"I conducted PCA (three times) in order to isolate the three intervals (low, medium and high frequencies) for each property."

I find these two statements difficult to understand, as PCA does not isolate low, medium and high frequencies from your data, and components from the same PCA have zero correlation with one another. Or do you mean that the first component of impedance is highly correlated with the first component of phase angle?

02-24-2015 01:16 PM

Have you tried discriminant analyses? You might find the most useful frequencies for discriminating groups with STEPDISC.

PG

02-24-2015 02:07 PM

Hi,

I pre-defined the frequency ranges as low, medium and high. PCA was then performed to obtain a linear combination of impedance, and of phase angle, for each interval (it is more important to observe values over a defined frequency range than at a single frequency).

The impedance PC from range 1 is strongly correlated with the impedance PCs from intervals 2 and 3, and the same holds for the phase angle components. There is also a relatively strong correlation between the impedance and phase angle PCs, because I used direct oblimin rotation (I also want to investigate the relationship between impedance and phase angle in the different frequency ranges).
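A short Python/NumPy sketch of why this can happen: scores from a single PCA are uncorrelated with one another, but first components taken from separate PCAs on different column blocks are free to correlate, and will do so strongly when every block is driven by the same underlying signal. The data, the single latent factor and the band split points below are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-in: 60 samples, 100 frequencies, one shared latent factor.
n_samples, n_freq = 60, 100
latent = rng.normal(size=(n_samples, 1))
loadings = rng.uniform(0.8, 1.2, size=(1, n_freq))
impedance = latent @ loadings + 0.1 * rng.normal(size=(n_samples, n_freq))

def pca_scores(block, n_components):
    """Principal component scores from the SVD of the column-centered block."""
    centered = block - block.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Within ONE PCA, component scores are uncorrelated by construction.
scores_all = pca_scores(impedance, 2)
r_within = np.corrcoef(scores_all[:, 0], scores_all[:, 1])[0, 1]

# Separate PCAs on pre-defined low / medium / high bands (assumed split points).
bands = [slice(0, 33), slice(33, 66), slice(66, 100)]
first_pcs = np.column_stack([pca_scores(impedance[:, b], 1)[:, 0] for b in bands])
r_between = np.corrcoef(first_pcs, rowvar=False)

print(f"PC1 vs PC2 from one PCA: r = {r_within:.3f}")
print("First PCs from three separate PCAs, |r|:")
print(np.abs(r_between).round(3))
```

Absolute values are printed because the sign of a principal component is arbitrary.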

Now I want to use discriminant analysis (first on all impedance components and then on the phase angle components), but I have a multicollinearity issue and the standardized coefficients are very large.

02-24-2015 02:23 PM

"Now I want to use discriminant analysis (first on all impedance components and then on the phase angle components), but I have a multicollinearity issue and the standardized coefficients are very large."

As I said, you don't need PCA here, you can go straight to Discriminant Analysis. Using PCA as input to the Discriminant Analysis weakens the analysis.

02-24-2015 03:22 PM

I agree with Paige that doing a discriminant analysis on principal components weakens the analysis. On the other hand, if you put all your highly correlated impedances and phases into the analysis, you will likely get spurious results that are impossible to interpret. Try this first: take the average impedance and phase for each frequency range (6 variables total) instead of principal components and use that as the basis for your discriminant analysis. The results should be easier to interpret that way. You can seek further improvement later with more sophisticated discrimination models.
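The averaging step could look like the following Python/NumPy sketch. The array layout (samples in rows, frequencies in columns), the band boundaries and the variable names are assumptions; the thread does not state the real cut-offs.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical layout: 60 samples (3 groups x 20), 100 frequencies per property.
n_samples, n_freq = 60, 100
impedance = rng.normal(size=(n_samples, n_freq))
phase = rng.normal(size=(n_samples, n_freq))

# Assumed band boundaries (column indices); the real ones are not given in the thread.
bands = {"low": slice(0, 33), "mid": slice(33, 66), "high": slice(66, 100)}

def band_means(block, prefix):
    """Average each sample's measurements within every frequency band."""
    return {
        f"{prefix}_{name}": block[:, cols].mean(axis=1)
        for name, cols in bands.items()
    }

features = {**band_means(impedance, "imp"), **band_means(phase, "phase")}
names = list(features)
X = np.column_stack([features[k] for k in names])

print(names)    # six feature names, impedance bands first
print(X.shape)  # (60, 6)
```

The resulting 60 x 6 matrix is what would be passed to the discriminant analysis in place of the 200 raw columns.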

PG

02-24-2015 03:55 PM

There's also something called Partial Least Squares Discriminant Analysis (for example, see Partial least squares discriminant analysis: taking the magic away - ResearchGate), which probably is a reasonable thing to try on the entire data set, as it will likely handle the correlations better than anything else we have discussed, and might produce a somewhat interpretable model.
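For readers unfamiliar with the method, here is a rough Python/NumPy sketch of PLS-DA: a PLS2 regression of dummy-coded group labels on all predictors, with each sample assigned to the group whose predicted indicator is largest. It is a hand-rolled illustration and not the SAS implementation; the function names `plsda_fit` and `plsda_predict`, the simulated data and the choice of three components are all assumptions. Accuracy is measured on the training data only, so it is optimistic; a real analysis would use cross-validation.

```python
import numpy as np

def plsda_fit(X, labels, n_components):
    """Fit PLS2 on dummy-coded labels and return what prediction needs."""
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xd, Yd = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Weight vector: dominant left singular vector of X'Y.
        w = np.linalg.svd(Xd.T @ Yd, full_matrices=False)[0][:, 0]
        t = Xd @ w                    # component scores
        p = Xd.T @ t / (t @ t)        # X loadings
        q = Yd.T @ t / (t @ t)        # Y loadings
        Xd = Xd - np.outer(t, p)      # deflate X and Y before the next component
        Yd = Yd - np.outer(t, q)
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = (np.column_stack(m) for m in (W, P, Q))
    coef = W @ np.linalg.solve(P.T @ W, Q.T)  # regression coefficients, shape (p, classes)
    return classes, x_mean, y_mean, coef

def plsda_predict(model, X):
    """Assign each row to the class with the largest predicted indicator."""
    classes, x_mean, y_mean, coef = model
    return classes[np.argmax((X - x_mean) @ coef + y_mean, axis=1)]

# Hypothetical data: 3 groups x 20 samples, 200 strongly correlated predictors.
rng = np.random.default_rng(6)
n_per_group, n_vars = 20, 200
labels = np.repeat(np.arange(3), n_per_group)
latent = rng.normal(size=(3 * n_per_group, 1))
X = latent @ rng.uniform(0.8, 1.2, size=(1, n_vars))
X += 0.3 * rng.normal(size=X.shape)
for g, start in enumerate((10, 80, 150)):
    X[labels == g, start:start + 10] += 3.0  # each group shifts its own block of columns

model = plsda_fit(X, labels, n_components=3)
accuracy = (plsda_predict(model, X) == labels).mean()
print(f"training accuracy: {accuracy:.2f}")
```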

02-25-2015 09:07 AM

Thank you both for the suggestions.

I took the average value of each characteristic in each interval (instead of PC components) and used those averages as predictors in the PLS-DA. (Before I conducted the PLS-DA, I wanted to test differences in means by ANOVA.)

From the ANOVA, I got these partial eta^2 values: i1 > i3 > i2

and the VIP values from PLS-DA show the same trend: i1 > i3 > i2

Is it a coincidence that the variable with the largest percentage of explained variation in the ANOVA is also the most important discriminator in PLS-DA? (Maybe this is a stupid question, but I'll ask anyway.)
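For reference, partial eta squared in a one-way ANOVA is SS_between / (SS_between + SS_within), which in that design equals plain eta squared. It grows as the group means of a predictor move apart relative to the spread within groups, and that separation is also what a discriminant method rewards, so similar rankings are plausible, although the two measures need not always agree. A small Python/NumPy sketch with made-up interval averages (the shift sizes are assumptions chosen only to reproduce the ordering; VIP is not computed here):

```python
import numpy as np

def partial_eta_squared(x, groups):
    """One-way ANOVA effect size: SS_between / (SS_between + SS_within)."""
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    grand_mean = x.mean()
    ss_between = 0.0
    ss_within = 0.0
    for g in np.unique(groups):
        xg = x[groups == g]
        ss_between += len(xg) * (xg.mean() - grand_mean) ** 2
        ss_within += ((xg - xg.mean()) ** 2).sum()
    return ss_between / (ss_between + ss_within)

# Hypothetical interval averages for 3 groups x 20 samples.
rng = np.random.default_rng(3)
groups = np.repeat([0, 1, 2], 20)
shifts = {"i1": 2.0, "i2": 0.3, "i3": 1.0}  # assumed separation between adjacent group means
effect = {
    name: partial_eta_squared(groups * shift + rng.normal(size=60), groups)
    for name, shift in shifts.items()
}
ranking = sorted(effect, key=effect.get, reverse=True)

print({k: round(v, 3) for k, v in effect.items()})
print(" > ".join(ranking))
```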

Thanks.

02-25-2015 09:33 AM

"I took the average value of each characteristic in each interval (instead of PC components) and used those averages as predictors in the PLS-DA. (Before I conducted the PLS-DA, I wanted to test differences in means by ANOVA.)"

That isn't the purpose of PLS-DA. The proper use of PLS-DA would be to put all of your data into the analysis, without pre-computing statistics on the data.

If you are going to take an average in a certain interval, you are back to the method which was recommended by @PGStats where you perform ordinary discriminant analysis and not PLS-DA.

Of course, this assumes you have chosen the proper intervals for this average, and I was under the impression that you wanted to use the analysis to find the intervals. Is this not correct? You did say "My goal is to determine which interval (for each property) serves as the best discriminator of groups"

02-25-2015 10:28 AM

If I put all 200 original variables as predictors in the PLS-DA, I get very similar results. The problem that then arises is that I get a bunch of small significant intervals within the three that I set before the analysis.

(It is very difficult to draw a strict boundary between frequencies and say that they belong, for example, to the low or moderately low range. Some future research may address specific frequencies.)

I used PLS-DA instead of classic DA because these mean values are very strongly correlated (r = 0.99), which puts me in the same situation as before (with the PC components).

Because of this, I think the goal is achieved and the research questions have been answered.

Now I will use these discriminant function(s) as predictors in a polynomial regression.

02-25-2015 07:01 PM

However, I took your advice and used PLS-DA on all the original data.

It seems that some of the individual frequencies are "super" discriminators, which also makes them good predictors in multiple linear regression.

My question is: does multicollinearity present a "big" problem when building a regression model that is used purely for prediction?
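A small simulation can make the usual textbook answer concrete: with nearly collinear predictors the individual coefficients vary a lot from one sample to the next, yet the predictions stay stable, provided the new data have the same correlation structure as the training data. The Python/NumPy sketch below uses an assumed two-predictor setup and assumed noise levels; it says nothing about the poster's actual model.

```python
import numpy as np

rng = np.random.default_rng(4)

def simulate(n):
    """Two nearly collinear predictors (r close to 0.99) and a noisy response."""
    x1 = rng.normal(size=n)
    x2 = x1 + 0.1 * rng.normal(size=n)
    y = x1 + x2 + 0.5 * rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])  # intercept plus two predictors
    return X, y

X_test, y_test = simulate(200)

coefs, preds = [], []
for _ in range(200):
    # Refit ordinary least squares on a fresh training sample of 60 cases.
    X_train, y_train = simulate(60)
    beta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    coefs.append(beta)
    preds.append(X_test @ beta)
coefs = np.array(coefs)
preds = np.array(preds)

coef_sd = coefs[:, 1:].std(axis=0)                 # spread of each slope across refits
sum_sd = (coefs[:, 1] + coefs[:, 2]).std()         # spread of the slopes' sum
pred_sd = preds.std(axis=0).mean()                 # average spread of a test prediction
rmse = np.sqrt(((preds - y_test) ** 2).mean())     # test error pooled over all refits

print(f"slope SD across refits: {coef_sd.round(2)}")
print(f"SD of slope sum: {sum_sd:.2f}")
print(f"mean prediction SD: {pred_sd:.2f}")
print(f"test RMSE: {rmse:.2f}")
```

The individual slopes trade off against each other while their sum, and therefore the prediction, is well determined. If future samples break the correlation between the predictors, this stability no longer holds.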

Thanks.

02-25-2015 07:20 PM

I forgot to mention that the (prediction) model residuals are multivariate normally distributed (the p-values of the Mardia skewness and kurtosis tests and of the Henze-Zirkler T test are high).

The Durbin-Watson statistic is 2.05, and the Breusch-Pagan and White test p-values are high.
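As a reminder of what the Durbin-Watson value measures: it is the sum of squared differences between successive residuals divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation, and values near 0 suggest strong positive autocorrelation. It depends on the order in which the residuals are listed. A minimal Python/NumPy sketch on simulated residuals (both series are made up):

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(5)
independent = rng.normal(size=500)   # hypothetical uncorrelated residuals
trending = np.cumsum(independent)    # random walk: strongly autocorrelated

dw_independent = durbin_watson(independent)
dw_trending = durbin_watson(trending)
print(f"independent residuals: DW = {dw_independent:.2f}")
print(f"random-walk residuals: DW = {dw_trending:.2f}")
```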