Contributor
Posts: 33

# Combining several highly correlated principal components

Hi Everyone,

I would be very grateful if someone could help me.

My data are organized in three groups (20 samples per group), and for each sample I measured two electrical characteristics (impedance and phase angle) at 100 different frequencies (picture in attachment).

The impedance measurements across all 100 frequencies were strongly correlated (r = 0.99), and the same holds for the phase angle measurements.

I conducted PCA three times in order to isolate the three frequency intervals (low, medium, and high) for each property.

Now I have six principal components (three for impedance and three for phase angle, one per interval).

My goal is to determine which interval (for each property) serves as the best discriminator of groups, and then I want to combine those (best) intervals into one variable for regression analysis.

The problem is that the principal components are strongly correlated (r = 0.99) within each characteristic, and I don't know how to combine those PCs into one variable.

Tnx..

Posts: 3,061

## Re: Combining several highly correlated principal components

```My goal is to determine which interval (for each property) serves as the best discriminator of groups
```

Seems like PCA isn't the right method to determine discrimination between groups.

I would do a discriminant analysis. By definition, it has greater capability to discriminate between groups than anything PCA will do: discriminant analysis optimizes the linear combinations of your data to find those with maximum discriminating ability, whereas PCA optimizes a different criterion (explained variance).
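For intuition, here is a minimal numpy sketch of the Fisher criterion that a two-group discriminant analysis optimizes. The toy data below stand in for two of the 20-sample groups; a real analysis would use dedicated software (e.g. STEPDISC) and all three groups:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy stand-ins for two of the 20-sample groups (made-up numbers)
g0 = rng.normal(loc=0.0, scale=1.0, size=(20, 2))
g1 = rng.normal(loc=2.0, scale=1.0, size=(20, 2))

m0, m1 = g0.mean(axis=0), g1.mean(axis=0)
# Pooled within-group scatter matrix
Sw = np.cov(g0, rowvar=False) * (len(g0) - 1) + np.cov(g1, rowvar=False) * (len(g1) - 1)
# Fisher's direction: the linear combination maximizing between-group
# separation relative to within-group spread
w = np.linalg.solve(Sw, m1 - m0)

z0, z1 = g0 @ w, g1 @ w
print(z1.mean() - z0.mean())  # positive: the projection separates the groups
```

The gap `z1.mean() - z0.mean()` equals the Mahalanobis-type quantity (m1 - m0)ᵀ Sw⁻¹ (m1 - m0), which is what the discriminant criterion maximizes.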

```The problem is that the principal components are strongly correlated (r = 0.99) within each characteristic, and I don't know how to combine those PCs into one variable.
I conducted PCA three times in order to isolate the three frequency intervals (low, medium, and high) for each property.
```

I find these two statements difficult to understand. PCA does not isolate low, medium, and high frequencies from your data, and components from a single PCA have zero correlation with one another. (Or do you mean that the first component of impedance is highly correlated with the first component of phase angle?)
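To illustrate the zero-correlation point: scores from a single PCA are uncorrelated by construction, as this small numpy sketch on random toy data shows:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))     # toy data matrix
Xc = X - X.mean(axis=0)          # centre the columns
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T               # principal component scores

# Correlation between any two components from the same PCA is zero
# (up to floating-point error), because the score vectors are orthogonal
r = np.corrcoef(scores[:, 0], scores[:, 1])[0, 1]
print(r)
```

High correlation can only arise between components taken from *different* PCAs, which is presumably what is happening here with the three separate interval-wise analyses.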

--
Paige Miller
Posts: 5,540

## Re: Combining several highly correlated principal components

Have you tried a discriminant analysis? You might find the most useful frequencies for discriminating the groups with STEPDISC.

PG

Contributor
Posts: 33

## Re: Combining several highly correlated principal components

Hi,

I pre-defined the frequency ranges as low, medium, and high. PCA was then performed to obtain a linear combination of impedance and of phase angle for each interval (it is more important to observe values over a defined frequency range than at a single frequency).

The impedance component for interval 1 is strongly correlated with the impedance components for intervals 2 and 3, and the same holds for the phase angle components. There is also a relatively strong correlation between the impedance and phase angle components themselves, because I used direct oblimin rotation (I also want to investigate the relationship between impedance and phase angle at different frequency ranges).

Now I want to use a discriminant analysis (first for all impedance components and then for the phase angle components), but I have a multicollinearity issue and the standardized coefficients are very large.

Posts: 3,061

## Re: Combining several highly correlated principal components

```Now I want to use a discriminant analysis (first for all impedance components and then for the phase angle components), but I have a multicollinearity issue and the standardized coefficients are very large.
```

As I said, you don't need PCA here, you can go straight to Discriminant Analysis. Using PCA as input to the Discriminant Analysis weakens the analysis.

--
Paige Miller
Posts: 5,540

## Re: Combining several highly correlated principal components

I agree with Paige that doing a discriminant analysis on principal components weakens the analysis. On the other hand, if you put all your highly correlated impedances and phases into the analysis, you will likely get spurious results that are impossible to interpret. Try this first: take the average impedance and phase for each frequency range (6 variables total) instead of principal components and use that as the basis for your discriminant analysis. The results should be easier to interpret that way. You can seek further improvement later with more sophisticated discrimination models.
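A minimal numpy sketch of this band-averaging step. The sample count and band boundaries below are made up for illustration; the real low/medium/high cut-offs are the analyst's choice:

```python
import numpy as np

rng = np.random.default_rng(2)
# Toy stand-in: 60 samples x 100 frequencies of one characteristic
impedance = rng.normal(size=(60, 100))
# Hypothetical band boundaries (index ranges into the 100 frequencies)
bands = {"low": slice(0, 33), "medium": slice(33, 66), "high": slice(66, 100)}
# One average per band per sample; doing the same for phase angle
# gives the 6 variables suggested above
features = np.column_stack([impedance[:, s].mean(axis=1) for s in bands.values()])
print(features.shape)  # (60, 3)
```

Each column of `features` is directly interpretable as "mean impedance in that band", unlike a principal component score.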

PG

Posts: 3,061

## Re: Combining several highly correlated principal components

There's also something called Partial Least Squares Discriminant Analysis (for example, see Partial least squares discriminant analysis: taking the magic away - ResearchGate), which probably is a reasonable thing to try on the entire data set, as it will likely handle the correlations better than anything else we have discussed, and might produce a somewhat interpretable model.
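For a rough idea of why PLS-DA tolerates collinear predictors, here is a one-component sketch in numpy (SIMPLS-style first component on toy data; a real analysis would use dedicated PLS-DA software and cross-validation):

```python
import numpy as np

rng = np.random.default_rng(3)
# Toy stand-in for the data: 3 groups x 20 samples, 10 highly correlated
# predictors (all numbers here are made up for illustration)
labels = np.repeat(np.arange(3), 20)
base = labels.astype(float)[:, None]
X = base + rng.normal(scale=0.3, size=(60, 10))  # columns correlate strongly
Y = np.eye(3)[labels]                            # dummy-coded group membership

Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
# First PLS component: the X-weight vector is the dominant left singular
# vector of the X'Y cross-covariance; no matrix inversion is needed,
# which is what makes PLS-DA robust to collinearity
u_, s_, vt_ = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
w = u_[:, 0]          # X-weights for component 1
t = Xc @ w            # latent score: the discriminant variable
print(np.corrcoef(t, labels)[0, 1])
```

The single score `t` tracks group membership even though no individual predictor could be entered into an ordinary discriminant analysis without multicollinearity problems.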

--
Paige Miller
Contributor
Posts: 33

## Re: Combining several highly correlated principal components

Thank you both for suggestions

I took the average value of each characteristic over each interval (instead of the PC components), and used those averages as predictors in the PLS-DA. (Before conducting PLS-DA, I wanted to test differences in means by ANOVA.)

From the ANOVA, I got partial eta^2 values ranking: i1 > i3 > i2,

and the VIP values from PLS-DA show the same trend: i1 > i3 > i2.

Is it a coincidence that the variable explaining the largest percentage of variation in the dependent variable (from the ANOVA) is also the most important discriminator in PLS-DA? (Maybe this is a stupid question, but I'll ask anyway.)
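For reference, in a one-way ANOVA the partial eta^2 reduces to SS_between / (SS_between + SS_within); a small numpy sketch on made-up groups:

```python
import numpy as np

rng = np.random.default_rng(4)
# Three toy groups of 20 (means chosen arbitrarily for illustration)
groups = [rng.normal(m, 1.0, size=20) for m in (0.0, 0.5, 1.5)]
all_vals = np.concatenate(groups)
grand = all_vals.mean()

ss_between = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
# Partial eta^2 = SS_effect / (SS_effect + SS_error); with a single factor
# this equals ordinary eta^2
eta2 = ss_between / (ss_between + ss_within)
print(eta2)
```

Both partial eta^2 and VIP reward between-group separation relative to overall variation, so agreement in their rankings is not surprising.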

Tnx..

Posts: 3,061

## Re: Combining several highly correlated principal components

```I took the average value of each characteristic over each interval (instead of the PC components), and used those averages as predictors in the PLS-DA. (Before conducting PLS-DA, I wanted to test differences in means by ANOVA.)
```

That isn't the purpose of PLS-DA. The proper use of PLS-DA would be to put all of your data into the analysis, without pre-computing statistics on the data.

If you are going to take an average in a certain interval, you are back to the method which was recommended by @PGStats where you perform ordinary discriminant analysis and not PLS-DA.

Of course, this assumes you have chosen the proper intervals for this average, and I was under the impression that you wanted to use the analysis to find the intervals. Is this not correct? You did say "My goal is to determine which interval (for each property) serves as the best discriminator of groups"

--
Paige Miller
Contributor
Posts: 33

## Re: Combining several highly correlated principal components

If I put all 200 original variables as predictors in the PLS-DA, I get very similar results. The problem that arises then is that I get a bunch of small significant intervals within the three that I set before the analysis.

(It is very difficult to draw a strict boundary between some frequencies and say that they belong, for example, to the area of low or moderately low frequencies. Some future research may be addressed to specific frequencies.)

I used PLS-DA instead of the classic DA because these mean values are very strongly correlated (r = 0.99), so otherwise I would again have the same situation as before (with the PC components).

Because of this, I think the goal has been achieved and the research questions have been answered.

Now I will use this discriminant function (or functions) as a predictor in a polynomial regression.

Contributor
Posts: 33

## Re: Combining several highly correlated principal components

However, I took your advice and used PLS-DA on all of the original data...

It seems that some of the individual frequencies are "super" discriminators, which also makes them good predictors in multiple linear regression...

My question is: does multicollinearity present a "big" problem when building a regression model that serves for "pure" prediction only?
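Generally, multicollinearity inflates coefficient variances and hurts interpretation, but it is much less of a problem for pure prediction within the range of the training data. A variance inflation factor (VIF) check makes the inflation visible; here is a numpy sketch on made-up data:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)                    # independent predictor
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF of column j: 1 / (1 - R^2) from regressing it on the other columns."""
    y = X[:, j]
    Z = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    r2 = 1.0 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)

# x1 and x2 get huge VIFs; x3 stays near 1
print([round(vif(X, j), 1) for j in range(3)])
```

Large VIFs mean individual coefficients are unstable, yet the fitted values (and thus predictions on similar data) can still be accurate.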

Tnx..

Contributor
Posts: 33

## Re: Combining several highly correlated principal components

I forgot to mention that the residuals of the (prediction) model are multivariate normally distributed (the p-values of the Mardia skewness, Mardia kurtosis, and Henze-Zirkler tests are high).

The Durbin-Watson test value is 2.05, and the Breusch-Pagan and White test p-values are high.
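For reference, the Durbin-Watson statistic is the ratio of the sum of squared successive residual differences to the sum of squared residuals, and sits near 2 when residuals are uncorrelated; a quick numpy check on toy residuals:

```python
import numpy as np

rng = np.random.default_rng(6)
resid = rng.normal(size=200)     # toy residuals with no autocorrelation
# Durbin-Watson: sum of squared successive differences over sum of squares;
# values near 2 indicate no first-order autocorrelation
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
print(dw)
```

A value of 2.05 on real residuals is consistent with no first-order autocorrelation.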
