proc princomp

Hi all,

Is it true that when we use proc princomp, the variable prin1 in the output data set must be between -1 and 1?

I ran proc princomp on several survey items (H1DS5 H1DS8 H1DS1 H1DS2 H1DS10), all measured on exactly the same scale, and got:


Eigenvalues of the Correlation Matrix

     Eigenvalue    Difference    Proportion    Cumulative
1    2.05794       1.16466       0.4116        0.4116
2    0.893284      0.0784068     0.1787        0.5902

So I will use only the first component, since it is the only one with an eigenvalue greater than 1.

Then I opened the output data set and found the variable prin1. That is the component score I should use as my latent factor, right?

But its distribution does not look good:


The UNIVARIATE Procedure
Variable: Prin1

Mean      0.00000     Std Deviation    1.43455
Median   -0.77457     Variance         2.05794
Mode     -0.77457     Range           13.72460

It ranges from -.8 to 11.3! I know the distribution is skewed, but I cannot log-transform it because of the negative values.

In this case, if I use it as the outcome variable in my model, can I use OLS?

This is my code:

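/* PCA on the correlation matrix (the PROC PRINCOMP default).       */
/* OUT= keeps the input variables and adds the scores Prin1-Prin5.  */
proc princomp data=wave1.wave1 out=wave1.pcsat_cor;
   var H1DS5 H1DS8 H1DS1 H1DS2 H1DS10;
run;

/* Summary statistics and a histogram of the first component score. */
proc univariate data=wave1.pcsat_cor;
   var prin1;
   histogram prin1;
run;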

Thank you!

Re: proc princomp

@Lindy wrote:

Hi all,

Is it true that when we use proc princomp, the variable prin1 in the output data set must be between -1 and 1?

No.
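A sketch of why the scores are not bounded: by default PROC PRINCOMP writes scores with mean 0 and variance equal to the component's eigenvalue, which is why the UNIVARIATE output shows Variance 2.05794 (the first eigenvalue) and Std Deviation 1.43455 (its square root). The STD option rescales the scores to unit variance, but even then they are not confined to [-1, 1]: if the minimum is -0.77457, as reported in the thread, dividing the reported range (-0.77 to 12.95) by 1.43455 gives roughly -0.54 to 9.03. The data set name wave1.pcsat_std is hypothetical.

```sas
/* Same PCA as above, but STD standardizes the scores written to   */
/* the OUT= data set to unit variance.                              */
/* wave1.pcsat_std is a hypothetical output data set name.          */
proc princomp data=wave1.wave1 out=wave1.pcsat_std std;
   var H1DS5 H1DS8 H1DS1 H1DS2 H1DS10;
run;

/* Prin1 should now have mean 0 and standard deviation 1, but its   */
/* minimum and maximum are still not restricted to [-1, 1].         */
proc means data=wave1.pcsat_std n mean std min max;
   var Prin1;
run;
```

Standardizing only changes the scale of Prin1, not the shape of its distribution, so the skewness discussed below is unchanged.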

So I will use only the first component, since it is the only one with an eigenvalue greater than 1.

The eigenvalue-greater-than-1 rule is only one way of determining how many components to use; there are others, such as examining a scree plot.
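One alternative, sketched here assuming a SAS/STAT release whose PROC PRINCOMP supports the PLOTS= option with ODS Graphics: request the eigenvalue plots (a scree plot and a variance-explained plot) and look for the point where the curve levels off, rather than relying only on the eigenvalue > 1 cutoff.

```sas
ods graphics on;

/* PLOTS=EIGEN requests the scree plot and the variance-explained  */
/* plot for all five components of the correlation-matrix PCA.     */
proc princomp data=wave1.wave1 plots=eigen;
   var H1DS5 H1DS8 H1DS1 H1DS2 H1DS10;
run;
```

Once a number of components is chosen, the N= option on the PROC PRINCOMP statement (for example, n=1) limits the computed components and OUT= scores to that many.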

It ranges from -.8 to 11.3! I know the distribution is skewed, but I cannot log-transform it because of the negative values.

The PRIN1 scores do not have to follow any particular distribution, and transforming PRIN1 isn't something that is usually done. Instead, it may be that you have an outlier in PRIN1 (and possibly elsewhere). Have you plotted the distribution of PRIN1 to check for outliers? If there is a serious outlier and you decide to remove it, then you would want to re-run the PCA.
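A minimal sketch of such a check, using the OUT= data set from the original code: list the most extreme Prin1 scores with the respondent identifier, and plot the first two component scores against each other. AID is a hypothetical ID variable name; substitute whatever identifies respondents in your data.

```sas
/* NEXTROBS=10 lists the 10 smallest and 10 largest Prin1 values    */
/* in the Extreme Observations table; ID labels them with AID       */
/* (a hypothetical respondent ID variable).                         */
proc univariate data=wave1.pcsat_cor nextrobs=10;
   var Prin1;
   id AID;
   histogram Prin1;
run;

/* Points far from the main cloud of (Prin1, Prin2) scores are      */
/* candidate outliers to inspect before deciding to refit the PCA.  */
proc sgplot data=wave1.pcsat_cor;
   scatter x=Prin1 y=Prin2;
run;
```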

--
Paige Miller

Re: proc princomp

Thank you, Paige! I checked the frequency and distribution of prin1, and I found no particular outlier. The range of prin1 is -.77 to 12.95. The sample has about 4,000 cases, and the majority of respondents fall between -.77 and about 1 on prin1 (the delinquency score), but some respondents are spread fairly evenly across values from 1 to 12.95.

In this case, should I go ahead and use prin1 as my outcome variable in OLS?

Thank you!

--lindy

Re: proc princomp

@Lindy wrote:

In this case, should I go ahead and use prin1 as my outcome variable in OLS?

Well, now you have opened up a whole new issue. I am on record as opposing the use of PCA results as inputs to OLS, despite the fact that 90% of the rest of the world goes ahead and does this, ignoring its fatal flaw.

Why? Because PCA does not include any information about the Y variables when it determines the components and scores, so you can easily get components that are not predictive of Y. This is a fatal flaw of using PCA to predict Y variables, and there is really no reason to use PCA here if the ultimate goal is predicting a Y variable. You want a method that produces components that are predictive of Y; that method is called Partial Least Squares regression, which is PROC PLS in SAS. PLS will give you better predictions of Y than PCA ever will.
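One way to see this difference in practice is that PROC PLS can fit both approaches: METHOD=PCR builds components from the X variables alone (principal components regression), while METHOD=PLS builds them using both X and Y. The sketch below is illustrative only; mydata, y and x1-x10 are hypothetical placeholders, and NFAC=2 is an arbitrary choice.

```sas
/* Principal components regression: factors chosen from X only.    */
/* mydata, y and x1-x10 are hypothetical placeholder names.         */
proc pls data=mydata method=pcr nfac=2;
   model y = x1-x10;
run;

/* Partial least squares: factors chosen to also explain Y.         */
proc pls data=mydata method=pls nfac=2;
   model y = x1-x10;
run;
```

Comparing the percentage of variation in the dependent variable accounted for by the extracted factors in the two runs shows how much predictive information about Y each set of components captures.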

--
Paige Miller

Re: proc princomp

Thank you so much for your insights, Paige! I am not using PCA to predict Y.

My plan is as follows.

I want a latent variable called "delinquency propensity" as Y in my model, and I have several independent variables, such as parenting style and children's school scores. Most of the independent variables are scaled variables.

Because there is no item in my data called "delinquency propensity", I used several items asking about the frequency of using a weapon, fighting, truancy, etc. Using proc princomp, I found that these items load on a single latent factor, so I want to use prin1 from the output as "delinquency propensity", the Y in my model. As I posted before, prin1 ranges from -.77 to more than 12.
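The planned model could be sketched in PROC REG as below. Because the OUT= data set from PROC PRINCOMP contains all the input variables plus the scores, the predictors are available alongside Prin1 if they are in wave1.wave1. The predictor names parenting and school_score are hypothetical. Residual diagnostics are worth requesting here, since the thread reports a strongly right-skewed Prin1.

```sas
ods graphics on;

/* OLS with the first component score as the outcome.              */
/* parenting and school_score are hypothetical predictor names.     */
/* PLOTS=DIAGNOSTICS draws residual and influence diagnostic plots. */
proc reg data=wave1.pcsat_cor plots=diagnostics;
   model Prin1 = parenting school_score;
run;
quit;
```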

Based on this information, do you think an OLS model is a good option?

Thank you very much!

Re: proc princomp

Okay, I got it backwards: you are using PCA to create Y, not to predict Y.

I still don't see any reason to do this. You are creating PCA scores that may not be well predicted by your X variables. It simply doesn't make sense to do this. PLS does not have this drawback: it finds components of your Y variables (the delinquency items) that are as well predicted by your X variables as the data allow.

PCA simply doesn't help here.
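A minimal sketch of that PLS alternative, assuming the predictors are in wave1.wave1: the five delinquency items enter PROC PLS jointly as responses instead of being collapsed into Prin1 first. The predictor names parenting and school_score are hypothetical, and CV=SPLIT (split-sample cross-validation) is one of several options PROC PLS offers for choosing the number of factors.

```sas
/* PLS with the delinquency items as multiple responses, so the     */
/* extracted factors are chosen to be predictable from the X side.  */
/* parenting and school_score are hypothetical predictor names.     */
/* CV=SPLIT cross-validates to help choose the number of factors.   */
proc pls data=wave1.wave1 cv=split;
   model H1DS5 H1DS8 H1DS1 H1DS2 H1DS10 = parenting school_score;
run;
```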

--
Paige Miller