- Predicting Values of Principal Components obtained from PROC PRINCOMP


Posted 05-27-2017 12:43 PM
(2841 views)

Hi there! I hope you can help me out with my problem. This is for my project in school.

After conducting a survey, I performed Principal Component Analysis on the variables (survey questions) to reduce their count. I used PROC PRINCOMP to obtain the principal components. Before I can use the principal components I chose to retain in logistic regression, I need to predict their values first. I tried using PROC SCORE but somehow I could not make it work.

It worked with STATA when I used the command "predict <principal components>, score". I wonder if there is a similar statement or function in SAS to produce the same output?

Thank you very much!

Here is my code so far, for reference. 😄

/*----------for PCA---------*/
PROC PRINCOMP DATA=Data OUT=prin;
   VAR v1 v2 v3 v4 v5 v6 v7 v8 v9 v10;
RUN;

/*----this part does not work (for predicting)-----*/
PROC SCORE DATA=Data SCORE=print OUT=newdata PREDICT;
   VAR prin1 prin2;
RUN;

8 REPLIES 8


I think you need the OUTSTAT= data set, not the OUT= data set?

See the example in the documentation. It uses PROC FACTOR, but the method should be similar.


Hi Reeza! Thanks for your response!

I followed the example from the link you gave me. This is my code:

PROC FACTOR DATA=Data OUTSTAT=fact METHOD=PRIN EIGENVECTORS SCORE;
   VAR v1 v2 v3 v4 v5 v6 v7 v8 v9 v10;
RUN;

PROC SCORE DATA=DATA SCORE=fact OUT=newdata PREDICT;
   VAR v1 v2 v3 v4 v5 v6 v7 v8 v9 v10;
RUN;

However, what I got was the predicted values of the factors, not the principal components. Do you have any other suggestions?


Using PROC PRINCOMP:

Principal component scores (is that what you are referring to?) are in the OUT= data set

Principal component loadings (vectors) are in the OUTSTAT= data set
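As a minimal sketch of what this means in practice: the OUT= data set already carries the component scores as variables Prin1, Prin2, and so on, so they can go straight into a later procedure. The example borrows sashelp.heart and its variables from a later reply in this thread. The data set name pc_scores, the choice of N=2, and the use of Status as a binary outcome are assumptions made only for illustration.

```sas
/* Sketch: OUT= holds the original variables plus the scores Prin1, Prin2. */
/* N=2 keeps two components; that number is an assumption for illustration. */
proc princomp data=sashelp.heart out=pc_scores n=2;
   var ageatstart height weight diastolic systolic;
run;

/* Use the retained components as predictors of a binary outcome.        */
/* Status (Alive/Dead) stands in for the survey outcome here.            */
proc logistic data=pc_scores;
   model status(event='Dead') = Prin1 Prin2;
run;
```

Rows with a missing value in any VAR variable get missing scores, so PROC LOGISTIC will drop them.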

I would not recommend running PCA to get scores for a logistic regression, because the PCA scores are computed *ignoring* the dependent variable, so there is no reason to expect them to be good predictors. In your case I would recommend Partial Least Squares regression (PROC PLS) instead: its dimensions/scores are chosen to be predictive of your Y variable(s), a property that PCA cannot claim. A logistic version of PLS has also been developed; see https://cedric.cnam.fr/fichiers/RC906.pdf

--

Paige Miller



We are probably using different terminology to describe the value you want. And so I do not understand what you mean by predicted values from PCA.

Principal components produces scores in each dimension (one for each observation), and it produces loadings (also called eigenvectors) in each dimension, one for each original variable.
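To make the relation between the two concrete, here is a sketch of how a score is built from the loadings. It assumes the default correlation-based analysis, in which each variable is standardized first. The symbols are introduced here for illustration: x_ij is the value of variable j for observation i, x̄_j and s_j are that variable's mean and standard deviation, e_jk is the entry for variable j in the k-th eigenvector, and p is the number of variables.

```latex
% Score of observation i on component k (correlation-based PCA, assumed)
t_{ik} = \sum_{j=1}^{p} \frac{x_{ij} - \bar{x}_j}{s_j} \, e_{jk}
```

Under that assumption, the "predicted" component value for an observation is just this weighted sum of its standardized responses.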

So can you tell me in your own words what this predicted value is that you are looking for? Predicted value of WHAT? (For example, in an ordinary least squares regression, you can obtain predicted values for the y-variables, and you can also obtain estimates of the slope and intercept. But since there are no y-variables in PCA, I am still not sure what is being predicted.)

--

Paige Miller



You have the principal components from PROC FACTOR.

You have the new data scored with the principal components from PROC SCORE.

Take the time to spell out EXACTLY what you want because whatever you're trying to accomplish is unclear.


Agreeing with @Reeza: we don't understand what you are trying to do, what numbers you are trying to compute, or what the phrase "predicted values" means in the context of principal components analysis. Much more detail about what you want to do is critical here.

--

Paige Miller



Hi,

To get the predicted components, first obtain the eigenvectors with PROC PRINCOMP (the OUTSTAT= data set), then pass them to PROC SCORE. Please try this:

PROC PRINCOMP DATA=sashelp.heart OUTSTAT=eigenvector OUT=pc;
   VAR ageatstart height weight diastolic systolic;
RUN;

/*---- score the data using the eigenvectors from OUTSTAT= ----*/
PROC SCORE DATA=sashelp.heart SCORE=eigenvector OUT=newdata PREDICT;
   VAR ageatstart height weight diastolic systolic;
RUN;
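The same OUTSTAT= approach is mainly useful for rows that were not part of the PCA fit. Here is a hedged sketch of that case. The 4000-row split and the data set names fit_rows, new_rows and new_scores are hypothetical and chosen only for illustration. The PREDICT option is left out because it is not needed for this kind of scoring.

```sas
/* Hypothetical split: the first 4000 rows fit the PCA, the rest are "new". */
data fit_rows new_rows;
   set sashelp.heart;
   if _n_ <= 4000 then output fit_rows;
   else output new_rows;
run;

/* Fit on fit_rows only; OUTSTAT= stores means, std devs and eigenvectors. */
proc princomp data=fit_rows outstat=eigenvector noprint;
   var ageatstart height weight diastolic systolic;
run;

/* Apply those stored coefficients to the new rows. */
/* new_scores gains the variables Prin1, Prin2, ... */
proc score data=new_rows score=eigenvector out=new_scores;
   var ageatstart height weight diastolic systolic;
run;
```

Scoring the original fitting data this way should give the same Prin1, Prin2, ... values that PROC PRINCOMP writes to its OUT= data set, so for the data used in the fit the extra PROC SCORE step is optional.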
