Predicting Values of Principal Components obtained...

05-27-2017 12:43 PM

Hi there! I hope you can help me out with my problem. This is for my project in school.

After conducting a survey, I performed Principal Component Analysis on the variables (survey questions) to reduce their count. I used PROC PRINCOMP to obtain the principal components. Before I can use the principal components I chose to retain in logistic regression, I need to predict their values first. I tried using PROC SCORE but somehow I could not make it work.

It worked with STATA when I used the command "predict <principal components>, score". I wonder if there is a similar statement or function in SAS to produce the same output?

Thank you very much!

This is my code so far, for your reference.

/*----------for PCA---------*/

PROC PRINCOMP DATA=Data OUT=prin;

VAR v1 v2 v3 v4 v5 v6 v7 v8 v9 v10;

RUN;

/*----this part does not work (for predicting)-----*/

PROC SCORE DATA=Data SCORE=print OUT=newdata PREDICT;

VAR prin1 prin2;

RUN;

Posted in reply to janimagus

05-27-2017 12:51 PM

I think you need the OUTSTAT= data set, not the OUT= data set?

See the example in the documentation. Although it uses PROC FACTOR, the method should be similar.

Posted in reply to Reeza

05-27-2017 01:20 PM

Hi Reeza! Thanks for your response!

I followed the example from the link you gave me. This is my code:

PROC FACTOR DATA=Data OUTSTAT=fact METHOD=PRIN EIGENVECTORS SCORE;

VAR v1 v2 v3 v4 v5 v6 v7 v8 v9 v10;

RUN;

PROC SCORE DATA=DATA SCORE=fact OUT=newdata PREDICT;

VAR v1 v2 v3 v4 v5 v6 v7 v8 v9 v10;

RUN;

However, what I got was the predicted values of the factors, not the principal components. Do you have any other inputs?

Posted in reply to janimagus

05-27-2017 01:26 PM - edited 05-28-2017 07:26 AM

Using PROC PRINCOMP:

Principal component scores (is that what you are referring to?) are in the OUT= data set.

Principal component loadings (vectors) are in the OUTSTAT= data set.
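As a minimal sketch of the difference between the two data sets, something like the following could be run. It assumes the sample data set `sashelp.heart` and uses made-up output names (`pc_scores`, `pc_stats`); `N=2` simply limits the analysis to two components.

```sas
/* Sketch only: request both output data sets from one PROC PRINCOMP call. */
/* pc_scores and pc_stats are illustrative names, not from the thread.     */
proc princomp data=sashelp.heart out=pc_scores outstat=pc_stats n=2;
   var ageatstart height weight diastolic systolic;
run;

/* OUT= holds the original variables plus the scores Prin1 and Prin2. */
proc print data=pc_scores(obs=5);
   var ageatstart height weight diastolic systolic prin1 prin2;
run;

/* OUTSTAT= holds means, standard deviations, correlations, eigenvalues, */
/* and the eigenvectors (the rows with _TYPE_='SCORE').                  */
proc print data=pc_stats;
run;
```

Observations with a missing value in any VAR variable get missing scores in the OUT= data set.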

Doing PCA to get scores so you can do a logistic regression is not something I would recommend, because the PCA scores are computed *ignoring* the dependent variable, and thus there is no reason to expect that the PCA scores will be good predictors. I would recommend using Partial Least Squares regression (PROC PLS) in your case: its dimensions/scores are chosen to be predictive of your Y variable(s), a property that PCA cannot claim. A logistic version of PLS has also been developed; see https://cedric.cnam.fr/fichiers/RC906.pdf
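A rough sketch of what the PROC PLS route might look like is below. It assumes the survey data set `Data` with predictors `v1`-`v10` from the original question, plus a hypothetical numeric response `y` that does not appear in the thread. Note that PROC PLS fits a linear model, so this illustrates ordinary PLS, not the logistic variant described in the linked paper.

```sas
/* Sketch only: y is a hypothetical numeric response variable.           */
/* NFAC=2 extracts two PLS factors; the choice of two is an assumption.  */
proc pls data=Data method=pls nfac=2;
   model y = v1-v10;
   /* xscr1 and xscr2 are the X-scores; yhat is the predicted response. */
   output out=pls_out xscore=xscr predicted=yhat;
run;
```

The X-scores in `pls_out` play the same role as the PCA scores, except that they are extracted with the response in mind.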

--

Paige Miller

Posted in reply to PaigeMiller

05-27-2017 01:49 PM

Hi PaigeMiller! Thank you for your response! And thanks for your input regarding the different method. However, I am still unable to predict the values for the principal components (the ones in the eigenvector table) I chose to retain.

Posted in reply to janimagus

05-27-2017 03:02 PM

We are probably using different terminology to describe the value you want. And so I do not understand what you mean by predicted values from PCA.

Principal components analysis produces scores in each dimension (one for each observation), and it produces loadings (also called eigenvectors) in each dimension (one for each original variable).

So can you tell me in your own words what this predicted value is that you are looking for? Predicted value of WHAT? (For example, in an ordinary least squares regression, you can obtain predicted values for the y-variables; you can also obtain estimates of the slope and intercept. But since there are no y-variables in PCA, I am still not sure what is being predicted.)
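To make the relationship between the two concrete, here is how a score is formed, assuming the PROC PRINCOMP default of analyzing the correlation matrix. The symbols are introduced here for illustration: $x_{ij}$ is the value of variable $j$ for observation $i$, $\bar{x}_j$ and $s_j$ are that variable's mean and standard deviation, $e_{jk}$ is the entry for variable $j$ in the $k$-th eigenvector, and $p$ is the number of variables.

```latex
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j},
\qquad
t_{ik} = \sum_{j=1}^{p} z_{ij}\, e_{jk}
```

Here $t_{ik}$ is the score of observation $i$ on component $k$: each score is a weighted sum of the standardized variables, with the eigenvector entries as weights.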

--

Paige Miller

Posted in reply to janimagus

05-27-2017 04:21 PM

You have the principal components from PROC FACTOR.

You have the new data scored with the principal components from PROC SCORE.

Take the time to spell out EXACTLY what you want because whatever you're trying to accomplish is unclear.

Posted in reply to Reeza

05-28-2017 07:01 AM - edited 05-28-2017 07:26 AM

Agreeing with @Reeza: we don't understand what you are trying to do, what numbers you are trying to compute, or what the phrase "predicted values" means in the context of principal components analysis. Much more detail about what you want to do is critical here.

--

Paige Miller

Posted in reply to janimagus

05-30-2017 12:45 AM

Hi,

To get the predicted components, you first need the eigenvectors from PROC PRINCOMP (the OUTSTAT= data set), which PROC SCORE then uses. Please try this.

PROC PRINCOMP DATA=sashelp.heart OUTSTAT=eigenvector out=pc;

VAR ageatstart height weight diastolic systolic;

RUN;

/*---- score the data with the eigenvectors from OUTSTAT= ----*/

PROC SCORE DATA=sashelp.heart SCORE=eigenvector OUT=newdata PREDICT;

VAR ageatstart height weight diastolic systolic;

RUN;
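One way to check that PROC SCORE reproduces the component scores is to compare its output with the OUT= data set from PROC PRINCOMP. The sketch below is only an illustration: the data set name `rescored`, the `N=2` limit, and the comparison tolerance are assumptions. It leaves out PREDICT, which only affects coefficients of -1 in regression-style scoring and should not be needed here.

```sas
/* Sketch only: fit the PCA and keep both the scores (OUT=) */
/* and the statistics (OUTSTAT=).                           */
proc princomp data=sashelp.heart out=pc outstat=eigenvector n=2 noprint;
   var ageatstart height weight diastolic systolic;
run;

/* Re-score the same data from the stored means, standard deviations, */
/* and eigenvectors. The new variables are named Prin1 and Prin2.     */
proc score data=sashelp.heart score=eigenvector out=rescored;
   var ageatstart height weight diastolic systolic;
run;

/* Compare the two sets of scores, allowing for rounding differences. */
/* The tolerance of 1e-8 is an arbitrary choice.                      */
proc compare base=pc compare=rescored criterion=1e-8 brief;
   var prin1 prin2;
run;
```

The same PROC SCORE step can be pointed at a different DATA= data set with the same variables, which is how new observations would be scored with the retained components.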