janimagus
Fluorite | Level 6

Hi there! I hope you can help me out with my problem. This is for my project in school.

 

After conducting a survey, I performed Principal Component Analysis on the variables (survey questions) to reduce their count. I used PROC PRINCOMP to obtain the principal components. Before I can use the principal components I chose to retain in logistic regression, I need to predict their values first. I tried using PROC SCORE but somehow I could not make it work.

 

It worked with STATA when I used the command "predict <principal components>, score". I wonder if there is a similar statement or function in SAS to produce the same output?

 

Thank you very much!

 

This is my code so far, for your reference. 😄

 

/*----------for PCA---------*/

PROC PRINCOMP DATA=Data OUT=prin;
 VAR v1 v2 v3 v4 v5 v6 v7 v8 v9 v10;
RUN;

 

/*----this part does not work (for predicting)-----*/

PROC SCORE DATA=Data SCORE=print OUT=newdata PREDICT;
VAR prin1 prin2;
RUN;

8 REPLIES
Reeza
Super User

I think you need the OUTSTAT dataset not the OUT dataset? 

 

See the example in the documentation; although it uses PROC FACTOR, the method should be similar.

 

http://documentation.sas.com/?docsetId=statug&docsetVersion=14.2&docsetTarget=statug_score_examples0...

janimagus
Fluorite | Level 6

Hi Reeza! Thanks for your response!

 

I followed the example from the link you gave me. This is my code:

 

PROC FACTOR DATA=Data OUTSTAT=fact METHOD=PRIN EIGENVECTORS SCORE;
VAR v1 v2 v3 v4 v5 v6 v7 v8 v9 v10;
RUN;

 

PROC SCORE DATA=DATA SCORE=fact OUT=newdata PREDICT;
VAR v1 v2 v3 v4 v5 v6 v7 v8 v9 v10;
RUN;

 

However, what I got was the predicted values of the factors, not the principal components. Do you have any other suggestions?

PaigeMiller
Diamond | Level 26

Using PROC PRINCOMP:

 

Principal component scores (is that what you are referring to?) are in the OUT= data set

 

Principal component loadings (vectors) are in the OUTSTAT= data set
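
As a minimal sketch of where these two data sets come from, using the data set and variable names from the original post (the output name pcstats and the choice N=2 are assumptions for illustration only):

```sas
/* Correlation-based PCA on the ten survey items; keep two components. */
/* pcstats and N=2 are illustrative assumptions, not from the thread.  */
PROC PRINCOMP DATA=Data OUT=prin OUTSTAT=pcstats N=2;
  VAR v1-v10;
RUN;

/* OUT= holds the original variables plus the scores Prin1 and Prin2. */
PROC PRINT DATA=prin(OBS=5);
  VAR Prin1 Prin2;
RUN;

/* OUTSTAT= holds the means, standard deviations, eigenvalues, */
/* and eigenvectors (the rows with _TYPE_='SCORE').            */
PROC PRINT DATA=pcstats;
RUN;
```

With the default PREFIX=, the score variables are named Prin1, Prin2, and so on, one value per observation.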

 

I would not recommend doing PCA to get scores for a logistic regression: the PCA scores are computed without reference to the dependent variable, so there is no reason to expect them to be good predictors. I would recommend Partial Least Squares regression (PROC PLS) in your case. Its dimensions/scores are chosen to be predictive of your Y variable(s), a property that PCA cannot claim. There is also a logistic version of PLS that has been developed; see https://cedric.cnam.fr/fichiers/RC906.pdf

--
Paige Miller
janimagus
Fluorite | Level 6

Hi PaigeMiller! Thank you for your response! And thanks for your input regarding the different method. However, I am still unable to predict the values for the principal components (the ones in the eigenvector table) I chose to retain.

PaigeMiller
Diamond | Level 26

We are probably using different terminology to describe the value you want. And so I do not understand what you mean by predicted values from PCA.

 

Principal components analysis produces scores in each dimension (one for each observation), and it produces loadings (also called eigenvectors) in each dimension, one for each original variable.

 

So can you tell me in your own words what this predicted value is that you are looking for? Predicted value of WHAT? (For example, in an ordinary least squares regression you can obtain predicted values for the y-variables, and you can also obtain estimates of the slope and intercept; but since there are no y-variables in PCA, I am still not sure what is being predicted.)

--
Paige Miller
Reeza
Super User

You have the principal components from PROC FACTOR. 

 

You have the new data scored with the principal components from PROC SCORE.

 

Take the time to spell out EXACTLY what you want because whatever you're trying to accomplish is unclear. 

 

 

PaigeMiller
Diamond | Level 26

Agreeing with @Reeza, we don't understand what you are trying to do, we don't understand what numbers you are trying to compute, we don't understand the phrase "predicted values" in the context of Principal Components analysis. Much more detail about what you want to do is critical here.

--
Paige Miller
stat_sas
Ammonite | Level 13

Hi,

 

To get the component scores from PROC SCORE, you need the eigenvectors from PROC PRINCOMP (the OUTSTAT= data set), which PROC SCORE then uses as scoring coefficients. Please try this.

 

PROC PRINCOMP DATA=sashelp.heart OUTSTAT=eigenvector out=pc;
VAR ageatstart height weight diastolic systolic;
RUN;

/*----score the data with the eigenvectors from OUTSTAT=-----*/


PROC SCORE DATA=sashelp.heart SCORE=eigenvector OUT=newdata PREDICT;
VAR ageatstart height weight diastolic systolic;
RUN;
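
One way to check the result, as a rough sketch: PROC PRINCOMP has already written the scores to OUT=pc, so the Prin1 and Prin2 columns that PROC SCORE writes to newdata should agree with them. The tolerance below is an arbitrary assumption.

```sas
/* Compare the PROC SCORE output with the scores PROC PRINCOMP wrote itself. */
/* CRITERION=1E-8 is an assumed tolerance for floating-point differences.    */
PROC COMPARE BASE=pc COMPARE=newdata CRITERION=1E-8 METHOD=ABSOLUTE;
  VAR Prin1 Prin2;
RUN;
```

If the two agree, PROC SCORE is mainly useful for applying the same eigenvectors to a different data set that has the same variables.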
