Hi there! I hope you can help me out with my problem. This is for my project in school.
After conducting a survey, I performed Principal Component Analysis on the variables (survey questions) to reduce their count. I used PROC PRINCOMP to obtain the principal components. Before I can use the principal components I chose to retain in logistic regression, I need to predict their values first. I tried using PROC SCORE but somehow I could not make it work.
It worked with STATA when I used the command "predict <principal components>, score". I wonder if there is a similar statement or function in SAS to produce the same output?
Thank you very much!
Here is my code so far, for your reference. 😄
/*----------for PCA---------*/
PROC PRINCOMP DATA=Data OUT=prin;
VAR v1 v2 v3 v4 v5 v6 v7 v8 v9 v10;
RUN;
/*----this part does not work (for predicting)-----*/
PROC SCORE DATA=Data SCORE=print OUT=newdata PREDICT;
VAR prin1 prin2;
RUN;
I think you need the OUTSTAT= data set, not the OUT= data set.
See the example in the documentation. It uses PROC FACTOR, but the method should be similar.
Hi Reeza! Thanks for your response!
I followed the example from the link you gave me. This is my code:
PROC FACTOR DATA=Data OUTSTAT=fact METHOD=PRIN EIGENVECTORS SCORE;
VAR v1 v2 v3 v4 v5 v6 v7 v8 v9 v10;
RUN;
PROC SCORE DATA=DATA SCORE=fact OUT=newdata PREDICT;
VAR v1 v2 v3 v4 v5 v6 v7 v8 v9 v10;
RUN;
However, what I got was the predicted values of the factors, not the principal components. Do you have any other suggestions?
Using PROC PRINCOMP:
Principal component scores (is that what you are referring to?) are in the OUT= data set
Principal component loadings (vectors) are in the OUTSTAT= data set
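To make the distinction concrete, here is a minimal sketch of a single PROC PRINCOMP call that writes both data sets. It reuses the data set and variable names from the question. N=2 and the output names pc_scores and pc_stats are my own choices for illustration, not something from the original post.

```sas
/* Sketch: one PROC PRINCOMP call that writes both data sets.        */
/* Data and v1-v10 come from the question; N=2, pc_scores, and       */
/* pc_stats are illustrative assumptions.                            */
proc princomp data=Data n=2 out=pc_scores outstat=pc_stats;
   var v1-v10;
run;

/* OUT= holds the original variables plus the scores Prin1 and Prin2 */
proc print data=pc_scores(obs=5);
   var v1-v10 Prin1 Prin2;
run;

/* OUTSTAT= holds summary statistics; the eigenvectors are the rows  */
/* with _TYPE_='SCORE'                                               */
proc print data=pc_stats;
   where _type_ = 'SCORE';
run;
```

If the scores in pc_scores are what you are after, no separate scoring step is needed for the data used in the analysis.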
Doing PCA to get scores for a logistic regression is not something I would recommend: the PCA scores are computed ignoring the dependent variable, so there is no reason to expect them to be good predictors. I would recommend Partial Least Squares regression (PROC PLS) in your case. Its dimensions/scores are chosen to be predictive of your Y variable(s), a property that PCA cannot claim. There is also a logistic version of PLS that has been developed; see https://cedric.cnam.fr/fichiers/RC906.pdf
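A rough sketch of what the PROC PLS route could look like is below. The response variable y is hypothetical (the question never names one), and NFAC=2, pls_out, and the xscr prefix are arbitrary choices. Note that PROC PLS fits a linear model to a numeric response, so this is ordinary PLS, not the logistic variant described in the linked paper.

```sas
/* Sketch only: y is a hypothetical numeric response, not from the post. */
/* NFAC=2 and the names pls_out and xscr are arbitrary choices.          */
proc pls data=Data method=pls nfac=2;
   model y = v1-v10;
   output out=pls_out xscore=xscr;   /* adds X-scores xscr1 and xscr2 */
run;
```

The X-scores in pls_out play the role that Prin1 and Prin2 play in PCA, except that they are extracted with the response in view.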
Hi PaigeMiller! Thank you for your response! And thanks for your input regarding the different method. However, I am still unable to predict the values for the principal components (the ones in the eigenvector table) I chose to retain.
We are probably using different terminology to describe the value you want, so I do not understand what you mean by predicted values from PCA.
Principal components analysis produces scores in each dimension (one for each observation) and loadings (also called eigenvectors) in each dimension (one for each original variable).
So can you tell me in your own words what this predicted value is that you are looking for? Predicted value of WHAT? (For example, in an ordinary least squares regression, you can obtain the predicted values for the y-variables; you can also obtain estimates of the slope and intercept. But since there are no y-variables in PCA, I am still not sure what is being predicted.)
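For reference, the relationship between scores and eigenvectors can be written out explicitly. This assumes the PROC PRINCOMP default of analyzing the correlation matrix (no COV option), in which case each variable is standardized before the eigenvector weights are applied. The symbols are my own notation, not taken from the posts above.

```latex
\[
\mathrm{Prin}_{k,i} \;=\; \sum_{j=1}^{p} e_{jk}\,\frac{x_{ij} - \bar{x}_j}{s_j}
\]
```

Here $x_{ij}$ is the value of variable $j$ for observation $i$, $\bar{x}_j$ and $s_j$ are that variable's mean and standard deviation, $e_{jk}$ is the $j$-th element of the $k$-th eigenvector, and $p$ is the number of variables. If "predicted value" means this quantity, it is the component score.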
You have the principal components from PROC FACTOR.
You have the new data scored with the principal components from PROC SCORE.
Take the time to spell out EXACTLY what you want because whatever you're trying to accomplish is unclear.
Agreeing with @Reeza: we don't understand what you are trying to do, what numbers you are trying to compute, or what the phrase "predicted values" means in the context of principal components analysis. Much more detail about what you want to do is critical here.
Hi,
To get the component scores, you first need the eigenvectors from PROC PRINCOMP (the OUTSTAT= data set), which PROC SCORE then uses. Please try this:
PROC PRINCOMP DATA=sashelp.heart OUTSTAT=eigenvector out=pc;
VAR ageatstart height weight diastolic systolic;
RUN;
/*----score the data using the eigenvectors from OUTSTAT=-----*/
PROC SCORE DATA=sashelp.heart SCORE=eigenvector OUT=newdata PREDICT;
VAR ageatstart height weight diastolic systolic;
RUN;
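As a follow-up sketch, the scores written by PROC SCORE can be checked against the ones PROC PRINCOMP already put in OUT=pc, and the retained components can then go into a logistic regression. Using Status from sashelp.heart as the outcome, keeping Prin1 and Prin2, and the comparison tolerance are my own assumptions for illustration.

```sas
/* Check: scores from PROC SCORE (newdata) should match the scores   */
/* PROC PRINCOMP wrote to OUT=pc, up to a small numeric tolerance.   */
proc compare base=pc compare=newdata criterion=1e-8;
   var Prin1-Prin5;
run;

/* Sketch: retained components as predictors. Status (Alive/Dead) is */
/* a variable in sashelp.heart, used here only as an example outcome.*/
proc logistic data=newdata;
   model Status(event='Dead') = Prin1 Prin2;
run;
```

If the comparison reports no unequal values, PROC SCORE is only needed when scoring observations that were not part of the original analysis.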