Hi,
The code I'm using is:
/* what are the correlations between PCs and orig vars? */
proc corr data= work.ass2_drugbankdata noprob nosimple;
var MW LogP LogD Hdonors Hacceptors PSA ROT NATOM NRING ;
with Prin1- Prin9;
run;
I can't run this because of an error:
ERROR: Variable PRIN1 not found.
This doesn't make sense, as the principal components are named Prin1 - Prin9.
Can someone shed some light on what could be going on here?
PRINCOMP leaves the input data set unmodified, and adds the principal component scores to the OUT= data set. So you should be using syntax something like this:
proc princomp data=myData out=PCOUT <other options>;
var ...
run;
proc corr data=PCOUT <options>;
var ...
with ...
run;
Show us PROC CONTENTS of the data set work.ass2_drugbankdata
Can I post a picture, or do I need to follow some protocol, like creating a minimal reproducible example?
We'd need a screen capture of the entire PROC CONTENTS output, or better yet the text equivalent from the LISTING window, pasted into your reply using the </> icon.
@axelpuri wrote:
Can I post a picture, or do I need to follow some protocol, like creating a minimal reproducible example?
For PROC CONTENTS output, the Listing output or any output that formats properly here is fine.
Data is best provided as data step code. Data step code does not run into cross-operating-system issues or, generally, language option differences. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon, or attached as text, to show exactly what you have and to give us something we can test code against.
By the way, @axelpuri, if you look at the contents of the data set and PRIN1 is not in there, then I think we don't need to see the results.
Prin1 is principal component 1. I need a correlation table for the original variables with the PCs.
I can't do that using the component pattern profile plot, as there are 9 variables and hence 9 PCs, and it looks really cluttered.
Further, I could have gleaned more information from the component plots, but I don't get plots for all the different combinations of PCs and variables.
The only option left now is to get the correlations in tabular format, but I can't seem to run the code because of the error for Prin1, when it's clearly a PC, as you can see from the eigenvectors table.
Look forward to hearing from you! Kind regards
@axelpuri wrote:
By the way, I was able to get the table just now.
What I don't understand is why I didn't have to enter my data as work.mydatasetname in the data= ... part?
/* what are the correlations between PCs and orig vars? */
proc corr data=PCOUT noprob nosimple;
var MW LogP LogD Hdonors Hacceptors PSA ROT NATOM NRING;
with Prin1-Prin9;
run;
What does PCOUT mean?
PCOUT is the name of a data set created by PROC PRINCOMP in the code above. It contains both the original variables and the principal components (PRIN1-PRIN9) variables.
So, if you are going to produce a correlation analysis between the original variables and the principal component variables, they must be in the same data set, which in this case is PCOUT. If you run PROC CORR on PCOUT, you can obtain the desired correlations.
@axelpuri wrote:
Prin1 is principal component 1. I need a correlation table for the original variables with the PCs.
I can't do that using the component pattern profile plot, as there are 9 variables and hence 9 PCs, and it looks really cluttered.
Further, I could have gleaned more information from the component plots, but I don't get plots for all the different combinations of PCs and variables.
The only option left now is to get the correlations in tabular format, but I can't seem to run the code because of the error for Prin1, when it's clearly a PC, as you can see from the eigenvectors table.
I'm glad you have the answer now, but a simple debugging step when SAS says it cannot find a variable, which you could perform yourself, is to look at PROC CONTENTS. In fact, this ought to be the first thing to do when you get that error.
Run proc contents on your data set and double check the spelling of your variables.
You may have been seeing a variable label of Prin1, which is not necessarily the name of the variable.
There is also a chance that, when the data set you used was created, an existing variable was dropped or renamed.
If SAS says variable XXXXX does not exist, it doesn't exist as spelled with that name.
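A minimal sketch of that check, using the data set names that appear earlier in this thread (it assumes PROC PRINCOMP was run with OUT=PCOUT as suggested above; the VARNUM option just lists the variables in creation order):

```sas
/* Variables in the original input data set.                 */
/* Prin1-Prin9 are not expected here, because PROC PRINCOMP  */
/* does not modify its input data set.                       */
proc contents data=work.ass2_drugbankdata varnum;
run;

/* Variables in the OUT= data set written by PROC PRINCOMP.  */
/* Assumption: OUT=PCOUT was used. A one-level name such as  */
/* PCOUT is stored in the WORK library by default, so PCOUT  */
/* and WORK.PCOUT refer to the same data set.                */
proc contents data=work.pcout varnum;
run;
```

If Prin1-Prin9 show up only in the second listing, PROC CORR needs DATA=PCOUT, which matches the fix above.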
Also, there is a mathematical formula for the correlation between the i-th original variable and the j-th principal component, expressed in terms of the i-th element of the j-th eigenvector, which is given here: https://stats.stackexchange.com/questions/253718/correlation-between-an-original-variable-and-a-prin...
Which means (to me) that you don't really have to compute the correlations between the original variables and the principal components, because the absolute values of the eigenvector elements determine which variables are most highly correlated with each component (at least when you are using the default PRINCOMP input, which is the correlation matrix).
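For reference, here is a sketch of that relationship for the correlation-matrix case. The notation is an assumption of this note, not taken from the thread: z is the vector of standardized original variables, R is their correlation matrix, and e_j is the unit-length eigenvector of R with eigenvalue lambda_j, so that PC_j = e_j' z.

```latex
\operatorname{Corr}(z_i, \mathrm{PC}_j)
  = \frac{\operatorname{Cov}\!\left(z_i,\; e_j^{\top} z\right)}
         {\sqrt{\operatorname{Var}(z_i)\,\operatorname{Var}(\mathrm{PC}_j)}}
  = \frac{(R\, e_j)_i}{\sqrt{1 \cdot \lambda_j}}
  = \frac{\lambda_j\, e_{ij}}{\sqrt{\lambda_j}}
  = e_{ij}\,\sqrt{\lambda_j}
```

Within a single component j, the factor sqrt(lambda_j) is the same for every variable, so ranking the variables by the absolute value of e_ij gives the same order as ranking them by the absolute correlation. Across components the scaling differs, so the eigenvector elements and the correlations are not directly comparable from one component to another. Because correlation is unaffected by standardizing a variable, the same values apply to the original, unstandardized variables.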