🔒 This topic is solved and locked.
axelpuri
Fluorite | Level 6

Hi,

The code I'm using is:

/* what are the correlations between PCs and orig vars? */
proc corr data= work.ass2_drugbankdata noprob nosimple;
var MW LogP LogD Hdonors Hacceptors PSA ROT NATOM NRING ;
with Prin1- Prin9;
run;

I can't run this because of an error:

ERROR: Variable PRIN1 not found.

This doesn't make sense, as the principal components are named Prin1-Prin9.

Can someone shed some light on what could be going on here?

1 ACCEPTED SOLUTION
IanWakeling
Barite | Level 11

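PRINCOMP leaves the input data set unmodified and adds the principal component scores to the OUT= data set. So you should be using syntax something like this:

/* OUT= writes the original variables plus Prin1-Prin9 to WORK.PCOUT */
proc princomp data=work.ass2_drugbankdata out=PCOUT;
  var MW LogP LogD Hdonors Hacceptors PSA ROT NATOM NRING;
run;

/* Correlate the original variables with the component scores */
proc corr data=PCOUT noprob nosimple;
  var MW LogP LogD Hdonors Hacceptors PSA ROT NATOM NRING;
  with Prin1-Prin9;
run;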


PaigeMiller
Diamond | Level 26

Show us the PROC CONTENTS output for the data set work.ass2_drugbankdata.

--
Paige Miller
axelpuri
Fluorite | Level 6

Can I post a picture, or do I need to follow some protocol, like creating a minimal reproducible example?

PaigeMiller
Diamond | Level 26

We'd need a screen capture of the entire PROC CONTENTS output, or better yet the text equivalent from the LISTING window, pasted into your reply using the </> icon.

--
Paige Miller
ballardw
Super User

@axelpuri wrote:

Can I post a picture, or do I need to follow some protocol, like creating a minimal reproducible example?


For PROC CONTENTS output, the LISTING output or any output that formats properly here is fine.

Data is preferred as DATA step code. A DATA step does not run into cross-operating-system issues or, generally, issues with language options. The instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... show how to turn an existing SAS data set into DATA step code that can be pasted into a forum code box using the </> icon, or attached as text, to show exactly what you have and give us something to test code against.

PaigeMiller
Diamond | Level 26

By the way, @axelpuri, if you look at the contents of the data set and PRIN1 is not in there, then I don't think we need to see the results.

--
Paige Miller
axelpuri
Fluorite | Level 6

Prin1 is principal component 1. I need a correlation table of the original variables with the PCs.

I can't do that using the component pattern profile plot, as there are 9 variables, and hence 9 PCs, and it looks really cluttered.

Further, I could have gleaned more information from the component plots, but I don't get plots for all the different combinations of PCs and variables.

The only option left now is to get the correlations in tabular form, but I can't seem to run the code because of the error for Prin1, when it's clearly a PC, as you can see from the eigenvectors table.

Look forward to hearing from you! Kind regards

axelpuri
Fluorite | Level 6
By the way, I was able to get the table just now.
What I don't understand is why I didn't have to enter my data as work.mydatasetname in the data= ... part?
/* what are the correlations between PCs and orig vars? */
proc corr data=PCOUT noprob nosimple;
var MW LogP LogD Hdonors Hacceptors PSA ROT NATOM NRING;
with Prin1-Prin9;
run;

What does PCout mean?
PaigeMiller
Diamond | Level 26

@axelpuri wrote:
By the way, I was able to get the table just now.
What I don't understand is why I didn't have to enter my data as work.mydatasetname in the data= ... part?
/* what are the correlations between PCs and orig vars? */
proc corr data=PCOUT noprob nosimple;
var MW LogP LogD Hdonors Hacceptors PSA ROT NATOM NRING;
with Prin1-Prin9;
run;

What does PCout mean?

PCOUT is the name of a data set created by PROC PRINCOMP in the code above. It contains both the original variables and the principal component variables (PRIN1-PRIN9).

So, if you are going to produce a correlation analysis between the original variables and the principal component variables, they must be in the same data set, which in this case is PCOUT. If you run PROC CORR on PCOUT, you can obtain the desired correlations.
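
On the question of why work. wasn't needed: a one-level data set name such as PCOUT refers to the temporary WORK library (unless a USER library has been assigned), so data=PCOUT and data=work.PCOUT name the same data set. A minimal sketch, assuming PROC PRINCOMP has already been run with out=PCOUT as in the accepted solution:

```sas
/* Assumes an earlier PROC PRINCOMP step used OUT=PCOUT, as in the accepted solution. */
/* A one-level name is stored in the temporary WORK library (no USER libref assigned), */
/* so both PROC CORR steps below read the same data set, WORK.PCOUT.                    */
proc corr data=PCOUT noprob nosimple;
  var MW LogP LogD Hdonors Hacceptors PSA ROT NATOM NRING;
  with Prin1-Prin9;
run;

proc corr data=work.PCOUT noprob nosimple;
  var MW LogP LogD Hdonors Hacceptors PSA ROT NATOM NRING;
  with Prin1-Prin9;
run;

/* Writes the physical folder behind the WORK libref to the log. */
%put NOTE: The WORK library is located at %sysfunc(pathname(work));
```

Both PROC CORR steps should print identical tables. WORK data sets are deleted when the SAS session ends, so PCOUT has to be re-created in a new session.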

--
Paige Miller
PaigeMiller
Diamond | Level 26

@axelpuri wrote:

Prin1 is principal component 1. I need a correlation table of the original variables with the PCs.

I can't do that using the component pattern profile plot, as there are 9 variables, and hence 9 PCs, and it looks really cluttered.

Further, I could have gleaned more information from the component plots, but I don't get plots for all the different combinations of PCs and variables.

The only option left now is to get the correlations in tabular form, but I can't seem to run the code because of the error for Prin1, when it's clearly a PC, as you can see from the eigenvectors table.


I'm glad you have the answer now, but a simple debugging step when SAS says it cannot find a variable, which you could perform yourself, is to look at PROC CONTENTS. In fact, this ought to be the first thing to do when you get that error.
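
With the names from this thread, that check might look like the sketch below, assuming PCOUT is the OUT= data set from the accepted solution. The VARNUM option lists variables in the order they are stored; the first listing should show that PRIN1 is not in the input data set.

```sas
/* Input data set: PROC PRINCOMP does not add PRIN1-PRIN9 here. */
proc contents data=work.ass2_drugbankdata varnum;
run;

/* OUT= data set from PROC PRINCOMP: original variables plus PRIN1-PRIN9. */
proc contents data=work.PCOUT varnum;
run;
```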

--
Paige Miller
ballardw
Super User

Run PROC CONTENTS on your data set and double-check the spelling of your variable names.

You may have been seeing a variable label of Prin1, which is not necessarily the name of the variable.
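
The label-versus-name distinction can also be checked with a query against DICTIONARY.COLUMNS, which lists each variable's name beside its label (PROC CONTENTS shows the same information). A sketch, assuming the input data set is work.ass2_drugbankdata; LIBNAME and MEMNAME values are stored in uppercase in this table:

```sas
/* VAR and WITH statements must use the NAME column, not the LABEL column. */
proc sql;
  select varnum, name, label
    from dictionary.columns
    where libname = 'WORK' and memname = 'ASS2_DRUGBANKDATA'
    order by varnum;
quit;
```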

 

There is also a chance that, when the data set you used was created, an existing variable was dropped or renamed.

If SAS says variable XXXXX does not exist, then no variable with exactly that spelling exists in that data set.

PaigeMiller
Diamond | Level 26

Also, there is a mathematical formula for the correlation between the i-th original variable and the j-th principal component, in terms of the i-th element of the j-th eigenvector and the j-th eigenvalue, which is given here: https://stats.stackexchange.com/questions/253718/correlation-between-an-original-variable-and-a-prin...

This means (to me) that you don't really have to compute the correlations between the original variables and the principal components, because the absolute values of the eigenvector elements determine which variables are most highly correlated with each component (at least when you use the default PRINCOMP input, which is the correlation matrix).
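
For reference, here is a sketch of the standard derivation behind this, assuming PCA on the correlation matrix R with eigenvalues λ_j and unit-length eigenvectors v_j (the notation is illustrative, not PROC PRINCOMP output names). Because correlation is unchanged by centering and positive rescaling, the correlation of an original variable X_i with the component score Y_j equals that of the standardized variable Z_i:

```latex
Y_j = v_j^{\top} Z, \qquad \operatorname{var}(Z_i) = 1, \qquad \operatorname{var}(Y_j) = v_j^{\top} R\, v_j = \lambda_j

\operatorname{cov}(Z_i, Y_j) = (R\, v_j)_i = \lambda_j\, v_{ij}

\operatorname{corr}(X_i, Y_j) = \frac{\lambda_j\, v_{ij}}{\sqrt{1 \cdot \lambda_j}} = v_{ij}\sqrt{\lambda_j}
```

Within a given component j, the factor √λ_j is the same for every variable, so ranking the variables by |v_ij| gives the same order as ranking them by absolute correlation. With the COV option (covariance matrix), the analogous result is corr(X_i, Y_j) = v_ij √λ_j / σ_i, so the eigenvector elements alone no longer give that ordering, which matches the caveat above.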

--
Paige Miller

Discussion stats
  • 12 replies
  • 1547 views
  • 3 likes
  • 4 in conversation