SASdevAnneMarie
Rhodochrosite | Level 12

Hello Experts,

 

I would like to add supplementary variables to the PCA correlation circle.

I'm wondering how I can retrieve the correlations of the "active" variables (their coordinates on the circle), because they are not present in the output table.

For the correlations of the supplementary variables, should I use PROC CORR, or is there an option to add supplementary variables?

With these correlations, I want to redraw the circle.

 

Thank you for your help!

 

1 ACCEPTED SOLUTION

Accepted Solutions
StatDave
SAS Super FREQ

This can be done by using an ODS OUTPUT statement to save the data set produced by the PLOTS=PATTERN(CIRCLES=) option in PROC PRINCOMP and adding to it the correlations of the supplementary variables with the principal components. Use the OUT= option to save the principal component scores. After adding the supplementary variable correlations, you can use PROC SGPLOT to reconstruct the pattern plot. The following example uses the crime data in the Getting Started section of the PRINCOMP documentation. The analysis is done using just five of the variables. The Murder and Rape variables are treated as supplementary variables.

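/* PCA on the five active variables; save the scores (OUT=) and the pattern plot data (ODS OUTPUT) */
proc princomp n=2 data=Crime plots=pattern(circles=50 75 100) out=out;
   var robbery--auto_theft; id state; 
   ods output patternplot=pp;
   run;
/* Correlate the supplementary variables with the principal component scores */
proc corr data=out outp=supp; 
   var murder rape; with prin:; 
   run;
/* Reshape to one row per supplementary variable, with columns Prin1 and Prin2 */
proc transpose data=supp(where=(_type_='CORR')) out=supp2 name=Variable;
   id _name_;
   run;
/* Append the supplementary rows to the pattern plot data */
data all;
   set pp supp2;
   run;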
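/* Redraw the pattern plot; the radii 1, 0.87, and 0.71 are the square roots of 1.00, 0.75, and 0.50 (CIRCLES=50 75 100) */
proc sgplot data=all aspect=.9 noautolegend;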
  ellipseparm semimajor=1 semiminor=1 / 
     slope=0 xorigin=0 yorigin=0 clip 
     lineattrs=(color=blue)
     transparency=0.9;
  ellipseparm semimajor=0.87 semiminor=0.87 / 
     slope=0 xorigin=0 yorigin=0 clip 
     lineattrs=(color=blue)
     transparency=0.9;
  ellipseparm semimajor=0.71 semiminor=0.71 / 
     slope=0 xorigin=0 yorigin=0 clip 
     lineattrs=(color=blue)
     transparency=0.9;
  scatter x=xcirclelabel y=ycirclelabel / markercharattrs=(size=9pt)
     markerchar=circlelabel transparency=0.7;
  scatter x=prin1 y=prin2 / 
    markerattrs=(color=blue symbol=circle) datalabel=variable;
  refline 0 / axis=x;
  refline 0 / axis=y;
  xaxis values=(-1.0 to 1.0 by 0.5) display=(nolabel);
  yaxis values=(-1.0 to 1.0 by 0.5) display=(nolabel);
  title "Component Pattern";
  run;

First, the pattern plot on the original five variables, as produced by PROC PRINCOMP.

PatternPlot5vars.png

Now the pattern plot, as drawn by the PROC SGPLOT code above, adding the two supplementary variables, Murder and Rape, in the original principal component space.

PatternPlot.png
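
On the original question of retrieving the coordinates of the active variables: the pattern plot shows the correlations between the variables and the components, so those coordinates should already be in the pp data set saved by ODS OUTPUT. As a cross-check, here is a minimal sketch that recomputes them with PROC CORR. It assumes the OUT= data set named out from the PROC PRINCOMP step above; the data set name active is only illustrative.

```sas
/* Inspect the coordinates saved from the pattern plot */
proc print data=pp noobs;
run;

/* Recompute the same coordinates as correlations between the
   active variables and the principal component scores.
   "active" is an illustrative data set name. */
proc corr data=out noprint outp=active;
   var robbery--auto_theft;
   with prin1 prin2;
run;

proc print data=active(where=(_type_='CORR')) noobs;
run;
```

The CORR rows of active should agree with the Prin1 and Prin2 columns of pp, up to rounding.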


9 REPLIES
SASdevAnneMarie
Rhodochrosite | Level 12
Thank you, StatDave!
I'm wondering if I should standardize the 'murder' and 'rape' variables using PROC STANDARD before calculating the correlation?
StatDave
SAS Super FREQ

Standardization won't change the correlation.
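
The reason is that the Pearson correlation is unchanged when a variable is shifted by a constant and rescaled by a positive constant, which is all that standardization does. A minimal sketch to check this, assuming the out data set from the PROC PRINCOMP step earlier in the thread (out_std is an illustrative name):

```sas
/* Standardize the supplementary variables to mean 0 and std 1.
   "out_std" is an illustrative data set name. */
proc standard data=out mean=0 std=1 out=out_std;
   var murder rape;
run;

/* Correlations with the raw supplementary variables */
proc corr data=out nosimple;
   var murder rape;
   with prin1 prin2;
run;

/* Correlations with the standardized supplementary variables */
proc corr data=out_std nosimple;
   var murder rape;
   with prin1 prin2;
run;
```

Both PROC CORR steps should print the same correlation table.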

SASdevAnneMarie
Rhodochrosite | Level 12
Right! 🙂
SASdevAnneMarie
Rhodochrosite | Level 12
Hello StatDave,
To add the supplementary individuals, I use PROC SCORE to calculate the variables and scores. I'm wondering if there is another special option in SAS for doing this.
Ksharp
Super User

You need the help of SAS/IML to use the matrix multiply operator.

Calling @Rick_SAS 

 

https://blogs.sas.com/content/iml/2014/02/19/scoring-a-regression-model-in-sas.html

Rick_SAS
SAS Super FREQ

If you need a refresher on the matrix operations behind creating score plots, see "A classical principal component analysis in SAS/IML" in the article "Robust PCA in SAS."
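
For readers without the article at hand, here is a rough sketch of those matrix operations, written for the Crime data used earlier in the thread. It is not taken from the article. It assumes a correlation-based PCA on the five active variables, and the matrix names (X, R, D, V, Z, scores) are illustrative.

```sas
proc iml;
   /* Read the five active variables into a matrix */
   varNames = {"Robbery" "Assault" "Burglary" "Larceny" "Auto_Theft"};
   use Crime;
   read all var varNames into X;
   close Crime;

   R = corr(X);                 /* correlation matrix of the active variables */
   call eigen(D, V, R);         /* eigenvalues D (descending) and eigenvectors V */

   Z = (X - mean(X)) / std(X);  /* center and scale each column */
   scores = Z * V[, 1:2];       /* scores on the first two components */

   print (scores[1:5, ])[colname={"Prin1" "Prin2"}];
quit;
```

The sign of each eigenvector is arbitrary, so a column of scores can come out with the opposite sign from the PROC PRINCOMP output. To score supplementary individuals the same way, their rows would be centered and scaled with mean(X) and std(X) from the active data before multiplying by V.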

 

StatDave
SAS Super FREQ

If you are asking about computing values on the principal components for new observations, then the method you used is the best way.
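
As an illustration of that approach, here is a minimal sketch using the Crime data from earlier in the thread. The split into active and supplementary individuals is hypothetical (two states are held out purely for demonstration), and the data set names Active, Supp, PCstats, and SuppScores are illustrative.

```sas
/* Hypothetical split: hold out two states as supplementary individuals */
data Active Supp;
   set Crime;
   if state in ('Alaska', 'Hawaii') then output Supp;
   else output Active;
run;

/* Fit the PCA on the active individuals only and save the
   means, standard deviations, and scoring coefficients */
proc princomp data=Active n=2 outstat=PCstats noprint;
   var robbery--auto_theft;
run;

/* Score the supplementary individuals; PROC SCORE standardizes them
   with the MEAN and STD rows stored in PCstats */
proc score data=Supp score=PCstats out=SuppScores;
   var robbery--auto_theft;
run;

proc print data=SuppScores noobs;
   var state prin1 prin2;
run;
```

The Prin1 and Prin2 values in SuppScores place the held-out states in the component space defined by the active individuals.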

SASdevAnneMarie
Rhodochrosite | Level 12
Thank you StatDave!
