Hello Experts,
I would like to add supplementary variables to the PCA correlation circle.
How can I retrieve the correlations of the "active" variables (their coordinates on the circle)? They do not appear in the output table.
For the supplementary variables, should I compute the correlations with PROC CORR, or is there an option in PROC PRINCOMP for supplementary variables?
My goal is to redraw the circle from these correlations.
Thank you for your help!
This can be done by using an ODS OUTPUT statement to save the data set behind the pattern plot that the PLOTS=PATTERN(CIRCLES=) option in PROC PRINCOMP produces. That data set contains the coordinates (correlations) of the active variables. Then add to it the correlations of the supplementary variables with the principal components. Use the OUT= option to save the principal component scores so that PROC CORR can correlate the supplementary variables with them. After adding the supplementary variable correlations, you can use PROC SGPLOT to reconstruct the pattern plot. The following example uses the Crime data set from the Getting Started section of the PROC PRINCOMP documentation. Only five of the variables (Robbery through Auto_Theft) are used in the analysis; Murder and Rape are treated as supplementary variables.
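proc princomp n=2 data=Crime plots=pattern(circles=50 75 100) out=out;
   var robbery--auto_theft;       /* the five active variables */
   id state;
   ods output patternplot=pp;     /* data behind the pattern plot: active-variable coordinates */
run;

/* Correlate the supplementary variables with the saved component scores */
proc corr data=out outp=supp;
   var murder rape;
   with prin:;
run;

/* One row per supplementary variable, with columns Prin1 and Prin2 */
proc transpose data=supp(where=(_type_='CORR')) out=supp2 name=Variable;
   id _name_;                     /* explicit: name the new columns Prin1, Prin2 */
run;

/* Stack the active and supplementary coordinates */
data all;
   set pp supp2;
run;

proc sgplot data=all aspect=.9 noautolegend;
   /* Circles of radius 1, sqrt(0.75)=0.87 and sqrt(0.50)=0.71, matching CIRCLES=100 75 50 */
   ellipseparm semimajor=1 semiminor=1 /
      slope=0 xorigin=0 yorigin=0 clip
      lineattrs=(color=blue) transparency=0.9;
   ellipseparm semimajor=0.87 semiminor=0.87 /
      slope=0 xorigin=0 yorigin=0 clip
      lineattrs=(color=blue) transparency=0.9;
   ellipseparm semimajor=0.71 semiminor=0.71 /
      slope=0 xorigin=0 yorigin=0 clip
      lineattrs=(color=blue) transparency=0.9;
   /* Circle labels saved from the PRINCOMP pattern plot */
   scatter x=xcirclelabel y=ycirclelabel / markercharattrs=(size=9pt)
      markerchar=circlelabel transparency=0.7;
   /* One point per variable, active and supplementary */
   scatter x=prin1 y=prin2 /
      markerattrs=(color=blue symbol=circle) datalabel=variable;
   refline 0 / axis=x;
   refline 0 / axis=y;
   xaxis values=(-1.0 to 1.0 by 0.5) display=(nolabel);
   yaxis values=(-1.0 to 1.0 by 0.5) display=(nolabel);
   title "Component Pattern";
run;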
First, the pattern plot of the five active variables, as produced by PROC PRINCOMP.
Next, the pattern plot drawn by the PROC SGPLOT code above, which adds the two supplementary variables, Murder and Rape, to the original principal component space.
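On the question of the active variables: because PROC PRINCOMP works on the correlation matrix by default, each variable's coordinates on the circle are its correlations with the component scores. So, as a minimal sketch, a single PROC CORR call on the OUT= data set can return the active and supplementary coordinates together. This assumes the `out` data set from the PROC PRINCOMP step above; the names `coords_wide` and `coords` are illustrative. Up to rounding, the active rows should match the pattern-plot data saved with ODS OUTPUT.

```sas
/* Correlations of ALL variables (active + supplementary) with the scores */
proc corr data=out outp=coords_wide noprint;
   var robbery assault burglary larceny auto_theft murder rape;
   with prin1 prin2;
run;

/* One row per variable (column Variable), with columns Prin1 and Prin2 */
proc transpose data=coords_wide(where=(_type_='CORR')) out=coords name=Variable;
   var robbery assault burglary larceny auto_theft murder rape;
   id _name_;
run;

proc print data=coords noobs;
run;
```

The resulting `coords` data set can be plotted with the same PROC SGPLOT step, replacing `all` with `coords` (the circle labels would then be absent).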
Standardizing the variables (or the component scores) won't change the correlations.
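The reason is a short identity: for constants with $a, c > 0$, a positive linear rescaling of either variable leaves the correlation unchanged. Standardizing $X$ uses $a = 1/s_X > 0$ and $b = -\bar{x}/s_X$, so the correlation with any component is the same whether raw or standardized values are used. The same argument covers standardized component scores (for example, the STD option in PROC PRINCOMP).

```latex
\operatorname{corr}(aX + b,\; cY + d)
  = \frac{\operatorname{cov}(aX + b,\; cY + d)}
         {\sqrt{\operatorname{var}(aX + b)\,\operatorname{var}(cY + d)}}
  = \frac{ac\,\operatorname{cov}(X, Y)}
         {|a|\,|c|\,\sqrt{\operatorname{var}(X)\,\operatorname{var}(Y)}}
  = \operatorname{sign}(ac)\,\operatorname{corr}(X, Y)
```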
You need SAS/IML to use the matrix multiplication operator.
Calling @Rick_SAS
https://blogs.sas.com/content/iml/2014/02/19/scoring-a-regression-model-in-sas.html
If you need a refresher on the matrix operations behind creating scores plots, see "A classical principal component analysis in SAS/IML" in the article, "Robust PCA in SAS."
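A minimal SAS/IML sketch of that classical matrix approach follows. It assumes that the Crime data set from the PROC PRINCOMP Getting Started example exists in WORK, that it has no missing values, and that your SAS/IML release has the MEAN, STD, and CORR functions. It is written from the standard recipe (eigen-decompose the correlation matrix of the active variables), not copied from the linked articles. The loadings V*diag(sqrt(lambda)) are the correlations of the active variables with the components. The supplementary correlations come from a cross-product of standardized columns divided by n-1. The names `activeVars`, `suppVars`, and `iml_coords` are illustrative. Eigenvector signs are arbitrary, so a column may be sign-flipped relative to PROC PRINCOMP.

```sas
proc iml;
   /* Same active/supplementary split as the PRINCOMP example */
   activeVars = {"Robbery" "Assault" "Burglary" "Larceny" "Auto_Theft"};
   suppVars   = {"Murder" "Rape"};
   cnames     = {"Prin1" "Prin2"};

   use Crime;
   read all var activeVars into X;
   read all var suppVars into S;
   close Crime;

   n = nrow(X);
   k = 2;                              /* number of components kept, as with N=2 */

   /* PCA on the correlation matrix (the PROC PRINCOMP default) */
   R = corr(X);
   call eigen(evals, evecs, R);        /* eigenvalues in descending order */
   lambda = evals[1:k];
   V = evecs[, 1:k];

   /* Loadings = correlations of the active variables with the components,
      i.e. their coordinates on the correlation circle */
   L = V * diag(sqrt(lambda));

   /* Component scores from the standardized active variables */
   Z = (X - mean(X)) / std(X);
   scores = Z * V;

   /* Correlations of the supplementary variables with the scores:
      cross-product of standardized columns divided by n-1 */
   Zp = (scores - mean(scores)) / std(scores);
   Zs = (S - mean(S)) / std(S);
   suppCorr = Zs` * Zp / (n - 1);

   print L[r=activeVars c=cnames label="Active variables"];
   print suppCorr[r=suppVars c=cnames label="Supplementary variables"];

   /* Save the active-variable coordinates for plotting */
   create iml_coords from L[colname=cnames];
   append from L;
   close iml_coords;
quit;
```

As a consistency check, Z` * Zp / (n-1) reproduces L exactly, because Z`Z/(n-1) = R and R*V = V*diag(lambda).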
If you are asking about computing principal component scores for new observations, then the method you used is the best way.
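For reference, one documented way to score new observations uses the OUTSTAT= data set from PROC PRINCOMP with PROC SCORE. This is a sketch, assuming the Crime data set from the Getting Started example. `NewCrime` is a hypothetical data set of new observations; here, the first three states of Crime serve as a stand-in so the code runs. My understanding is that PROC SCORE, by default, centers and scales the new data using the MEAN and STD rows stored in OUTSTAT=. The scores should then agree with the OUT= scores for the same rows.

```sas
/* Hypothetical stand-in for new observations: the first three states */
data NewCrime;
   set Crime(obs=3);
run;

/* Save the scoring coefficients along with the means and standard deviations */
proc princomp data=Crime n=2 outstat=pcstat noprint;
   var robbery assault burglary larceny auto_theft;
run;

/* Apply the _TYPE_='SCORE' coefficients to the new data */
proc score data=NewCrime score=pcstat out=NewScores;
   var robbery assault burglary larceny auto_theft;
run;

proc print data=NewScores noobs;
   var state prin1 prin2;
run;
```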