Hi everyone,
I’m trying to create a PCA biplot using PROC SGPLOT. My goal is to display both the scores and the loadings on the same graph.
I know that PROC SGPLOT provides both the SCATTER and VECTOR statements. I extracted the principal component scores and plotted them successfully using SCATTER. However, I’m having trouble extracting the eigenvectors (loadings) and plotting them as vectors on the same graph.
I generated some fictitious data for testing purposes and tried different approaches, but I haven’t been able to produce the desired biplot with both scores and loadings.
I’m attaching the graphs generated by PROC PRINCOMP. While these show the component pattern plot, I don’t know how to extract both the scores and loadings correctly for use with PROC SGPLOT. That’s why I tried using SGPLOT directly, but without success so far.
I’d appreciate any help or suggestions from anyone with experience in this area.
Thank you!
data dados_ficticios;
input grupo $ var1 var2 var3 var4;
datalines;
A 5 7 8 6
A 6 8 9 5
A 5 7 7 6
B 8 5 6 7
B 7 4 5 8
B 8 6 7 6
C 3 9 8 4
C 2 8 7 5
C 3 9 8 4
;
run;
proc princomp data=dados_ficticios out=escores outstat=autovetores n=2 plots=all;
var var1 var2 var3 var4;
id grupo;
run;
proc print data=escores;run;
proc print data=autovetores;run;
> I tried to replicate your code with my data, but I couldn’t get it to work.
Was there an error?
The hardest part of biplots is scaling the vectors (see https://blogs.sas.com/content/iml/2019/11/06/what-are-biplots.html )
In my blog post on creating biplots in SAS, I opted to let PROC PRINQUAL handle the scaling. From the output of PROC PRINQUAL, you can get the PCA coordinates and the vector coordinates. Then you can use PROC SGPLOT to add reference lines, ellipses, etc.
ods select none;
proc prinqual data=Sashelp.iris plots=(MDPref)
   n=2        /* project onto Prin1 and Prin2 */
   mdpref=1;  /* use COV scaling */
   transform identity(SepalLength SepalWidth PetalLength PetalWidth); /* identity transform */
   id Species;
   ods output MDPrefPlot = PCA;
run;
ods select all;
title "Biplot";
proc sgplot data=PCA aspect=1;
   scatter x=Prin1 y=Prin2 / group=IDLab1 markerattrs=(size=12 symbol=circlefilled) transparency=0.2;
   ellipse x=Prin1 y=Prin2 / group=IDLab1;
   vector x=Vec1 y=Vec2 / datalabel=VName lineattrs=(thickness=2 color=black);
   refline 0 / axis=x;
   refline 0 / axis=y;
run;
Thank you for your response!
I tried to replicate your code with my data, but I couldn’t get it to work. So I went back to trying the approach I was using with PROC PRINCOMP. However, I’m having trouble extracting the loadings (OUTSTAT=OutStat) and combining them with the scores (OUT=PCA). Below is the code I’m trying to use. If I remove the creation of the vectors dataset and also the VECTOR statement from PROC SGPLOT, the plot is generated normally, but it doesn’t include the variable vectors (loadings).
data iris;
set Sashelp.iris;
run;
proc princomp data=iris plots=pattern(vector) plots=(score(ellipse)) n=2 standard out=PCA outstat=OutStat noprint;
var SepalLength SepalWidth PetalLength PetalWidth;
id Species;
run;
data vectors;
set OutStat;
where _TYPE_='SCORE';
length Variable $12.;
Variable = _NAME_;
x = Prin1 * 2;
y = Prin2 * 2;
keep Variable x y;
run;
ods graphics / reset width=800px height=600px imagename="PCA_Graph" imagefmt=png antialias=on;
proc sgplot data=PCA;
scatter x=Prin1 y=Prin2 / group=species
markerattrs=(size=12 symbol=circlefilled)
transparency=0.2;
vector x=x y=y / datalabel=Variable
arrowattrs=(thickness=2 color=black)
lineattrs=(pattern=solid);
xaxis label="CP1 (72.77%)"
labelattrs=(Family="Times New Roman" Size=11 Weight=bold)
valueattrs=(Family="Times New Roman" Size=10 Weight=bold);
yaxis label="CP2 (23.03%)"
labelattrs=(Family="Times New Roman" Size=11 Weight=bold)
valueattrs=(Family="Times New Roman" Size=10 Weight=bold);
keylegend / location=outside
position=right
title="Species"
titleattrs=(color=black family="Times New Roman" size=11)
valueattrs=(color=black family="Times New Roman" size=10);
ellipse x=Prin1 y=Prin2 / group=species
lineattrs=(pattern=solid thickness=2)
transparency=0.4;
refline 0 / axis=x lineattrs=(color=gray pattern=solid thickness=1);
refline 0 / axis=y lineattrs=(color=gray pattern=solid thickness=1);
run;
ods graphics / reset;
Dear Rick,
Thank you for your help.
I was finally able to replicate your code with my data and create the biplot. Initially, I was trying to extract the scores using PROC PRINCOMP, but I couldn't manage to do it. However, I was able to get it working using PROC PRINQUAL.
By any chance, do you know if it’s possible to extract the scores using PROC PRINCOMP?
Thanks again!
> do you know if it’s possible to extract the scores using PROC PRINCOMP?
The "scores" usually refer to the projection of the data onto the first two PCs. Yes, PROC PRINCOMP provides that information.
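As a minimal sketch of that point: the OUT= data set from PROC PRINCOMP holds the original variables plus the score variables Prin1 and Prin2. The data set name Scores is only an illustrative choice.

```sas
/* Write the principal component scores to an output data set. */
/* "Scores" is a hypothetical name chosen for this sketch.     */
proc princomp data=Sashelp.iris n=2 out=Scores noprint;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Each observation now carries its coordinates on Prin1 and Prin2. */
proc print data=Scores(obs=5);
   var Species Prin1 Prin2;
run;
```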
The challenge is getting the vectors for the biplots. There are four common scalings for the vectors (see my articles), and it is easiest to get the vectors from the singular value decomposition (SVD) of the data matrix, X. PROC PRINCOMP gives the eigenvalues and eigenvectors of the covariance matrix of X. These are related, but it's easier to work with the SVD.
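If you still want to start from PROC PRINCOMP, here is a rough sketch of one way to get the eigenvectors into a plottable shape. In the OUTSTAT= data set, the rows with _TYPE_='SCORE' hold the eigenvectors, with one row per component (_NAME_ is Prin1, Prin2) and one column per analysis variable. That means there are no Prin1 or Prin2 columns to read until you transpose. The data set names (Scores, Stats, Vectors, Biplot), the variable names vx and vy, and the multiplier 2 are all assumptions made for illustration. In particular, the multiplier is an arbitrary display factor, not one of the four standard biplot scalings.

```sas
proc princomp data=Sashelp.iris n=2 out=Scores outstat=Stats noprint;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Transpose the eigenvector rows so that each original variable */
/* becomes one observation with columns Prin1 and Prin2.         */
proc transpose data=Stats(where=(_TYPE_='SCORE'))
               out=Vectors(rename=(_NAME_=Variable Prin1=vx Prin2=vy));
   id _NAME_;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* PROC SGPLOT reads a single data set, so stack scores and vectors. */
data Biplot;
   set Scores Vectors(keep=Variable vx vy in=isVec);
   if isVec then do;
      vx = 2*vx;   /* arbitrary display scaling (assumption) */
      vy = 2*vy;
   end;
run;

proc sgplot data=Biplot;
   scatter x=Prin1 y=Prin2 / group=Species nomissinggroup;
   vector x=vx y=vy / datalabel=Variable lineattrs=(color=black);
   refline 0 / axis=x;
   refline 0 / axis=y;
run;
```

Because the vector rows have missing values for Prin1 and Prin2 and the score rows have missing values for vx and vy, each statement draws only its own observations.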
If you are proficient with PROC IML and matrix computations, you can use PROC IML to perform the necessary matrix calculations for the biplots. But if you want the information from a STAT procedure, use PROC PRINQUAL. My articles show both methods.
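For the PROC IML route, a minimal sketch might look like the following. It centers the data, calls the SVD subroutine, and forms one of the four factorizations (scores = U*D, vectors = V). It is not the full set of scalings from the articles, and the matrix names and the Dim1/Dim2 column labels are assumptions for this sketch. Note that it works on centered but unstandardized data, so the results will not match the PROC PRINCOMP default, which analyzes the correlation matrix.

```sas
proc iml;
varNames = {"SepalLength" "SepalWidth" "PetalLength" "PetalWidth"};
use Sashelp.iris;
read all var varNames into X;
close Sashelp.iris;

Y = X - mean(X);                 /* center each column */
call svd(U, D, V, Y);            /* Y = U*diag(D)*V`; D is a vector of singular values */

/* One possible factorization: scores carry the singular values, */
/* vectors are the unscaled right singular vectors.              */
Scores  = U[ ,1:2] # T(D[1:2]);  /* n x 2 coordinates of the observations */
Vectors = V[ ,1:2];              /* p x 2 coordinates of the variables    */

print Vectors[rowname=varNames colname={"Dim1" "Dim2"}];
quit;
```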
Code should go in the code box — click on the little running man icon and paste your code into the window that appears, as @Rick_SAS has done.
Are there errors in the log? If so, show us the log. (Use the log window: click on the </> icon and paste the relevant parts of the log, including code and error messages, into the window that appears.) If the program produces the wrong output, show us the incorrect output and what you think the correct output should be.