Principal components plot with ellipses

Hi,

I am trying to use PROC SGRENDER with PROC PRINCOMP to produce a scatter plot of observations with TWO ellipses. I have two populations that appear to be distinct, and I would like to draw a separate 95% confidence ellipse around each population. I am completely stumped on how to do it; the closest I have come is to use the template for the principal components score plot. Any suggestions?

Thanks in advance - Brian
Re: Principal components plot with ellipses

Hi:
You could alter the template used by PROC PRINCOMP and add an ELLIPSE statement. However, I do not believe the ELLIPSE statement supports the GROUP= option, so you'd have to decide how to identify each population.

Another method would be to get your data points from PROC PRINCOMP and then use the SGPLOT procedure with the SCATTER statement and the GROUP= option to show your two populations and then use an ELLIPSE statement for each population. You may need to do something to create GROUP variables for your data if they do not come directly from PROC PRINCOMP.

In the example below, I use SASHELP.HEART as the input data set and deal with only 2 groups, the BORDERLINE and HIGH cholesterol folks who have died. The SCATTER statement is easy, because I can use GROUP=CHOL_STATUS to get point markers in different colors for each group. However, if I use a single ELLIPSE statement based on CHOLESTEROL, I'll only get one ellipse and that's not what I want.

So with a bit of data step manipulation, I make 2 new variables: VERYHIGHCHOL and BORDERLINECHOL. If a person's cholesterol status is 'High', then VERYHIGHCHOL will be their cholesterol number and BORDERLINECHOL will be missing; for 'Borderline' it is the other way around. This gives me a separate numeric variable for each ELLIPSE statement: one ELLIPSE statement for VERYHIGHCHOL and another for BORDERLINECHOL.

Since the SG procedures work on an overlay basis, the 2 ellipses will be overlaid on the grouped scatter plot. Perhaps an approach like this would work for your problem. Or, you may want to pursue a template/sgrender approach, working directly with the PRINCOMP graph template.

The SGPLOT example is shown below.

cynthia

Example of two separate ellipses using SGPLOT:
[pre]
** Create subset of only High and Borderline;
** to make 2 distinct groups.;
** Make two different numeric variables to be used;
** in the ELLIPSE statements.;
** If your data did not already have GROUP variables, you could;
** also create them here.;

data heart;
  set sashelp.heart;
  where status = 'Dead';
  if chol_status = 'High' then do;
    VeryHighChol = cholesterol;
    output;
  end;
  else if chol_status = 'Borderline' then do;
    BorderlineChol = cholesterol;
    output;
  end;
run;

proc print data=heart(obs=25);
  title 'What do new variables look like?';
  var sex ageatdeath cholesterol chol_status veryhighchol borderlinechol;
run;

proc sort data=heart;
  by cholesterol;
run;

** Use SCATTER statement on CHOLESTEROL variable;
** but use ELLIPSE statement with second numeric variable;
** that represents the cholesterol for that group only;
proc sgplot data=heart;
  ellipse x=VeryHighChol y=ageatdeath / alpha=0.05 name='vh05';
  ellipse x=BorderlineChol y=ageatdeath / alpha=0.05 name='bl05';
  scatter x=cholesterol y=ageatdeath / group=chol_status name='sc';
  title 'Using Two Ellipses';
  keylegend 'vh05' 'bl05'
    / title="Alpha Levels" across=1
      location=inside position=topright;
  keylegend 'sc'
    / title="Cholesterol Status" down=1
      location=outside position=bottom;
run;

[/pre]
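For comparison, here is a rough sketch of what the template/SGRENDER route could look like with the same two-variable trick. It is only an illustration: the template name TwoEllipses is made up, the code reuses the WORK.HEART data set built above, and it is a stand-alone template rather than a modified copy of the one PROC PRINCOMP ships with.

```sas
** A sketch only. TwoEllipses is a made-up template name;
** WORK.HEART is the data set created in the SGPLOT example;
proc template;
  define statgraph TwoEllipses;
    begingraph;
      entrytitle 'Two Ellipses from a Custom Template';
      layout overlay;
        scatterplot x=cholesterol y=ageatdeath /
                    group=chol_status name='sc';
        ellipse x=VeryHighChol y=ageatdeath /
                type=predicted alpha=0.05;
        ellipse x=BorderlineChol y=ageatdeath /
                type=predicted alpha=0.05;
        discretelegend 'sc';
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=heart template=TwoEllipses;
run;
```

TYPE=PREDICTED draws a prediction ellipse for new observations. If what you want is a confidence ellipse for the population mean, TYPE=MEAN is the option to try instead.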

Altering the PRINCOMP template may give you more analytical control over the way the graphs are produced directly from the procedure. However, the method shown above (with SASHELP.HEART data) may work for you, if your data lends itself to this kind of manipulation after PRINCOMP is done with the analysis. You can use PROC PRINCOMP to create an output dataset, using syntax similar to the following:
[pre]

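** Supply your own data set names after DATA= and OUT= (they are blank here);
proc princomp data = out= n=5 cov std;
  title 'Use OUT= option and other PRINCOMP options for analysis';
run;

** Print the data set that you named in OUT= above;
proc print data = ;
  title 'Output Data set Created From PROC PRINCOMP';
run;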

** then use this new output dataset with the SGPLOT procedure;
[/pre]
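To make the whole workflow concrete, here is a minimal sketch that strings the steps together. Everything specific in it is an assumption for illustration: two species from SASHELP.IRIS stand in for your two populations, and the data set names IRIS2, PCSCORES and PCPLOT and the variables Prin1_Setosa and Prin1_Versicolor are made up. Your own data and variable names will differ.

```sas
** Hypothetical example data, two species as the two populations;
data iris2;
  set sashelp.iris;
  where species in ('Setosa', 'Versicolor');
run;

** OUT= keeps the original variables and adds the scores Prin1 and Prin2;
proc princomp data=iris2 out=pcscores n=2;
  var sepallength sepalwidth petallength petalwidth;
run;

** Copy Prin1 into one variable per population so that;
** each ELLIPSE statement sees only its own group;
data pcplot;
  set pcscores;
  if species = 'Setosa' then Prin1_Setosa = Prin1;
  else Prin1_Versicolor = Prin1;
run;

proc sgplot data=pcplot;
  title 'Component Scores with One Ellipse per Population';
  ellipse x=Prin1_Setosa y=Prin2 / alpha=0.05 name='e1'
          legendlabel='Setosa';
  ellipse x=Prin1_Versicolor y=Prin2 / alpha=0.05 name='e2'
          legendlabel='Versicolor';
  scatter x=Prin1 y=Prin2 / group=species name='sc';
  xaxis label='Component 1';
  yaxis label='Component 2';
  keylegend 'e1' 'e2' / title='Ellipses' location=inside position=topright;
  keylegend 'sc' / title='Species' location=outside position=bottom;
run;
```

By default the SGPLOT ELLIPSE statement draws a prediction ellipse (TYPE=PREDICTED). Adding TYPE=MEAN to each ELLIPSE statement should give a confidence ellipse for the group mean instead, which may be closer to what was asked for.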