Data visualization with SAS programming

Scatter Plot with 2 Categorical Variables

Regular Contributor
Posts: 171

Scatter Plot with 2 Categorical Variables

I want to create a scatter plot in which the plot symbol is determined by the values of one categorical variable and the symbol color is determined by a second, dichotomous categorical variable. For example, using the sashelp.class data set, suppose I want to make a simple scatter plot of height*weight. I want each age to be denoted by a different plot symbol, summarized in a legend: a dot for age 11, a square for age 12, and so on. Furthermore, I want the symbol color to be blue for all males and red for all females, so a red square would indicate a 12-year-old female and a blue square a 12-year-old male. Is this possible? Any feasible solution would also need to work with BY-group processing. For example, I might have a data set with 20 classes of students of different ages and want to create a separate graph for each class.

Thanks in advance.


All Replies
SAS Employee
Posts: 963

Scatter Plot with 2 Categorical Variables

I don't think you'll be able to do this with standard Proc Gplot, using "plot y*x=z".

You can do this with annotated markers (controlling both the shape and color programmatically).

Let me know if you need to see some sample code.

Regular Contributor
Posts: 171

Scatter Plot with 2 Categorical Variables

I'll look into your suggestion about annotated markers.  I've used the annotate facility to draw lines and insert text boxes, but not to change the symbols themselves.  Yes, if you have some sample code that is relevant to this problem, I would like to see it.  Thanks.

SAS Employee
Posts: 963

Scatter Plot with 2 Categorical Variables

Ok - here's some code to play with...

I slightly modified my original idea.  I go ahead and use "plot y*x=z" to get the symbols & legend.

And I then annotate a simple colored 'dot/pie' on top of that symbol to get the desired color:

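/* Annotate data set: one solid filled 'pie' (a dot) per observation, colored by sex */
data anno_color; set sashelp.class;
length function color style $8;
/* x/y in data units, drawn after the plot, size in percent of the graphics area */
xsys='2'; ysys='2'; when='a'; hsys='3';
x=weight;
y=height;
function='pie'; style='psolid'; rotate=360; size=.8;
if sex='M' then color='blue';
else if sex='F' then color='pink';
else color='black';
run;

/* One SYMBOL statement per age; sashelp.class has six ages (11-16) */
symbol1 value=circle height=7pct color=gray33;
symbol2 value=diamond height=7pct color=gray33;
symbol3 value=plus height=7pct color=gray33;
symbol4 value=square height=7pct color=gray33;
symbol5 value=star height=7pct color=gray33;
symbol6 value=triangle height=7pct color=gray33;

legend1 position=(bottom center);

proc gplot data=sashelp.class anno=anno_color;
plot height*weight=age / legend=legend1;
run;
quit;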
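
The original question also asked for BY-group processing. A minimal sketch of how the annotate approach might be extended, assuming the annotate data set carries the BY variable, is sorted the same way as the plot data, and is named on the PLOT statement rather than the PROC statement. The data set `classes` and the variable `classroom` are made up for illustration and are not part of this thread:

```sas
/* Hypothetical data: split sashelp.class into two made-up classrooms */
data classes;
  set sashelp.class;
  classroom = mod(_n_, 2) + 1;
run;

proc sort data=classes; by classroom; run;

/* Annotate data set built from the sorted data, so it keeps the BY variable */
data anno_color;
  set classes;
  length function color style $8;
  xsys='2'; ysys='2'; when='a'; hsys='3';
  x=weight; y=height;
  function='pie'; style='psolid'; rotate=360; size=.8;
  if sex='M' then color='blue';
  else if sex='F' then color='red';
  else color='black';
run;

/* Reuses the SYMBOL and LEGEND statements defined earlier in this post */
proc gplot data=classes;
  by classroom;
  plot height*weight=age / legend=legend1 anno=anno_color;
run;
quit;
```

One caveat to check: SYMBOL definitions are assigned in order to the age values present in each BY group, so if a classroom is missing an age, the same age may get a different symbol from one graph to the next.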

Regular Contributor
Posts: 171

Scatter Plot with 2 Categorical Variables

That's a really creative solution.  Thanks again!

Solution
‎01-10-2012 11:24 AM
SAS Super FREQ
Posts: 864

Scatter Plot with 2 Categorical Variables

If you have at least SAS 9.2, maintenance 3, you can create your example in GTL using the following code:

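proc template;
define statgraph plot;
begingraph;
layout overlay;
  /* Series plot showing markers only: marker color follows sex, marker symbol follows age */
  seriesplot x=weight y=height / group=age lineattrs=(thickness=0) display=(markers)
             markercolorgroup=sex markersymbolgroup=age name="scatter";
  /* Color legend (sex), placed inside the plot area */
  discretelegend "scatter" / type=markercolor location=inside
                 autoalign=(topleft topright bottomright bottomleft);
  /* Symbol legend (age), in the default position */
  discretelegend "scatter" / type=markersymbol;
endlayout;
endgraph;
end;
run;

/* Render the template against the data */
proc sgrender data=sashelp.class template=plot; run;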
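
For the BY-group requirement in the original question, PROC SGRENDER accepts a BY statement, so the same template can be rendered once per group. A minimal sketch, assuming the `plot` template above has already been compiled; the data set `classes` and the variable `classroom` are made up for illustration:

```sas
/* Hypothetical data: split sashelp.class into two made-up classrooms */
data classes;
  set sashelp.class;
  classroom = mod(_n_, 2) + 1;
run;

/* BY-group processing needs the data sorted by the BY variable */
proc sort data=classes; by classroom; run;

/* One graph per classroom, all using the same template */
proc sgrender data=classes template=plot;
  by classroom;
run;
```

Group attributes are likely assigned separately within each BY group, so if a classroom is missing an age or a sex, the symbol or color for a given value may differ between graphs. It is worth checking the legends on each graph.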

Regular Contributor
Posts: 171

Scatter Plot with 2 Categorical Variables

Thank you.  When I run the program, every age has a plot symbol of circle, so there is no way to distinguish between the ages.  The documentation on the MARKERCOLORGROUP=  option says that the option is ignored unless MARKERSYMBOL=CHARACTER. 

SAS Super FREQ
Posts: 864

Scatter Plot with 2 Categorical Variables

I'm sorry, I was incorrect about the required version of SAS. The code I gave you requires SAS 9.3. Sorry about the mistake.

Regular Contributor
Posts: 171

Scatter Plot with 2 Categorical Variables

I do have SAS 9.3 and your code is working.  For some reason the plot symbols are not displayed correctly in the SAS HTML results viewer.  When I use a different ODS destination such as PDF, it works perfectly.  Thanks!

SAS Super FREQ
Posts: 864

Re: Scatter Plot with 2 Categorical Variables

The reason you saw the same symbol everywhere in HTML is that the new default style (HTMLBLUE) keeps using one symbol until it has cycled through all of its colors. Try "ods html style=htmlbluecml;" to get the same colors with unique symbols.
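
As a sketch of how that might look in a program, assuming the `plot` template from earlier in the thread has already been compiled:

```sas
/* Switch the HTML destination to the style that varies symbols as well as colors */
ods html style=htmlbluecml;

proc sgrender data=sashelp.class template=plot; run;
```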
