Calcite | Level 5

## Using Bubble Plots

I need some help constructing a plot, and I don’t think any of the ones I learned in school will work for this. I have three variables I am trying to display: Celiac (dichotomous), RIDAGEYR, “age” (continuous), and MCQ160c (dichotomous). What I have in mind is to split the variable Celiac into two groups, 0 and 1, and make two identical plots to compare the two samples. In each plot I want age on the x-axis and MCQ160c (1 and 2) on the y-axis. That gives me a simple plot that shows only the range, not the frequency.

My issue with this plot is that there are multiple people at each individual age, and I would like the plot to represent the frequency at each age. I want to be able to tell that although the age ranges for answers 1 and 2 to MCQ160c are the same, the frequencies differ. For example, the people who answered MCQ160c with “2” tended to be younger, whereas those who answered “1” were older. So what I am thinking of is something like the plot below. It is a bubble plot I found on the SAS website (I put their example code next to it). My bubble plot would be set up the same as the one above, except that I want the bubble size to reflect frequency, as they did below. My issue here is that I am dealing with over 10,000 records of raw data, so I don’t have a variable for N or “num” (like they do in the example bubble plot).

proc gplot data=jobs;

format dollars dollar9.;

bubble dollars*eng=num / haxis=axis1;

run;

quit;

This example shows a bubble plot in which each bubble represents a category of engineer. The plot shows engineers on the horizontal axis and average salaries on the vertical axis. Each bubble's vertical location is determined by the average salary for the category. Each bubble's size is determined by the number of engineers in the category: the more engineers, the larger the bubble.

I know how to use PROC FREQ, but I don’t know how to get those frequencies integrated into my plot. I am not dead set on a bubble plot; it is just the closest thing I could find to what I wanted. Is there a way I can write my code as follows and have it recognize that by “N” I mean the frequency? Is there a better way to represent the data?

Proc gplot data=steph.chdrisk;

Bubble mcq160*ridageyr=N;

Run;

Accepted Solutions
Super User

## Re: Using Bubble Plots

Summarize the data to get counts:

proc freq data=steph.chdrisk noprint;

tables mcq160*ridageyr / out=chdriskplot;

run;

The resulting data set will have a variable COUNT that holds the number of records with each combination of the two variables.
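To check the summarized data before plotting, it may help to print a few rows. This is a small sketch that only assumes the output data set is named chdriskplot, as in the PROC FREQ step above; the OUT= data set should contain the two TABLES variables plus COUNT and PERCENT.

```sas
/* Print the first 10 rows of the summarized data set.          */
/* Expect one row per combination of the two TABLES variables,  */
/* with COUNT (frequency) and PERCENT columns added by FREQ.    */
proc print data=chdriskplot(obs=10);
run;
```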

Proc gplot data=chdriskplot;

Bubble mcq160*ridageyr=count;

Run;
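The original question also asked for two matching plots, one per Celiac group. A possible sketch of that, under some assumptions: the variable is stored in steph.chdrisk under the name celiac (the name is taken from the question text, not confirmed), and chdriskplot3 is a hypothetical name for the output data set. The idea is to add Celiac to the table request so COUNT is computed within each group, then use BY-group processing in PROC GPLOT.

```sas
/* Sketch only: assumes a variable named celiac exists in steph.chdrisk */
proc freq data=steph.chdrisk noprint;
  /* three-way table: one COUNT per celiac x mcq160 x age combination */
  tables celiac*mcq160*ridageyr / out=chdriskplot3;  /* chdriskplot3 is a hypothetical name */
run;

/* BY-group processing requires the data to be ordered by the BY variable */
proc sort data=chdriskplot3;
  by celiac;
run;

/* UNIFORM asks GPLOT to use the same axis scaling for every BY group */
proc gplot data=chdriskplot3 uniform;
  by celiac;                       /* one plot per Celiac group (0 and 1) */
  bubble mcq160*ridageyr=count;    /* bubble size reflects frequency */
run;
quit;
```

Note that PROC FREQ leaves out records with missing values in any of the table variables by default, so the counts cover only complete records.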
