Fluorite | Level 6

## Attempting to Develop a Side-By-Side Boxplot for Multiple Conditions and per a Categorical Variable

Hi everyone,

Looking for a bit of assistance with a snag I've hit. I have a dataset that contains a number of medical conditions, with each person having zero, one, or more than one condition. For my top five conditions, I would like to create a side-by-side boxplot, with each condition compared by gender (male or female). (Sorta something like what is below):

Because each person could have 0 to 5 of the conditions, it isn't possible for me to set up a single column holding the medical condition, which I believe would solve this issue.

Just wondering if anyone has any thoughts.

Here is an example of what my data could look like

ID C_1 C_2 C_3 C_4 C_5 Gender
1 1 0 1 0 0 Male
2 0 0 0 1 0 Female
3 0 0 1 0 0 Female
4 1 0 0 0 1 Male
5 0 0 0 1 0 Male
6 0 1 0 0 0 Female
7 0 0 0 0 0 Male

Thanks!

Super User

## Re: Attempting to Develop a Side-By-Side Boxplot for Multiple Conditions and per a Categorical Varia

Where do the categories "A", "B", and "C" come from?

Calculate these percent values first, then use PROC SGPLOT.

```
proc sgplot data=sashelp.heart;
  vbox weight / category=bp_status group=sex;
  keylegend / location=inside position=topright across=1 title='';
run;
```

Fluorite | Level 6

## Re: Attempting to Develop a Side-By-Side Boxplot for Multiple Conditions and per a Categorical Varia

Thank you for your reply. My apologies for leaving this out. I will be plotting against age.

The C1, C2, etc. indicate the presence or absence of medical conditions.

Thank you!
Super User

## Re: Attempting to Develop a Side-By-Side Boxplot for Multiple Conditions and per a Categorical Varia

You haven't described what you want to plot very well. Which "condition" is to be plotted?

Box plots are designed to show the distribution of values. You have dichotomous values, so a box plot of those isn't going to be very helpful directly.

Fluorite | Level 6

## Re: Attempting to Develop a Side-By-Side Boxplot for Multiple Conditions and per a Categorical Varia

Yes, my apologies (wrote this after a very long day and right before bed). The variable I will be plotting against is age.

ID C_1 C_2 C_3 C_4 C_5 Age Gender
1 1 0 1 0 0 18 Male
2 0 0 0 1 0 9 Female
3 0 0 1 0 0 12 Female
4 1 0 0 0 1 8 Male
5 0 0 0 1 0 7 Male
6 0 1 0 0 0 10 Female
7 0 0 0 0 0 18 Male

Thank you!
Super User

## Re: Attempting to Develop a Side-By-Side Boxplot for Multiple Conditions and per a Categorical Varia

@agille05 wrote:
Yes, my apologies (wrote this after a very long day and right before bed). The variable I will be plotting against is age.

ID C_1 C_2 C_3 C_4 C_5 Age Gender
1 1 0 1 0 0 18 Male
2 0 0 0 1 0 9 Female
3 0 0 1 0 0 12 Female
4 1 0 0 0 1 8 Male
5 0 0 0 1 0 7 Male
6 0 1 0 0 0 10 Female
7 0 0 0 0 0 18 Male

Thank you!

Still, what do you want to plot "against age"?

In a vertical box plot, the upper and lower edges of the box are the 3rd and 1st quartiles. When you have a dichotomous variable with exactly two values, 1 and 0, the "3rd quartile" will be 1 and the "1st quartile" will be 0. So any plot of C_1 or similar will look pretty much the same: a box one unit high, which makes no sense.

If you want to use C_1 as a category and plot age, then you get two boxes showing the distribution of the ages, one for each level of C_1, which could be grouped by Gender.

Here's how to provide example data in the form of data step code, along with two possibilities for plotting. If you want ALL of C_1 and C_2 in the same graph, then the data may require some reshaping.

```
/* Example data: one row per person, one 0/1 flag per condition */
data have;
  input ID C_1 C_2 C_3 C_4 C_5 Age Gender $;
datalines;
1 1 0 1 0 0 18 Male
2 0 0 0 1 0 9 Female
3 0 0 1 0 0 12 Female
4 1 0 0 0 1 8 Male
5 0 0 0 1 0 7 Male
6 0 1 0 0 0 10 Female
7 0 0 0 0 0 18 Male
;

/* Box plot of the 0/1 flag itself: not informative */
proc sgplot data=have;
  title "Plot of C_1 grouped by gender";
  vbox c_1 / group=gender;
run;

/* Box plot of age, one box per level of C_1 within each gender */
proc sgplot data=have;
  title "Plot of Age grouped by C_1 and gender";
  vbox age / category=c_1 group=gender;
run;
```
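
To see why the first plot above is uninformative, it may help to print the quartiles of the 0/1 flag directly. This is only a sketch that reuses the `have` data set from the code above; the statistic keywords are standard PROC MEANS options, and nothing here is needed for the final plot.

```sas
/* Summary statistics of the 0/1 indicator C_1, by gender.      */
/* With only two distinct values, the quartiles that a box plot */
/* draws carry very little information about the data.          */
proc means data=have n min q1 median q3 max;
  class gender;
  var c_1;
run;
```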

Fluorite | Level 6

## Re: Attempting to Develop a Side-By-Side Boxplot for Multiple Conditions and per a Categorical Varia

Thank you! Yes, that would be helpful to have. I'm plotting the age distribution for the presence of each medical condition (C1 to C5). What I would like is to have each of the 5 conditions plotted, grouped by gender. Here is the code I have for one condition, but I can't seem to get multiple conditions side-by-side, owing to the possibility that some participants can have more than one medical condition. I have also tried SGPLOT to no avail.

Proc sgpanel Data=Peds;
Panelby Chronic_Respiratory_Diseases;
Where Chronic_Respiratory_Diseases=1;
Vbox Age / Category=Gender Group=Gender;
Run;
Thanks again!
Super User

## Re: Attempting to Develop a Side-By-Side Boxplot for Multiple Conditions and per a Categorical Varia

So you want to do a box plot of AGE for the C variables when they are equal to 1, grouped by gender? That bit about how to use the C variables was a major missing part in the first description.

And as @Reeza says, reshape the data:

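```
data have;
  input ID C_1 C_2 C_3 C_4 C_5 Age Gender $;
datalines;
1 1 0 1 0 0 18 Male
2 0 0 0 1 0 9 Female
3 0 0 1 0 0 12 Female
4 1 0 0 0 1 8 Male
5 0 0 0 1 0 7 Male
6 0 1 0 0 0 10 Female
7 0 0 0 0 0 18 Male
;

/* Reshape to long form: one row per person per condition present */
data toplot;
  set have;
  length cat $32;              /* holds the condition variable name */
  array c(*) c_1 - c_5;
  do i=1 to dim(c);
    if c[i]=1 then do;
      cat=vname(c[i]);         /* e.g. C_1, C_3 */
      output;                  /* a person with two conditions yields two rows */
    end;
  end;
  keep id age gender cat;
run;

/* One box per condition, split by gender */
proc sgplot data=toplot;
  vbox age / category=cat group=gender;
run;
```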
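
Because the reshaped data has one row per person per condition, a person with several conditions is counted in several boxes (IDs 1 and 4 in the sample data), and a person with none (ID 7) drops out. A minimal sketch for checking how many observations sit behind each box, assuming the `toplot` data set created above:

```sas
/* Count of rows per condition and gender in the reshaped data. */
/* NOROW, NOCOL and NOPERCENT suppress the percentages so only  */
/* the cell frequencies are printed.                            */
proc freq data=toplot;
  tables cat*gender / norow nocol nopercent;
run;
```

With very small cell counts, the corresponding boxes may not be very meaningful.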
Fluorite | Level 6

## Re: Attempting to Develop a Side-By-Side Boxplot for Multiple Conditions and per a Categorical Varia

Like butter. You are magic.

Thanks so much for your help and apologies again for not being clearer (second career and I imagine this would have been easier for me to grasp with a younger more nimble brain).

Much appreciated!

Super User

## Re: Attempting to Develop a Side-By-Side Boxplot for Multiple Conditions and per a Categorical Varia

You need to restructure your data set and then use the new variable as a grouping variable.
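
As one possible way to do that restructuring without an array, PROC TRANSPOSE could be used. This is a sketch only, not the approach from the accepted answer: it assumes the `have` data set from the earlier posts, and the names `have_sorted` and `long` are made up for illustration.

```sas
/* Sort so that BY-group processing in PROC TRANSPOSE is valid. */
proc sort data=have out=have_sorted;
  by id age gender;
run;

/* One row per person per condition variable. NAME= stores the  */
/* original variable name (C_1 ... C_5) in CAT, and the 0/1     */
/* value lands in the default transposed column COL1.           */
proc transpose data=have_sorted out=long name=cat;
  by id age gender;
  var c_1-c_5;
run;

/* Keep only the conditions that are present, then plot age by  */
/* condition, grouped by gender.                                */
proc sgplot data=long;
  where col1 = 1;
  vbox age / category=cat group=gender;
run;
```

Here `cat` plays the role of the new grouping variable, and the WHERE statement restricts the plot to rows where the condition is present.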