☑ This topic is solved.
agille05
Fluorite | Level 6

Hi everyone,

I'm looking for a bit of assistance with a snag I've hit. I have a dataset of medical conditions in which each person has zero, one, or more than one condition. For my top five conditions, I would like to create a side-by-side box plot, and within each condition I would like to compare gender (male vs. female), sort of like the example below:

[Image: example side-by-side box plot with categories A, B, and C]

Because each person can have anywhere from 0 to 5 of these conditions, I can't set up a single column holding the medical condition, which I believe would solve this issue.

Just wondering if anyone has any thoughts.

Here is an example of what my data could look like:

ID  C_1  C_2  C_3  C_4  C_5  Gender
1   1    0    1    0    0    Male
2   0    0    0    1    0    Female
3   0    0    1    0    0    Female
4   1    0    0    0    1    Male
5   0    0    0    1    0    Male
6   0    1    0    0    0    Female
7   0    0    0    0    0    Male

Thanks!

Ksharp
Super User

Where do the categories "A", "B", and "C" come from?

Calculate these percent values first, then use PROC SGPLOT.

proc sgplot data=sashelp.heart;
vbox weight/category=bp_status group=sex;
keylegend /location=inside position=topright across=1 title='';
run;

[Image: box plots of Weight by BP_Status, grouped by Sex]

agille05
Fluorite | Level 6
Thank you for your reply. My apologies for leaving this out. I will be plotting against age.

The C_1, C_2, etc. variables indicate the presence or absence of medical conditions.

Thank you!
ballardw
Super User

You haven't described what you want to plot very well. Which "condition" is to be plotted?

Box plots are designed to show the distribution of values. You have dichotomous values, so a box plot of those isn't going to be very helpful directly.

agille05
Fluorite | Level 6
Yes, my apologies (wrote this after a very long day and right before bed). The variable I will be plotting against is age.

ID C_1 C_2 C_3 C_4 C_5 Age Gender
1 1 0 1 0 0 18 Male
2 0 0 0 1 0 9 Female
3 0 0 1 0 0 12 Female
4 1 0 0 0 1 8 Male
5 0 0 0 1 0 7 Male
6 0 1 0 0 0 10 Female
7 0 0 0 0 0 18 Male

Thank you!
ballardw
Super User

@agille05 wrote:
Yes, my apologies (wrote this after a very long day and right before bed). The variable I will be plotting against is age.

Still, what do you want to plot "against age"?

In a vertical box plot, the upper and lower edges of the box are the 3rd and 1st quartiles. When you have a dichotomous variable with exactly two values, 1 and 0, the 3rd quartile will typically be 1 and the 1st quartile will be 0. So any plot of C_1 or the like will look pretty much the same: a box one unit high, which makes no sense.
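To see this on the example data, here is a minimal check, assuming the sample rows posted in this thread. PROC MEANS reports Q1 and Q3 for each 0/1 indicator; the output data set name QUART is just an illustrative choice.

```sas
/* Sample rows posted in this thread */
data have;
  input ID C_1 C_2 C_3 C_4 C_5 Age Gender $;
datalines;
1 1 0 1 0 0 18 Male
2 0 0 0 1 0 9 Female
3 0 0 1 0 0 12 Female
4 1 0 0 0 1 8 Male
5 0 0 0 1 0 7 Male
6 0 1 0 0 0 10 Female
7 0 0 0 0 0 18 Male
;

/* Quartiles of the 0/1 indicators; Q1 and Q3 are the box edges in a box plot */
proc means data=have q1 median q3;
  var c_1-c_5;
  /* AUTONAME produces names such as C_1_Q1 and C_1_Q3 */
  output out=quart q1= q3= / autoname;
run;
```

With these seven rows and the default percentile definition, C_1, C_3, and C_4 (two 1s each) should give Q1 = 0 and Q3 = 1, while C_2 and C_5 (a single 1 each) should give 0 for both, so their boxes collapse to a line.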

If you want to use C_1 as a category and plot age, then you get two boxes showing the distribution of the ages, one for each level of C_1, which could be grouped by Gender.

 

Here's how to provide example data in the form of DATA step code, plus two possibilities for plotting. If you want all of C_1, C_2, etc. in the same graph, the data may require some reshaping.

data have;
  input ID C_1 C_2 C_3 C_4 C_5 Age Gender $;
datalines;
1 1 0 1 0 0 18 Male
2 0 0 0 1 0 9 Female
3 0 0 1 0 0 12 Female
4 1 0 0 0 1 8 Male
5 0 0 0 1 0 7 Male
6 0 1 0 0 0 10 Female
7 0 0 0 0 0 18 Male
;

Proc sgplot data=have;
   title "Plot of C_1 grouped by gender";
   vbox c_1 /group=gender;
run;

Proc sgplot data=have;
   title "Plot of Age grouped by C_1 and gender ";
   vbox age /category=c_1 group=gender;
run;

 

agille05
Fluorite | Level 6
Thank you! Yes, that would be helpful to have. I'm plotting the age distribution for the presence of each medical condition (C_1 to C_5). What I would like is to have each of the 5 conditions plotted (grouped by gender). Here is the code I have for one condition, but I can't get multiple conditions side by side because some participants can have more than one medical condition. I have also tried SGPLOT, to no avail.

Proc sgpanel Data=Peds;
Panelby Chronic_Respiratory_Diseases;
Where Chronic_Respiratory_Diseases=1;
Vbox Age / Category=Gender Group=Gender;
Run;
Thanks again!
ballardw
Super User

So you want a box plot of AGE for each C variable when it equals 1, grouped by gender? That bit about how to use the C variables was a major missing part of the first description.

And as @Reeza says, reshape the data:

data have;
  input ID C_1 C_2 C_3 C_4 C_5 Age Gender $;
datalines;
1 1 0 1 0 0 18 Male
2 0 0 0 1 0 9 Female
3 0 0 1 0 0 12 Female
4 1 0 0 0 1 8 Male
5 0 0 0 1 0 7 Male
6 0 1 0 0 0 10 Female
7 0 0 0 0 0 18 Male
;

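data toplot;
  set have;
  /* Group the five 0/1 condition indicators into an array */
  array c(*) c_1 - c_5;
  do i=1 to dim(c);
     /* Write one row per condition the person has; CAT holds the variable name, e.g. C_1 */
     if c[i]=1 then do;
        cat=vname(c[i]);
        output;
     end;
  end;
  keep id age gender cat;
run;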

Proc sgplot data=toplot;
   vbox age /category=cat group=gender;
run;
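Since the original attempt used PROC SGPANEL, here is a hedged sketch of a panel version built on the same reshaped data: one panel per condition, with Gender on the x-axis as in that attempt. The PANELBY options COLUMNS= and NOVARNAME are layout choices of this sketch, not requirements.

```sas
/* Sample rows posted in this thread */
data have;
  input ID C_1 C_2 C_3 C_4 C_5 Age Gender $;
datalines;
1 1 0 1 0 0 18 Male
2 0 0 0 1 0 9 Female
3 0 0 1 0 0 12 Female
4 1 0 0 0 1 8 Male
5 0 0 0 1 0 7 Male
6 0 1 0 0 0 10 Female
7 0 0 0 0 0 18 Male
;

/* One row per person per condition present (same array approach as above) */
data toplot;
  set have;
  array c(*) c_1-c_5;
  do i = 1 to dim(c);
    if c[i] = 1 then do;
      cat = vname(c[i]);
      output;
    end;
  end;
  keep id age gender cat;
run;

/* One panel per condition, age distribution by gender inside each panel */
proc sgpanel data=toplot;
  panelby cat / columns=5 novarname;
  vbox age / category=gender group=gender;
run;
```

With the sample rows, TOPLOT should have 8 rows: IDs 1 and 4 contribute two each and ID 7 contributes none, which is why this long layout avoids the "more than one condition" problem.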
agille05
Fluorite | Level 6

Like butter. You are magic.

Thanks so much for your help, and apologies again for not being clearer (second career, and I imagine this would have been easier to grasp with a younger, more nimble brain).

Much appreciated!

Reeza
Super User
You need to restructure your data set and then use the new variable as a grouping variable.

You want your data as:

ID Condition# Age Gender
1 1 18 Male
1 3 18 Male
2 4 23 Female
....

This is your starting point.
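A hedged alternative for building the long layout described above, using PROC TRANSPOSE instead of an array loop. It assumes the sample rows from this thread and that the indicators are named C_1 through C_5; CONDITION_NUM is a hypothetical variable name for the condition number taken from the variable name.

```sas
/* Sample rows posted in this thread */
data have;
  input ID C_1 C_2 C_3 C_4 C_5 Age Gender $;
datalines;
1 1 0 1 0 0 18 Male
2 0 0 0 1 0 9 Female
3 0 0 1 0 0 12 Female
4 1 0 0 0 1 8 Male
5 0 0 0 1 0 7 Male
6 0 1 0 0 0 10 Female
7 0 0 0 0 0 18 Male
;

/* PROC TRANSPOSE needs the input sorted by the BY variables */
proc sort data=have out=have_sorted;
  by id age gender;
run;

/* Wide to long: one row per ID and C_ variable; _NAME_ holds the variable name */
proc transpose data=have_sorted
               out=long(rename=(_name_=condition col1=present));
  by id age gender;
  var c_1-c_5;
run;

/* Keep only conditions that are present and pull the number after "C_" */
data long;
  set long;
  where present = 1;
  condition_num = input(scan(condition, 2, '_'), 8.);
  drop present;
run;

proc print data=long noobs;
  var id condition_num age gender;
run;
```

For the sample data this should print 8 rows; ID 1 appears twice, with condition numbers 1 and 3, matching the first rows of the layout above.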
