Hi everyone,
Looking for a bit of assistance with a snag I've hit. I have a dataset that contains a number of medical conditions, with each person having 0, 1, or more than one condition. For my top five conditions, I would like to create a side-by-side boxplot, with each condition compared by gender (male or female). (Sorta something like what is below):
Because each person could have 0 to 5 of the conditions, it isn't possible for me to set up a column on its own of the medical condition which I believe would solve this issue.
Just wondering if anyone has any thoughts.
Here is an example of what my data could look like
ID C_1 C_2 C_3 C_4 C_5 Gender
1 1 0 1 0 0 Male
2 0 0 0 1 0 Female
3 0 0 1 0 0 Female
4 1 0 0 0 1 Male
5 0 0 0 1 0 Male
6 0 1 0 0 0 Female
7 0 0 0 0 0 Male
Thanks!
So you want to do a box plot of AGE for the C variables when they are equal to 1, grouped by gender? That bit about how to use the C variables was a major missing part in the first description.
And as @Reeza says, reshape the data:
/* Example data, one row per person */
data have;
  input ID C_1 C_2 C_3 C_4 C_5 Age Gender $;
datalines;
1 1 0 1 0 0 18 Male
2 0 0 0 1 0 9 Female
3 0 0 1 0 0 12 Female
4 1 0 0 0 1 8 Male
5 0 0 0 1 0 7 Male
6 0 1 0 0 0 10 Female
7 0 0 0 0 0 18 Male
;

/* Reshape to one row per person per condition that is present */
data toplot;
  set have;
  array c(*) c_1 - c_5;
  do i=1 to dim(c);
    if c[i]=1 then do;
      cat=vname(c[i]);   /* name of the flag variable becomes the category */
      output;
    end;
  end;
  keep id age gender cat;
run;

/* One box per condition, split by gender */
proc sgplot data=toplot;
  vbox age / category=cat group=gender;
run;
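For anyone who prefers a procedure over an array loop, here is a rough sketch of the same reshaping with PROC TRANSPOSE. This is an alternative, not what was posted above: it assumes the have dataset from the data step above already exists, and the names long and flag are made up for illustration.

```sas
/* Assumes dataset HAVE from the data step above already exists. */
/* BY processing needs sorted input, so sort first to be safe.   */
proc sort data=have;
  by id age gender;
run;

/* One output row per person per C_ variable:                    */
/* _NAME_ holds the variable name, COL1 holds its 0/1 value.     */
proc transpose data=have out=long(rename=(_name_=cat col1=flag));
  by id age gender;
  var c_1-c_5;
run;

/* Keep only the conditions a person actually has (flag=1). */
proc sgplot data=long;
  where flag=1;
  vbox age / category=cat group=gender;
run;
```

A person with none of the conditions (ID 7 in the example) contributes no rows to the plot either way.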
Where do the categories "A", "B", "C" come from?
Calculate the percent values first, then use PROC SGPLOT.
proc sgplot data=sashelp.heart;
  vbox weight / category=bp_status group=sex;
  keylegend / location=inside position=topright across=1 title='';
run;
You haven't described what you want to plot very well. Which "condition" is to be plotted?
Box plots are designed to show the distribution of values. You have dichotomous values, so a box plot of those isn't going to be very helpful directly.
@agille05 wrote:
Yes, my apologies (wrote this after a very long day and right before bed). The variable I will be plotting against is age.
ID C_1 C_2 C_3 C_4 C_5 Age Gender
1 1 0 1 0 0 18 Male
2 0 0 0 1 0 9 Female
3 0 0 1 0 0 12 Female
4 1 0 0 0 1 8 Male
5 0 0 0 1 0 7 Male
6 0 1 0 0 0 10 Female
7 0 0 0 0 0 18 Male
Thank you!
Still, what do you want to plot "against age"?
In a vertical box plot the upper and lower edges of the box are the 3rd and 1st quartiles. When you have a dichotomous variable with exactly two values, 1 and 0, the "3rd quartile" will typically be 1 and the "1st quartile" will be 0. So any plot of C_1 or such will be pretty much the same: a box one unit high, which makes no sense.
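If you want to see this for yourself, a quick sketch along these lines prints the quartiles of the 0/1 variables. It uses the example data posted in this thread; the dataset name have is just a placeholder.

```sas
/* Example data from this thread; the dataset name HAVE is a placeholder */
data have;
  input ID C_1 C_2 C_3 C_4 C_5 Age Gender $;
datalines;
1 1 0 1 0 0 18 Male
2 0 0 0 1 0 9 Female
3 0 0 1 0 0 12 Female
4 1 0 0 0 1 8 Male
5 0 0 0 1 0 7 Male
6 0 1 0 0 0 10 Female
7 0 0 0 0 0 18 Male
;

/* Quartiles of the 0/1 flags. Each statistic can only land on 0 or 1  */
/* (or a value in between, depending on the quantile definition used), */
/* so a box drawn from them says little about the condition.           */
proc means data=have q1 median q3;
  var c_1-c_5;
run;
```

When a condition is rare, both quartiles can land on 0 and the box collapses to a line, which is no more useful.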
If you want to use C_1 as a category and plot age, then you get two boxes showing the distribution of the ages, one for each level of C_1, which could be grouped by Gender.
Here's how to provide example data in the form of data step code, along with two possibilities for plotting. If you want ALL of C_1 and C_2 in the same graph, then the data may require some reshaping.
/* Example data, one row per person */
data have;
  input ID C_1 C_2 C_3 C_4 C_5 Age Gender $;
datalines;
1 1 0 1 0 0 18 Male
2 0 0 0 1 0 9 Female
3 0 0 1 0 0 12 Female
4 1 0 0 0 1 8 Male
5 0 0 0 1 0 7 Male
6 0 1 0 0 0 10 Female
7 0 0 0 0 0 18 Male
;

/* Possibility 1: box plot of the 0/1 flag itself (not very informative) */
proc sgplot data=have;
  title "Plot of C_1 grouped by gender";
  vbox c_1 / group=gender;
run;

/* Possibility 2: distribution of age for each level of C_1, by gender */
proc sgplot data=have;
  title "Plot of Age grouped by C_1 and gender";
  vbox age / category=c_1 group=gender;
run;
Like butter. You are magic.
Thanks so much for your help and apologies again for not being clearer (second career and I imagine this would have been easier for me to grasp with a younger more nimble brain).
Much appreciated!