Hi everyone,
Looking for a bit of assistance with a snag I've hit. I have a dataset that contains a number of medical conditions, with each person having zero, one, or more than one condition. For my top five conditions, I would like to create a side-by-side boxplot, with each condition compared by gender (male or female), sorta something like what is below.
Because each person could have 0 to 5 of the conditions, it isn't possible for me to set up a single column for the medical condition, which I believe would solve this issue.
Just wondering if anyone has any thoughts.
Here is an example of what my data could look like:
ID C_1 C_2 C_3 C_4 C_5 Gender
1 1 0 1 0 0 Male
2 0 0 0 1 0 Female
3 0 0 1 0 0 Female
4 1 0 0 0 1 Male
5 0 0 0 1 0 Male
6 0 1 0 0 0 Female
7 0 0 0 0 0 Male
Thanks!
Where do categories "A", "B", "C" come from?
Calculate those percent values first, then use PROC SGPLOT.
proc sgplot data=sashelp.heart;
  vbox weight / category=bp_status group=sex;
  keylegend / location=inside position=topright across=1 title='';
run;
The C_1, C_2, etc. variables indicate the presence or absence of medical conditions.
Thank you!
You haven't described what you want to plot very well. Which "condition" is to be plotted?
Box plots are designed to show the distribution of values. You have dichotomous values, so a box plot of those isn't going to be very helpful directly.
ID C_1 C_2 C_3 C_4 C_5 Age Gender
1 1 0 1 0 0 18 Male
2 0 0 0 1 0 9 Female
3 0 0 1 0 0 12 Female
4 1 0 0 0 1 8 Male
5 0 0 0 1 0 7 Male
6 0 1 0 0 0 10 Female
7 0 0 0 0 0 18 Male
Thank you!
@agille05 wrote:
Yes, my apologies (wrote this after a very long day and right before bed). The variable I will be plotting against is age.
ID C_1 C_2 C_3 C_4 C_5 Age Gender
1 1 0 1 0 0 18 Male
2 0 0 0 1 0 9 Female
3 0 0 1 0 0 12 Female
4 1 0 0 0 1 8 Male
5 0 0 0 1 0 7 Male
6 0 1 0 0 0 10 Female
7 0 0 0 0 0 18 Male
Thank you!
Still, what do you want to plot "against age"?
In a vertical box plot, the upper and lower edges of the box are the 3rd and 1st quartiles. When you have a dichotomous variable with exactly two values, 1 and 0, the "3rd quartile" will be 1 and the "1st quartile" will be 0. So any plot of C_1 or such will be pretty much the same: a box one unit high, which makes no sense.
If you want to use C_1 as a category and plot age, then you get two boxes showing the distribution of the ages, one for each level of C_1, which could be grouped by Gender.
Here's how to provide example data in the form of data step code, along with two possibilities for plotting. If you want ALL of C_1 and C_2 in the same graph, then the data may require some reshaping.
data have;
  input ID C_1 C_2 C_3 C_4 C_5 Age Gender $;
  datalines;
1 1 0 1 0 0 18 Male
2 0 0 0 1 0 9 Female
3 0 0 1 0 0 12 Female
4 1 0 0 0 1 8 Male
5 0 0 0 1 0 7 Male
6 0 1 0 0 0 10 Female
7 0 0 0 0 0 18 Male
;

Proc sgplot data=have;
  title "Plot of C_1 grouped by gender";
  vbox c_1 / group=gender;
run;

Proc sgplot data=have;
  title "Plot of Age grouped by C_1 and gender";
  vbox age / category=c_1 group=gender;
run;
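To see why a box plot of a 0/1 flag has so little to show, it can help to print the statistics the box is built from. The following is only a sketch: the data set name FLAGS is made up for illustration, and it holds just the ID and C_1 columns from the example data.

```sas
/* Hypothetical mini data set: ID and the C_1 flag from the example data */
data flags;
  input ID C_1;
  datalines;
1 1
2 0
3 0
4 1
5 0
6 0
7 0
;
run;

/* Print the statistics that VBOX draws: quartiles, median, and extremes */
proc means data=flags n min q1 median q3 max;
  var C_1;
run;
```

Every statistic in that table is built from the values 0 and 1 alone, which is why the flag works better as a category than as the analysis variable.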
Proc sgpanel Data=Peds;
Panelby Chronic_Respiratory_Diseases;
Where Chronic_Respiratory_Diseases=1;
Vbox Age / Category=Gender Group=Gender;
Run;
Thanks again!
So you want to do a box plot of AGE for the C variables when they are equal to 1, grouped by gender? That bit about how to use the C variables was a major missing part in the first description.
And as @Reeza says, reshape the data:
data have;
  input ID C_1 C_2 C_3 C_4 C_5 Age Gender $;
  datalines;
1 1 0 1 0 0 18 Male
2 0 0 0 1 0 9 Female
3 0 0 1 0 0 12 Female
4 1 0 0 0 1 8 Male
5 0 0 0 1 0 7 Male
6 0 1 0 0 0 10 Female
7 0 0 0 0 0 18 Male
;

/* One output row per person per condition that is present */
data toplot;
  set have;
  array c(*) c_1 - c_5;
  do i=1 to dim(c);
    if c[i]=1 then do;
      cat=vname(c[i]);  /* name of the flag variable, e.g. C_1 */
      output;
    end;
  end;
  keep id age gender cat;
run;

/* Age distribution for each condition, side by side by gender */
Proc sgplot data=toplot;
  vbox age / category=cat group=gender;
run;
Like butter. You are magic.
Thanks so much for your help and apologies again for not being clearer (second career and I imagine this would have been easier for me to grasp with a younger more nimble brain).
Much appreciated!
You want your data as:
ID Condition# Age Gender
1 1 18 Male
1 3 18 Male
2 4 23 Female
....
This is your starting point.
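One possible way to get to that layout is PROC TRANSPOSE followed by a filter. This is a sketch, not the only approach: it assumes the wide example data from this thread (ID, C_1 to C_5, Age, Gender), and the names HAVE_SORTED, LONG, TOPLOT2, Flag, and Condition are made up for illustration.

```sas
/* Example data as posted in this thread */
data have;
  input ID C_1 C_2 C_3 C_4 C_5 Age Gender $;
  datalines;
1 1 0 1 0 0 18 Male
2 0 0 0 1 0 9 Female
3 0 0 1 0 0 12 Female
4 1 0 0 0 1 8 Male
5 0 0 0 1 0 7 Male
6 0 1 0 0 0 10 Female
7 0 0 0 0 0 18 Male
;
run;

/* BY-group processing needs sorted input */
proc sort data=have out=have_sorted;
  by ID Age Gender;
run;

/* One row per person per flag; Condition holds the flag's variable name */
proc transpose data=have_sorted
               out=long(rename=(col1=Flag))
               name=Condition;
  by ID Age Gender;
  var C_1-C_5;
run;

/* Keep only the conditions that are actually present */
data toplot2;
  set long;
  where Flag = 1;
  drop Flag;
run;

proc sgplot data=toplot2;
  vbox Age / category=Condition group=Gender;
run;
```

A person with no conditions (ID 7 in the example) contributes no rows to TOPLOT2, so they do not appear in any box.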