I have been tearing my hair out with frustration for the last hour. I just want a histogram of my age distribution with one bar per age (age is in whole years, no decimals in the data) and tick labels at 0, 5, 10, etc. on the x-axis.
I am working in SAS Enterprise Guide.
I cannot believe how many different procs there are for making histograms, each with different syntax. There is
proc univariate data=mydata; histogram age; run;
or
proc gchart data=mydata; vbar age / discrete; run;
or
PROC SGPLOT DATA = mydata; HISTOGRAM age / binwidth=1 binstart=0; TITLE "Age"; xaxis values=(0 to 100 by 1); RUN;
etc.
The closest I have come is with SGPLOT, but the first bar is halfway hidden behind the y-axis. Does anyone know how to make the first bar fully visible?
Also, which procedure would you use to generate a simple histogram like this? Is there a best practice? Are some of these older procs being phased out?
You're correct that there are many ways to create a histogram -- you touched only on a few of them!
I think the simplest method is the one you tried last: PROC SGPLOT. Because you have 100 bins and the graph is only so wide, the layout algorithm can push your left-most bar tight against the axis. The OFFSETMIN= option on the XAXIS statement adds a little space at the low end of the axis (OFFSETMAX= does the same at the high end). Try this:
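/* Simulated ages in whole years (0-100), peaking around age 10 */
data sample (keep=age);
do i = 1 to 100000;
age = abs ( floor ( rand('triangle',0.1) * 100 ) );
output;
end;
run;
/* A wide plot area gives the 100 bars more room */
ods graphics / width=1000px height=400px;
proc sgplot data=sample;
HISTOGRAM age / binwidth=1 binstart=0 ; /* one bar per year of age */
TITLE "Age";
/* OFFSETMIN=/OFFSETMAX= pad both ends of the axis so the edge bars are not clipped */
xaxis values=(0 to 100 by 1) offsetmin=.01 offsetmax=.01 ;
RUN;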
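The original question also asked for x-axis labels at 0, 5, 10, and so on. On the linear axis that HISTOGRAM uses, the tick list set by VALUES= should be independent of BINWIDTH=, so a sketch like the one below can keep one bar per year while labeling only every fifth year. It regenerates the same kind of simulated data as above; the CALL STREAMINIT seed is an addition, used only so reruns produce identical data.

```sas
/* Same kind of simulated data as above: AGE in whole years, 0 to 100 */
data sample (keep=age);
  call streaminit(1);              /* fixed seed, only so reruns match */
  do i = 1 to 100000;
    age = floor(rand('triangle', 0.1) * 100);
    output;
  end;
run;

ods graphics / width=1000px height=400px;
proc sgplot data=sample;
  /* BINWIDTH=1 still draws one bar per year of age */
  histogram age / binwidth=1 binstart=0;
  title "Age";
  /* ...while VALUES= labels only every fifth year: 0, 5, 10, ..., 100 */
  xaxis values=(0 to 100 by 5) offsetmin=.01 offsetmax=.01;
run;
```

If the labels still look crowded at your output size, widening the graph with the ODS GRAPHICS WIDTH= option is the simplest lever.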
Thank you! I don't think I would have found that solution on my own.
Is there a simple way to split this chart by, say, gender, so that I get a stacked histogram?
Of course! Again, there are multiple methods, but now it sounds like you're more interested in a VBAR with a grouping variable than a classic histogram of a statistical distribution.
You could try PROC SGPANEL with a HISTOGRAM statement and PANELBY for gender (that would yield two histograms). Or you could use PROC FREQ to calculate the percentages into a data set, then use a step like:
proc sgplot data=freq_output;
vbar age / response=percent group=gender grouporder=data;
run;
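The step above assumes a FREQ_OUTPUT data set that the reply does not show how to build. A minimal sketch, assuming a data set MYDATA with a numeric AGE in whole years and a GENDER variable: the simulated data step and the 'M'/'F' coding are hypothetical stand-ins for your real data. PROC FREQ's OUT= data set holds one row per GENDER-by-AGE cell with COUNT and PERCENT (percent of all rows), and GROUPDISPLAY=STACK is spelled out so each age bar is explicitly split into stacked gender segments.

```sas
/* Hypothetical stand-in data: AGE in whole years, GENDER coded 'M'/'F' */
data mydata (keep=age gender);
  length gender $1;
  call streaminit(123);
  do i = 1 to 10000;
    gender = ifc(rand('uniform') < 0.5, 'M', 'F');
    age = floor(rand('triangle', 0.1) * 100);
    output;
  end;
run;

/* One row per GENDER x AGE cell; PERCENT is that cell's share of all rows */
proc freq data=mydata noprint;
  tables gender*age / out=freq_output;
run;

/* One bar per age, split into stacked gender segments */
proc sgplot data=freq_output;
  vbar age / response=percent group=gender grouporder=data groupdisplay=stack;
run;
```

Because PERCENT is relative to the whole data set, the total height of each bar is that age's share of everyone, which is usually what a stacked histogram is meant to show.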
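Building on Chris' example, here are three variations you can do with your grouping variable: a separate plot per gender (BY statement), overlaid semi-transparent histograms (GROUP=), and vertically stacked panels (PROC SGPANEL).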
proc format;
value gender 1="Male"
2="Female"
;
run;
data sample (keep=age g);
do g = 1 to 2;
do i = 1 to 100000;
age = abs ( floor ( rand('triangle',0.1) * 100 ) );
output;
end;
end;
run;
ods graphics / width=1000px height=400px;
/* 1) One plot per gender; BY works because SAMPLE is already in G order */
proc sgplot data=sample;
format g gender.;
by g;
HISTOGRAM age / binwidth=1 binstart=0 ;
TITLE "Age";
xaxis values=(0 to 100 by 1) offsetmin=.01 offsetmax=.01 ;
RUN;
/* 2) Both genders overlaid in one plot; TRANSPARENCY= keeps both visible */
proc sgplot data=sample;
format g gender.;
HISTOGRAM age / binwidth=1 binstart=0 group=g transparency=0.5;
TITLE "Age";
xaxis values=(0 to 100 by 1) offsetmin=.01 offsetmax=.01 ;
RUN;
/* 3) One panel per gender, stacked vertically with a shared x-axis */
proc sgpanel data=sample;
format g gender.;
panelby g / layout=rowlattice novarname;
HISTOGRAM age / binwidth=1 binstart=0 ;
TITLE "Age";
colaxis values=(0 to 100 by 1) offsetmin=.01 offsetmax=.01 ;
RUN;