I am trying to produce a histogram like the one in the image (sorry for my artistic abilities). Is it possible? With the following data, I want the count or percentage based on 'SEPALLENGTH' on the Y axis, and the X axis should show 'cat' (1 in the image). If that is not possible, then I would like a customized scale (as was done in the 'Univariate' code), with the histogram bars filled and labeled with 'CAT' (2 in the image). Thank you for your help and suggestions.
data iris;
set sashelp.iris;
if 40<=SepalLength<45 then cat = "40<= SL <45";
else if 45<=SepalLength<50 then cat = "45<= SL <50";
else if 50<=SepalLength<55 then cat = "50<= SL <55";
else if 55<=SepalLength<60 then cat = "55<= SL <60";
else if 60<=SepalLength<65 then cat = "60<= SL <65";
else if 65<=SepalLength<70 then cat = "65<= SL <70";
else if 70<=SepalLength<75 then cat = "70<= SL <75";
else if sepallength >= 75 then cat = ">= 75";
run;
** graphs by univariate**;
proc univariate data=iris;
class species;
histogram sepallength / normal(color=blue)
ctext = blue
midpoints = 40 to 80 by 5;
INSET N = 'Count' MEDIAN (8.2) MEAN (8.2) STD = 'Standard Deviation' (8.3) / POSITION = ne;
run;
**************************;
**graphs by SGPLOT **;
proc means data=iris noprint;
class species;
var SepalLength;
output out=meanval mean=;
ways 1;
run;
data _null_;
set meanval;
if species = 'Setosa' then
call symput("SE_Mean", put(SepalLength, best6.));
if species = 'Versicolor' then
call symput("Ve_MEAN", put(SepalLength, best6.));
if species = 'Virginica' then
call symput("vi_MEAN", put(SepalLength, best6.));
run;
proc sgplot data=iris;
by species;
histogram SepalLength / group = species ;
inset ( "Setosa"="&SE_Mean"
"Versicolor"="&Ve_MEAN"
'Virginica'= "&vi_MEAN") / border title=" Species";
run;
Since your hand-drawn histogram does not show a "fit curve", are you sure that you want a histogram?
The more control you need over the width of the bars, the less likely it is that a HISTOGRAM plot is really what you want.
Please consider this example, which uses VBAR and a FORMAT to control the bar groupings and to label the axis with the categories.
The option BARWIDTH=1 suppresses any space between adjacent categories.
The XAXIS VALUES= option forces the same x-axis values on every graph, so that all of your graphs show the same range.
The FORMAT statement then applies the format, which 1) creates groups of values of the SepalLength variable and 2) labels the axis.
proc format;
value sepalcat
40-<45 = "40<= SL <45"
45-<50 = "45<= SL <50"
50-<55 = "50<= SL <55"
55-<60 = "55<= SL <60"
60-<65 = "60<= SL <65"
65-<70 = "65<= SL <70"
70-<75 = "70<= SL <75"
75-high = ">= 75"
;
run;
data iris;
set sashelp.iris;
if 40<=SepalLength<45 then cat = "40<= SL <45";
else if 45<=SepalLength<50 then cat = "45<= SL <50";
else if 50<=SepalLength<55 then cat = "50<= SL <55";
else if 55<=SepalLength<60 then cat = "55<= SL <60";
else if 60<=SepalLength<65 then cat = "60<= SL <65";
else if 65<=SepalLength<70 then cat = "65<= SL <70";
else if 70<=SepalLength<75 then cat = "70<= SL <75";
else if sepallength >= 75 then cat = ">= 75";
run;
** graphs by univariate**;
proc univariate data=iris;
class species;
histogram sepallength / normal(color=blue)
ctext = blue
midpoints = 40 to 80 by 5;
INSET N = 'Count' MEDIAN (8.2) MEAN (8.2) STD = 'Standard Deviation' (8.3) / POSITION = ne;
run;
proc sgplot data=iris;
by species;
vbar sepallength / group=species stat=percent barwidth=1;
format sepallength sepalcat.;
xaxis values=(42.5 to 77.5 by 5);
run;
You will learn that creating character-valued variables can cause problems with graphing and reporting. The default displays are often in formatted (text) value order, which does not match the underlying numeric values. That causes confusion, or occasionally requires convoluted code to get the natural intended order to display.
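To illustrate the ordering point, here is a minimal sketch. The data set `demo`, the variable `value`, the format `vcat`, and the two ranges are all hypothetical, chosen only because a label starting with "100" sorts before one starting with "40" as text. With default settings, PROC FREQ should list the character variable in text order, while the formatted numeric variable keeps its numeric order.

```sas
/* Hypothetical format: ranges chosen only to expose the ordering issue */
proc format;
   value vcat
      40-<100  = "40<= V <100"
      100-<200 = "100<= V <200";
run;

/* Hypothetical two-row data set with a hand-built character category */
data demo;
   input value;
   length cat $12;
   if 40 <= value < 100 then cat = "40<= V <100";
   else if 100 <= value < 200 then cat = "100<= V <200";
   datalines;
45
150
;
run;

proc freq data=demo;
   /* cat is character: rows sort as text, so "100<= ..." comes first   */
   /* value is numeric with a format: rows follow the numeric values    */
   tables cat value;
   format value vcat.;
run;
```

In the iris categories the text order happens to agree with the numeric order, so the problem would only appear once a label such as one starting with "100" is added.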
BTW, you do realize that your univariate midpoints of 40 to 80 by 5 do not match the categories you created, don't you? Your categories use 42.5, 47.5, etc. as midpoints, not 40, 45, 50, ....
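One way to line the PROC UNIVARIATE bins up with the categories, as a sketch rather than a tested fix, is to give the bin boundaries with ENDPOINTS= instead of MIDPOINTS=. It reads `sashelp.iris` directly and leaves out the INSET statement for brevity.

```sas
proc univariate data=sashelp.iris;
   class species;
   /* ENDPOINTS= names the bin boundaries, so bins run 40-45, 45-50, ..., 75-80 */
   /* MIDPOINTS = 42.5 to 77.5 by 5 should describe the same bins by center     */
   histogram SepalLength / normal(color=blue)
                           ctext = blue
                           endpoints = 40 to 80 by 5;
run;
```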
Thank you @ballardw
1. I noticed the univariate scale problem right after I posted the question, when I saw the note in the log.
2. My requirement was to display the categories on the x-axis, labeled with the ranges of values we defined. For now, I think this will be OK. Thank you for the code.
3. Can a fit curve be drawn over the bars, like the fit curve we get with PROC UNIVARIATE?
4. I tried the 'fitpolicy' option, but the labels come out at an angle of about 120 degrees. I want the opposite direction, an angle of about 240 degrees. I tried the rotate option, but it did not work. Is it possible?
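On points 3 and 4, a possible direction is sketched below; it is untested and rests on a few assumptions. For point 3, a DENSITY curve can be overlaid on a HISTOGRAM statement in SGPLOT but not on VBAR, so the first step switches back to HISTOGRAM and assumes that BINSTART= gives the midpoint of the first bin. For point 4, the second step assumes a SAS 9.4 maintenance release in which the XAXIS option VALUESROTATE=DIAGONAL2 is available to slant the labels the other way. The format is repeated so the block runs on its own.

```sas
/* Point 3: histogram with 5-unit bins and a normal fit curve on top.   */
/* Assumption: BINSTART= is the midpoint of the first bin (40 to 45).   */
proc sgplot data=sashelp.iris;
   by species;
   histogram SepalLength / binstart=42.5 binwidth=5 scale=percent;
   density SepalLength / type=normal;
   xaxis values=(40 to 80 by 5);
run;

/* Same format as in the answer above */
proc format;
   value sepalcat
      40-<45  = "40<= SL <45"
      45-<50  = "45<= SL <50"
      50-<55  = "50<= SL <55"
      55-<60  = "55<= SL <60"
      60-<65  = "60<= SL <65"
      65-<70  = "65<= SL <70"
      70-<75  = "70<= SL <75"
      75-high = ">= 75";
run;

/* Point 4: rotate the category labels in the opposite diagonal direction. */
/* FITPOLICY=ROTATE only rotates when the labels would otherwise collide.  */
proc sgplot data=sashelp.iris;
   by species;
   vbar SepalLength / stat=percent barwidth=1;
   format SepalLength sepalcat.;
   xaxis fitpolicy=rotate valuesrotate=diagonal2;
run;
```

If the labels fit without colliding, they may stay horizontal; whether a policy that always rotates is available depends on the SAS release.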