michal_1407
Fluorite | Level 6
Hi,
 
I need your help.
 
I want to understand how PROC UNIVARIATE decides how many histogram bins to use.
 
I have the following code:
 
ods output HistogramBins = prefix_th; 
proc univariate data = datain noprint; 
histogram y / vscale = percent  MIDPERCENTS;
run;
 
Depending on the data, I get a different number of bins.
 
I found the original paper: https://www.jstor.org/stable/2288074 and I use the following approach:
width = 3.5 * σ * n^(-1/3)
nbins = ceil( (max - min) / width )
 
but I still get a different number of bins than PROC UNIVARIATE produces.
 
Can you help?
 
From the SAS support documentation:
 
ENDPOINTS <=values | KEY | UNIFORM>
uses histogram bin endpoints as the tick mark values for the horizontal axis and determines how to compute the bin width of the histogram bars. You can specify the following values:
values
specifies both the left and right endpoints of each histogram interval. The width of the histogram bars is the difference between consecutive endpoints. The procedure uses the same values for all variables.
KEY
determines the endpoints for the data in the key cell. The initial number of endpoints is based on the number of observations in the key cell by using the method of Terrell and Scott (1985). The procedure extends the endpoint list for the key cell in either direction as necessary until it spans the data in the remaining cells.
UNIFORM
determines the endpoints by using all the observations as if there were no cells. In other words, the number of endpoints is based on the total sample size by using the method of Terrell and Scott (1985).
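
For comparison, here is a rough sketch that computes two textbook bin counts side by side for the variable y in datain. One caveat on the formula quoted earlier in this post: width = 3.5 * σ * n^(-1/3) is the normal-reference rule usually attributed to Scott (1979), while Terrell and Scott (1985) is usually summarized as a lower bound on the number of bins, ceil((2n)^(1/3)), that does not depend on σ. The data set names _stats and _bins and all variable names below are made up for illustration. This is not a reconstruction of what PROC UNIVARIATE does internally: the documentation quoted above only calls the Terrell and Scott count an "initial" number of endpoints, so the count in HistogramBins may still differ from both values.

```sas
/* Sketch only: compares two textbook bin-count rules for y in datain. */
/* _stats, _bins and the variable names are hypothetical.              */
proc means data=datain noprint;
   var y;
   output out=_stats n=n std=sd min=ymin max=ymax;
run;

data _bins;
   set _stats;
   /* Normal-reference width (the formula quoted in the post) */
   width_scott = 3.5 * sd * n**(-1/3);
   nbins_scott = ceil((ymax - ymin) / width_scott);
   /* Terrell and Scott (1985) bound: depends only on n */
   nbins_ts = ceil((2 * n)**(1/3));
run;

proc print data=_bins;
   var n sd ymin ymax width_scott nbins_scott nbins_ts;
run;
```

If nbins_ts is closer to the number of rows in prefix_th than nbins_scott, that would suggest the starting point depends on the sample size alone, which is consistent with the wording of the documentation.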
Tom
Super User

If you just want a HISTOGRAM why are you running PROC UNIVARIATE instead of the appropriate graphics procedure, like PROC SGPLOT with the HISTOGRAM statement?

michal_1407
Fluorite | Level 6
My goal is to get the HistogramBins table.
The histogram is generated from this table. I have edited my post; sorry for the mistake.
Tom
Super User

What do you mean by "table"?

Sounds like you want to use ODS OUTPUT to convert this TABLE (tabular report) in the output of PROC UNIVARIATE into a DATASET?

 

And what is your question or your goal?

Do you want to know if there is a way to change PROC UNIVARIATE so that it produces a different number of bins?

Do you want to understand how PROC UNIVARIATE decides how many bins are required?

michal_1407
Fluorite | Level 6
Hi,

I want to understand how PROC UNIVARIATE decides how many bins are required.

Only this.
