Based on the summary statistics, the mode is 0. In that case, shouldn't the histogram show 0 as the value with the highest frequency?
*summary stats;
proc means data= libref.data2 mean std var min max n median mode;
run;
*histogram;
proc univariate data= libref.data2;
histogram;
run;
Hello @karenktq,
This kind of "hairy" histogram typically occurs when discrete data are binned inappropriately. In your example the default binning was inappropriate. I assume that your popularity values are integers. Note that, apparently, six consecutive values went into five bins. Thus, one of the five bins necessarily contains two values, not one like the other four, and therefore "stands out" in the histogram, easily rising above the true mode.
Here's an example demonstrating the issue and showing how the MIDPOINTS= option can be used to obtain appropriate (or, if misused, inappropriate) binning:
data test;
call streaminit(314159);
do _n_=1 to 50000;
x=rand('poisson',160)-107;
output;
end;
run;
ods graphics on;
proc univariate data=test;
histogram / midpoints=0 to 100 by 1.2 odstitle='Inappropriate Binning';
histogram / midpoints=0 to 100 odstitle='Appropriate Binning';
run;
Results: (two histograms, titled "Inappropriate Binning" and "Appropriate Binning")
It may be (in fact I'm pretty sure) that 0 is the mode, but because of the way the bins are grouped together for plotting it doesn't look like 0 is the mode in the plot. In other words, the plot is misleading you. However, there are options in the HISTOGRAM statement that let you change the way the bins are grouped.
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/procstat/procstat_univariate_syntax09.htm
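One way to check the counts directly, with no binning involved, is a frequency table. This is only a minimal sketch: it assumes the variable is named popularity (a guess based on the reply above; the original post does not name it) and reuses the libref.data2 data set from the question.

```sas
/* Tabulate every distinct value without binning.               */
/* ORDER=FREQ lists values by descending count, so the mode     */
/* appears in the first row of the table.                       */
proc freq data=libref.data2 order=freq;
  tables popularity / nocum;  /* popularity is an assumed variable name */
run;
```

If 0 is at the top of this table but not the tallest bar in the histogram, the difference comes from the binning, not from the data.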
You are ignoring an important element: the width of the bins.
Change it with the ENDPOINTS= option and you will get a different result.
ods select histogram;
proc univariate data=score_card;
var total_score;
histogram total_score / kernel endpoints=(490 to 650 by 10);
run;
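For integer-valued data, such as the simulated x in the test data set from the earlier reply, one option is to place the bin endpoints halfway between integers so that each bin holds exactly one value. The following is a sketch under assumptions: the range -20.5 to 130.5 is a guess that should be adjusted to cover the actual minimum and maximum of the data.

```sas
ods select histogram;
proc univariate data=test;
  var x;
  /* Endpoints halfway between integers, bin width 1:            */
  /* each bin contains exactly one integer value.                */
  /* The range -20.5 to 130.5 is an assumption; adjust it so it  */
  /* covers the minimum and maximum of your own data.            */
  histogram x / endpoints=-20.5 to 130.5 by 1;
run;
```

With bins like these, the tallest bar corresponds to the mode, because no bin can collect two neighboring values.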