Percentile95
Calcite | Level 5

Dear SAS-VA Users,

 

I am currently developing reports that include statistical distribution analyses within the field of clinical biochemistry. A key feature of these reports is a histogram that displays the distribution of test results. Users can interactively filter the test results using several parameters, and the histogram updates accordingly with the new data.

 

However, I have encountered a problem: the histograms often display irregular spikes or other types of "artifacts" (please see the image below). I suspect these artifacts arise from the algorithm SAS-VA uses to determine bin widths. They undermine the users' confidence in the validity of the distribution analysis and the accompanying calculations. Additionally, SAS-VA provides limited options for adjusting the bin size or range. This issue does not occur with other statistical software I have used.

 

I would greatly appreciate any suggestions for resolving or minimizing this problem.

 

I have attached a data file with a single variable containing 64,151 test results. Creating a SAS-VA histogram with these data results in a spiked histogram, as shown below.

 

Best regards
Percentile95

Bad_histogram.png

4 REPLIES 4
Stu_SAS
SAS Employee

Hey @Percentile95! First, thank you so much for supplying sample data. This makes working on a solution much easier!

I found that for this particular set of data, if you set the bin width to be 50 you get a smooth distribution as expected:

 

Stu_SAS_0-1717681024621.png

 

When comparing this with SGPLOT, you get the same results as in Visual Analytics with the automatic algorithm:

Stu_SAS_1-1717681212918.png

 

proc sgplot data=a.histogram_bin_problem;
    histogram internal_reply_num / scale=count binwidth=0.02;
run;

Stu_SAS_2-1717681679271.png

 

This is because they use the same auto-binning algorithm under the hood. In this case I would recommend choosing a number of bins that helps generate a smoother distribution.
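For anyone reproducing this in SGPLOT: besides BINWIDTH=, the HISTOGRAM statement also accepts NBINS=, which fixes the bin count instead of the bin width. A minimal sketch, reusing the data set and variable from the code above; the value 50 is only an illustration and may not suit other subsets of the data:

```sas
/* Fix the number of bins rather than the bin width. */
/* nbins=50 is an illustrative value, not a tuned recommendation. */
proc sgplot data=a.histogram_bin_problem;
    histogram internal_reply_num / scale=count nbins=50;
run;
```

A fixed count keeps the number of bars stable when the data range changes, but it does not guarantee a smooth result for every filtered subset.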

 

Percentile95
Calcite | Level 5

Thanks for the fast reply, much appreciated.

 

I understand your suggestion, but the SAS-VA report interface allows users to adjust which numbers go into the histogram.

 

As you showed with SGPLOT, using the automatic algorithm results in a spiked histogram. Adjusting to:

 

proc sgplot data=a.histogram_bin_problem;
    histogram internal_reply_num / scale=count binwidth=0.02;
run;

 

Gives a nice smooth looking histogram. 

 

If the user then adjusts the input (albeit in SAS-VA):

proc sgplot data=histogram_bin_problem;
    histogram internal_reply_num / scale=count binwidth=0.02;
    where internal_reply_num between 2 and 2.8;
run;

The output is again a "bad"-looking histogram:

Percentile95_0-1717683301963.png

 

So the input to the Histogram function (SGPLOT or SAS-VA) is dynamic, and I'm hoping for a better/different auto-binning algorithm. I have tried, as suggested by you, to use the number of bins that gives a smooth histogram, but then the input changes and I get a spiked histogram. I have attempted to use a parameter inside "Number of bins" to let the user adjust the look of the histogram, but parameters are not allowed as input.

 

Hope this makes sense.
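One possible workaround on the SGPLOT side, as a rough sketch rather than a confirmed fix: spikes like these often appear when the data are recorded in fixed steps and the bin width is not a whole multiple of that step. The code below assumes the results are reported in steps of 0.01 (an assumption, please check against the data) and derives the bin width from the filtered range. The macro variables `resolution` and `target_bins` are hypothetical names introduced here. As far as I understand it, BINSTART= anchors the first bin at its midpoint; the SGPLOT documentation should be checked on that point.

```sas
%let resolution  = 0.01;  /* ASSUMED reporting step of the results */
%let target_bins = 40;    /* hypothetical upper limit on the number of bins */

/* Range of the currently filtered data */
proc sql noprint;
    select min(internal_reply_num), max(internal_reply_num)
        into :lo trimmed, :hi trimmed
        from a.histogram_bin_problem
        where internal_reply_num between 2 and 2.8;
quit;

data _null_;
    /* Whole number of resolution steps per bin, at least 1 */
    steps = max(1, ceil((&hi - &lo) / (&target_bins * &resolution)));
    /* Use an odd step count so that, with the first bin centred on a data
       value, bin edges fall halfway between possible data values */
    if mod(steps, 2) = 0 then steps = steps + 1;
    call symputx('binwidth', steps * &resolution);
    call symputx('binstart', &lo);
run;

proc sgplot data=a.histogram_bin_problem;
    where internal_reply_num between 2 and 2.8;
    histogram internal_reply_num / scale=count
        binwidth=&binwidth binstart=&binstart;
run;
```

This does not handle an empty filter result, and it does not solve the problem inside SAS-VA itself, where the number of bins cannot be driven by a parameter.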

Stu_SAS
SAS Employee
Thanks, @Percentile95. I agree with your suggestion about allowing users to adjust the number of bins with a parameter. We recently released Dynamic Parameters in Visual Analytics and are planning on adding more places where you can add dynamic values. The histogram number of bins sounds like a fantastic place. I'll bring this to R&D for their thoughts.
ballardw
Super User

First, a caveat or two: I don't have access to VA, so I'm not sure whether this suggestion can be implemented.

Second, it takes a bit more training on the part of the readers, but box plots can carry a lot of distribution information and are not subject to "bin width" issues. Outlier definitions and displays have some issues of their own, but I suspect they may be easier to deal with. So perhaps consider box plots until this alternate parameterization of histograms is available.
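In SGPLOT terms, a box plot of the same variable could look like the sketch below. It reuses the data set and variable names from the earlier posts in this thread, and the WHERE filter is just the example range used above; how this maps onto a VA box plot object is not something I can confirm.

```sas
/* Horizontal box plot of the test results; no binning involved. */
proc sgplot data=a.histogram_bin_problem;
    where internal_reply_num between 2 and 2.8;  /* example filter from above */
    hbox internal_reply_num;
run;
```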
