<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Guidance on avoiding histograms that display spikes and other artifacts. in SAS Visual Analytics</title>
    <link>https://communities.sas.com/t5/SAS-Visual-Analytics/Guidance-on-avoiding-histograms-that-display-spikes-and-other/m-p/934821#M18120</link>
    <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13684"&gt;@Rick_SAS&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;&lt;SPAN&gt;To make this visual illusion disappear, use a bin width that is at least as large as the rounding unit in the data.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;If the bin width is larger than the rounding unit, it should be an &lt;EM&gt;integer multiple&lt;/EM&gt; of the rounding unit. Otherwise, you can still get those spikes in the histogram, as was discussed in the 2021 thread&amp;nbsp;&lt;A href="https://communities.sas.com/t5/SAS-Studio/Histogram-does-not-reflect-summary-statistics/m-p/746694/highlight/true#M10010" target="_blank" rel="noopener"&gt;Histogram does not reflect summary statistics&lt;/A&gt;, where due to the non-integer ratio 1.2 : 1 every fifth histogram bar comprised two values rather than one.&lt;/P&gt;</description>
    <pubDate>Fri, 05 Jul 2024 17:26:13 GMT</pubDate>
    <dc:creator>FreelanceReinh</dc:creator>
    <dc:date>2024-07-05T17:26:13Z</dc:date>
    <item>
      <title>Guidance on avoiding histograms that display spikes and other artifacts.</title>
      <link>https://communities.sas.com/t5/SAS-Visual-Analytics/Guidance-on-avoiding-histograms-that-display-spikes-and-other/m-p/931117#M18077</link>
      <description>&lt;P&gt;Dear SAS-VA Users,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I am currently developing reports that include statistical distribution analyses within the field of clinical biochemistry. A key feature of these reports is a histogram that displays the distribution of test results. Users can interactively filter the test results using several parameters, and the histogram updates accordingly with the new data.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, I have encountered a problem: the histograms often display irregular spikes or other types of "artifacts" (please see the image below). I suspect these artifacts arise from the SAS-VA algorithm used for determining bin widths, which leads to these discrepancies. These artifacts undermine the users' confidence in the validity of the distribution analysis and the accompanying calculations. Additionally, SAS-VA provides limited options for adjusting the bin size or range. This issue does not occur with other statistical software I have used.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would greatly appreciate any suggestions for resolving or minimizing this problem.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have attached a data file with a single variable containing 64,151 test results. Creating a SAS-VA histogram with these data results in a spiked histogram, as shown below.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Best regards Percentile95&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Bad_histogram.png" style="width: 999px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/97063i38E66CDD43B2581A/image-size/large?v=v2&amp;amp;px=999" role="button" title="Bad_histogram.png" alt="Bad_histogram.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 06 Jun 2024 12:59:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Visual-Analytics/Guidance-on-avoiding-histograms-that-display-spikes-and-other/m-p/931117#M18077</guid>
      <dc:creator>Percentile95</dc:creator>
      <dc:date>2024-06-06T12:59:29Z</dc:date>
    </item>
    <item>
      <title>Re: Guidance on avoiding histograms that display spikes and other artifacts.</title>
      <link>https://communities.sas.com/t5/SAS-Visual-Analytics/Guidance-on-avoiding-histograms-that-display-spikes-and-other/m-p/931131#M18078</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159617"&gt;@Percentile95&lt;/a&gt;! First, thank you so much for supplying sample data. This makes working on a solution&amp;nbsp;&lt;EM&gt;much&lt;/EM&gt; easier!&lt;/P&gt;
&lt;P&gt;I found that for this particular set of data, if you set the bin width to be 50 you get a smooth distribution as expected:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Stu_SAS_0-1717681024621.png" style="width: 711px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/97068i5A4D9823566CE445/image-dimensions/711x398?v=v2" width="711" height="398" role="button" title="Stu_SAS_0-1717681024621.png" alt="Stu_SAS_0-1717681024621.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When comparing this with SGPLOT, you get the same results as in Visual Analytics with the automatic algorithm:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Stu_SAS_1-1717681212918.png" style="width: 999px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/97069i0ED59614030E67D9/image-size/large?v=v2&amp;amp;px=999" role="button" title="Stu_SAS_1-1717681212918.png" alt="Stu_SAS_1-1717681212918.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sgplot data=a.histogram_bin_problem;
    histogram internal_reply_num / scale=count binwidth=0.02;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Stu_SAS_2-1717681679271.png" style="width: 999px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/97070i23A734F0F83108FB/image-size/large?v=v2&amp;amp;px=999" role="button" title="Stu_SAS_2-1717681679271.png" alt="Stu_SAS_2-1717681679271.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This is because they use the same auto-binning algorithm under the hood. In this case I would recommend choosing a number of bins that helps generate a smoother distribution.&lt;/P&gt;
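&lt;P&gt;A minimal sketch of that recommendation in SGPLOT, assuming the same data set as above. The value 50 is only an illustrative assumption, not a tested setting, and it would likely need retuning whenever the filtered data change:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* NBINS= requests a number of bins instead of a bin width. */
/* 50 is an illustrative value, not a recommendation for these data. */
proc sgplot data=a.histogram_bin_problem;
    histogram internal_reply_num / scale=count nbins=50;
run;&lt;/CODE&gt;&lt;/PRE&gt;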
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 06 Jun 2024 13:48:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Visual-Analytics/Guidance-on-avoiding-histograms-that-display-spikes-and-other/m-p/931131#M18078</guid>
      <dc:creator>Stu_SAS</dc:creator>
      <dc:date>2024-06-06T13:48:44Z</dc:date>
    </item>
    <item>
      <title>Re: Guidance on avoiding histograms that display spikes and other artifacts.</title>
      <link>https://communities.sas.com/t5/SAS-Visual-Analytics/Guidance-on-avoiding-histograms-that-display-spikes-and-other/m-p/931143#M18079</link>
      <description>&lt;P&gt;Thanks for the fast reply, much appreciated.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I understand your suggestion, but the SAS-VA report interface allows users to adjust which numbers go into the histogram.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;As you showed with SGPLOT, using the automatic algorithm results in a spiked histogram. Adjusting to:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="sas"&gt;proc sgplot data=a.histogram_bin_problem;
    histogram internal_reply_num / scale=count binwidth=0.02;
run;&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Gives a nice, smooth-looking histogram.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If the user then adjusts the input (albeit in SAS-VA):&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;proc sgplot data=histogram_bin_problem;
    histogram internal_reply_num / scale=count binwidth=0.02;
   &lt;U&gt; &lt;EM&gt;&lt;STRONG&gt;where internal_reply_num between 2 and 2.8;&lt;/STRONG&gt;&lt;/EM&gt;&lt;/U&gt;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The output is again a "bad"-looking histogram:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Percentile95_0-1717683301963.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/97073iC8B10756235A8B05/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Percentile95_0-1717683301963.png" alt="Percentile95_0-1717683301963.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So the input to the histogram function (SGPLOT or SAS-VA) is dynamic, and I'm hoping for a better/different auto-binning algorithm. I have tried, as you suggested, to use the number of bins that gives a smooth histogram, but then the input changes and I get a spiked histogram again. I have attempted to use a parameter inside "Number of bins" to let the user adjust the look of the histogram, but parameters are not allowed as input.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hope this makes sense.&lt;/P&gt;</description>
      <pubDate>Thu, 06 Jun 2024 14:31:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Visual-Analytics/Guidance-on-avoiding-histograms-that-display-spikes-and-other/m-p/931143#M18079</guid>
      <dc:creator>Percentile95</dc:creator>
      <dc:date>2024-06-06T14:31:21Z</dc:date>
    </item>
    <item>
      <title>Re: Guidance on avoiding histograms that display spikes and other artifacts.</title>
      <link>https://communities.sas.com/t5/SAS-Visual-Analytics/Guidance-on-avoiding-histograms-that-display-spikes-and-other/m-p/931145#M18080</link>
      <description>Thanks, &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159617"&gt;@Percentile95&lt;/a&gt;. I agree with your suggestion about allowing users to adjust the number of bins with a parameter. We recently released Dynamic Parameters in Visual Analytics and are planning on adding more places where you can add dynamic values. The histogram number of bins sounds like a fantastic place. I'll bring this to R&amp;amp;D for their thoughts.</description>
      <pubDate>Thu, 06 Jun 2024 14:34:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Visual-Analytics/Guidance-on-avoiding-histograms-that-display-spikes-and-other/m-p/931145#M18080</guid>
      <dc:creator>Stu_SAS</dc:creator>
      <dc:date>2024-06-06T14:34:42Z</dc:date>
    </item>
    <item>
      <title>Re: Guidance on avoiding histograms that display spikes and other artifacts.</title>
      <link>https://communities.sas.com/t5/SAS-Visual-Analytics/Guidance-on-avoiding-histograms-that-display-spikes-and-other/m-p/931247#M18083</link>
      <description>&lt;P&gt;First, a caveat or two. I don't have access to VA, so I am not sure whether this suggestion can be implemented.&lt;/P&gt;
&lt;P&gt;Second, box plots take a bit more training on the part of the people reading them, but they can convey a lot of distribution information and are not subject to "bin width" issues. Outlier definitions and displays have some issues of their own, but I suspect those may be easier to deal with.&amp;nbsp; So perhaps consider box plots until this alternate parameterization of histograms is available.&lt;/P&gt;</description>
      <pubDate>Fri, 07 Jun 2024 14:48:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Visual-Analytics/Guidance-on-avoiding-histograms-that-display-spikes-and-other/m-p/931247#M18083</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2024-06-07T14:48:40Z</dc:date>
    </item>
    <item>
      <title>Re: Guidance on avoiding histograms that display spikes and other artifacts.</title>
      <link>https://communities.sas.com/t5/SAS-Visual-Analytics/Guidance-on-avoiding-histograms-that-display-spikes-and-other/m-p/934771#M18117</link>
      <description>&lt;P&gt;Hi ballardw,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I appreciate your well-thought-out suggestion for my problem.&lt;/P&gt;&lt;P&gt;Yes, the SAS-VA boxplot is quite useful as a supplement to ordinary histograms.&lt;/P&gt;&lt;P&gt;Hence, I have used your suggestion (see below) together with histograms inside a stacking container, where each stacked histogram has a different number of bins. This allows users to choose the number of bins that gives a smooth-looking histogram.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The very best regards&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="histsas.png" style="width: 999px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/98167iD2738A29ADC0BDA0/image-size/large?v=v2&amp;amp;px=999" role="button" title="histsas.png" alt="histsas.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jul 2024 11:41:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Visual-Analytics/Guidance-on-avoiding-histograms-that-display-spikes-and-other/m-p/934771#M18117</guid>
      <dc:creator>Percentile95</dc:creator>
      <dc:date>2024-07-05T11:41:55Z</dc:date>
    </item>
    <item>
      <title>Re: Guidance on avoiding histograms that display spikes and other artifacts.</title>
      <link>https://communities.sas.com/t5/SAS-Visual-Analytics/Guidance-on-avoiding-histograms-that-display-spikes-and-other/m-p/934790#M18118</link>
      <description>&lt;P&gt;I didn't look at the data, but I'm curious about the underlying cause of these weird-looking, spiky histograms.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is it that your values are not continuous, but rounded in some way that creates spikes for certain bin sizes / bin locations?&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jul 2024 13:34:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Visual-Analytics/Guidance-on-avoiding-histograms-that-display-spikes-and-other/m-p/934790#M18118</guid>
      <dc:creator>Quentin</dc:creator>
      <dc:date>2024-07-05T13:34:22Z</dc:date>
    </item>
    <item>
      <title>Re: Guidance on avoiding histograms that display spikes and other artifacts.</title>
      <link>https://communities.sas.com/t5/SAS-Visual-Analytics/Guidance-on-avoiding-histograms-that-display-spikes-and-other/m-p/934813#M18119</link>
      <description>&lt;P&gt;This can happen with data that are rounded. For a discussion, example, and solution, see&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://blogs.sas.com/content/iml/2015/06/03/density-curve-too-short.html" target="_blank"&gt;The mystery of the density curve that was too short - The DO Loop (sas.com)&lt;/A&gt;&lt;/P&gt;
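&lt;P&gt;A minimal sketch of the effect, assuming hypothetical simulated data (the data set name, seed, and distribution parameters below are illustrative, not taken from the attached file). Normal values are rounded to the nearest 0.01 and then plotted with two different bin widths:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical data: 10,000 normal values rounded to the nearest 0.01 */
data work.rounded_demo;
    call streaminit(12345);
    do i = 1 to 10000;
        x = round(rand('NORMAL', 2.4, 0.05), 0.01);
        output;
    end;
    drop i;
run;

/* Bin width 0.004 is smaller than the rounding unit 0.01:      */
/* the values are 0.01 apart, so many bins stay empty and the   */
/* occupied bins appear as isolated spikes.                     */
proc sgplot data=work.rounded_demo;
    histogram x / scale=count binwidth=0.004;
run;

/* Bin width 0.01 equals the rounding unit: one distinct value  */
/* per bin. BINSTART= is assumed here to set the midpoint of    */
/* the first bin, so the data values sit at the bin centers     */
/* rather than on the bin edges.                                */
proc sgplot data=work.rounded_demo;
    histogram x / scale=count binwidth=0.01 binstart=2.1;
run;&lt;/CODE&gt;&lt;/PRE&gt;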
&lt;P&gt;&lt;SPAN&gt;To make this visual illusion disappear, use a bin width that is at least as large as the rounding unit in the data.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jul 2024 15:35:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Visual-Analytics/Guidance-on-avoiding-histograms-that-display-spikes-and-other/m-p/934813#M18119</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2024-07-05T15:35:49Z</dc:date>
    </item>
    <item>
      <title>Re: Guidance on avoiding histograms that display spikes and other artifacts.</title>
      <link>https://communities.sas.com/t5/SAS-Visual-Analytics/Guidance-on-avoiding-histograms-that-display-spikes-and-other/m-p/934821#M18120</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13684"&gt;@Rick_SAS&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;&lt;SPAN&gt;To make this visual illusion disappear, use a bin width that is at least as large as the rounding unit in the data.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
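&lt;P&gt;A minimal sketch of what happens when the ratio of bin width to rounding unit is 1.2 : 1 rather than an integer, assuming hypothetical simulated data rounded to the nearest 0.01 (the data set name, seed, and parameters are illustrative). BINSTART= is assumed to set the midpoint of the first bin and is chosen so that no data value falls exactly on a bin edge:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical data: 10,000 normal values rounded to the nearest 0.01 */
data work.rounded_demo;
    call streaminit(12345);
    do i = 1 to 10000;
        x = round(rand('NORMAL', 2.4, 0.05), 0.01);
        output;
    end;
    drop i;
run;

/* Ratio 1.2 : 1. Six distinct values are spread over every     */
/* five bins, so one bin in five collects two values and        */
/* stands out as a spike.                                       */
proc sgplot data=work.rounded_demo;
    histogram x / scale=count binwidth=0.012 binstart=2.101;
run;

/* Ratio 2 : 1. Every bin collects exactly two distinct values, */
/* so no bin is favored.                                        */
proc sgplot data=work.rounded_demo;
    histogram x / scale=count binwidth=0.02 binstart=2.105;
run;&lt;/CODE&gt;&lt;/PRE&gt;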
&lt;P&gt;If the bin width is larger than the rounding unit, it should be an &lt;EM&gt;integer multiple&lt;/EM&gt; of the rounding unit. Otherwise, you can still get those spikes in the histogram, as was discussed in the 2021 thread&amp;nbsp;&lt;A href="https://communities.sas.com/t5/SAS-Studio/Histogram-does-not-reflect-summary-statistics/m-p/746694/highlight/true#M10010" target="_blank" rel="noopener"&gt;Histogram does not reflect summary statistics&lt;/A&gt;, where due to the non-integer ratio 1.2 : 1 every fifth histogram bar comprised two values rather than one.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jul 2024 17:26:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Visual-Analytics/Guidance-on-avoiding-histograms-that-display-spikes-and-other/m-p/934821#M18120</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2024-07-05T17:26:13Z</dc:date>
    </item>
  </channel>
</rss>

