<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Basic Question about quantiles and proc means in Statistical Procedures</title>
    <link>https://communities.sas.com/t5/Statistical-Procedures/Basic-Question-about-quantiles-and-proc-means/m-p/641854#M30693</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;Here is a data set that I created:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data Test;&lt;BR /&gt;input CONT;&lt;BR /&gt;datalines;&lt;BR /&gt;3&lt;BR /&gt;8&lt;BR /&gt;5&lt;BR /&gt;3&lt;BR /&gt;1&lt;BR /&gt;6&lt;BR /&gt;9&lt;BR /&gt;0&lt;BR /&gt;2&lt;BR /&gt;4&lt;BR /&gt;6&lt;BR /&gt;5&lt;BR /&gt;;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I wanted to test the quantile options of PROC MEANS on this data set.&lt;/P&gt;&lt;P&gt;I expected Q1=3 and Q3=6, but I obtained Q1=2.5 and Q3=6.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The sorted series is 0 1 2 3 3 4 5 5 6 6 8 9. I understand that PROC MEANS returned 2.5 because it is (2+3)/2, but I don't understand why.&lt;/P&gt;&lt;P&gt;As I see it, I have 4 groups of equal size:&lt;/P&gt;&lt;P&gt;- 1st group: 0 1 2&lt;/P&gt;&lt;P&gt;- 2nd group: 3 3 4 (where 3 is the minimum, hence the number I was expecting)&lt;/P&gt;&lt;P&gt;- 3rd group: 5 5 6 (where 6 is the maximum, hence the 6 I was expecting, although I think it is (6+6)/2)&lt;/P&gt;&lt;P&gt;- 4th group: 6 8 9&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So, why do I get 2.5, and, most importantly, how can I get 3, which represents the minimum of the interval containing the middle 50% of my data?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Well, I hope I was clear &lt;span class="lia-unicode-emoji" title=":confused_face:"&gt;😕&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Wed, 22 Apr 2020 08:55:58 GMT</pubDate>
    <dc:creator>Mathis1</dc:creator>
    <dc:date>2020-04-22T08:55:58Z</dc:date>
    <item>
      <title>Basic Question about quantiles and proc means</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Basic-Question-about-quantiles-and-proc-means/m-p/641854#M30693</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;Here is a data set that I created:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;data Test;&lt;BR /&gt;input CONT;&lt;BR /&gt;datalines;&lt;BR /&gt;3&lt;BR /&gt;8&lt;BR /&gt;5&lt;BR /&gt;3&lt;BR /&gt;1&lt;BR /&gt;6&lt;BR /&gt;9&lt;BR /&gt;0&lt;BR /&gt;2&lt;BR /&gt;4&lt;BR /&gt;6&lt;BR /&gt;5&lt;BR /&gt;;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I wanted to test the quantile options of PROC MEANS on this data set.&lt;/P&gt;&lt;P&gt;I expected Q1=3 and Q3=6, but I obtained Q1=2.5 and Q3=6.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The sorted series is 0 1 2 3 3 4 5 5 6 6 8 9. I understand that PROC MEANS returned 2.5 because it is (2+3)/2, but I don't understand why.&lt;/P&gt;&lt;P&gt;As I see it, I have 4 groups of equal size:&lt;/P&gt;&lt;P&gt;- 1st group: 0 1 2&lt;/P&gt;&lt;P&gt;- 2nd group: 3 3 4 (where 3 is the minimum, hence the number I was expecting)&lt;/P&gt;&lt;P&gt;- 3rd group: 5 5 6 (where 6 is the maximum, hence the 6 I was expecting, although I think it is (6+6)/2)&lt;/P&gt;&lt;P&gt;- 4th group: 6 8 9&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So, why do I get 2.5, and, most importantly, how can I get 3, which represents the minimum of the interval containing the middle 50% of my data?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Well, I hope I was clear &lt;span class="lia-unicode-emoji" title=":confused_face:"&gt;😕&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 22 Apr 2020 08:55:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Basic-Question-about-quantiles-and-proc-means/m-p/641854#M30693</guid>
      <dc:creator>Mathis1</dc:creator>
      <dc:date>2020-04-22T08:55:58Z</dc:date>
    </item>
    <item>
      <title>Re: Basic Question about quantiles and proc means</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Basic-Question-about-quantiles-and-proc-means/m-p/641869#M30695</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/323613"&gt;@Mathis1&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;PROC MEANS (and other procedures, e.g., PROC UNIVARIATE) offers &lt;EM&gt;five&lt;/EM&gt; different definitions of quantiles: please see &lt;A href="https://documentation.sas.com/?docsetId=proc&amp;amp;docsetTarget=p0v0y1on1hbxukn0zqgsp5ky8hc0.htm&amp;amp;docsetVersion=9.4&amp;amp;locale=en#n096sxke940tubn1mwmydr89jxdk" target="_blank" rel="noopener"&gt;Quantile and Related Statistics&lt;/A&gt; or Rick Wicklin's blog post &lt;A href="https://blogs.sas.com/content/iml/2017/05/22/quantile-definitions-sas.html" target="_blank" rel="noopener"&gt;Quantile definitions in SAS&lt;/A&gt;. Normally, you choose one of these five and specify it with the appropriate option (here: the &lt;A href="https://documentation.sas.com/?docsetId=proc&amp;amp;docsetTarget=n1qnc9bddfvhzqn105kqitnf29cp.htm&amp;amp;docsetVersion=9.4&amp;amp;locale=en#n1xsk6v7ixfsrmn1phcsrgps47mi" target="_blank" rel="noopener"&gt;QNTLDEF= option&lt;/A&gt; of the PROC MEANS statement) if it's not the default (which is definition no. 5). However, none of these five definitions yields Q1=3 for your example data. So, if you think it's worth the effort, you would need to compute the quantiles manually (perhaps in a user-defined function or macro). Examples are presented in another blog post by Rick Wicklin: &lt;A href="https://blogs.sas.com/content/iml/2017/05/24/definitions-sample-quantiles.html" target="_blank" rel="noopener"&gt;Sample quantiles: A comparison of 9 definitions&lt;/A&gt;, where four additional definitions are implemented using SAS/IML.&lt;/P&gt;
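&lt;P&gt;To see why no QNTLDEF= value gives Q1=3 here, the five definitions can be written out directly. The following is a minimal Python sketch, not SAS code and not the SAS implementation, that follows the formulas in the SAS documentation linked above. The function name sas_quantile is hypothetical. Clamping out-of-range positions to the first or last observation is an assumption of this sketch. It uses exact fractions so that the integer part j and fractional part g of np are not affected by rounding.&lt;/P&gt;

```python
from fractions import Fraction
from math import floor


def sas_quantile(values, p, qntldef=5):
    """Sample p-quantile under the five QNTLDEF= definitions (a sketch).

    Follows the formulas in the SAS documentation: x is sorted, positions
    are 1-based, and np (or (n+1)p for definition 4) is split into an
    integer part j and a fractional part g. Positions outside 1..n are
    clamped to x_1 or x_n (assumed; this matches x_0 = x_1 and
    x_{n+1} = x_n). Pass p as a Fraction so j and g are exact.
    """
    x = sorted(values)
    n = len(x)

    def at(i):
        # 1-based order statistic x_i, clamped to the range 1..n
        return x[min(max(i, 1), n) - 1]

    p = Fraction(p)
    pos = (n + 1) * p if qntldef == 4 else n * p
    j = floor(pos)  # integer part
    g = pos - j     # fractional part

    if qntldef in (1, 4):  # weighted average at x_np or x_(n+1)p
        return (1 - g) * at(j) + g * at(j + 1)
    if qntldef == 2:  # observation numbered closest to np
        if g != Fraction(1, 2):
            return at(floor(pos + Fraction(1, 2)))
        return at(j) if j % 2 == 0 else at(j + 1)
    if qntldef == 3:  # empirical distribution function
        return at(j) if g == 0 else at(j + 1)
    if qntldef == 5:  # empirical distribution function with averaging (default)
        return Fraction(at(j) + at(j + 1), 2) if g == 0 else at(j + 1)
    raise ValueError("qntldef must be 1, 2, 3, 4, or 5")


data = [3, 8, 5, 3, 1, 6, 9, 0, 2, 4, 6, 5]  # CONT values from data Test
for d in range(1, 6):
    q1 = sas_quantile(data, Fraction(1, 4), d)
    q3 = sas_quantile(data, Fraction(3, 4), d)
    print(f"QNTLDEF={d}: Q1={float(q1)}, Q3={float(q3)}")
# QNTLDEF=1: Q1=2.0, Q3=6.0
# QNTLDEF=2: Q1=2.0, Q3=6.0
# QNTLDEF=3: Q1=2.0, Q3=6.0
# QNTLDEF=4: Q1=2.25, Q3=6.0
# QNTLDEF=5: Q1=2.5, Q3=6.0
```

&lt;P&gt;For this data, np = 12 * 0.25 = 3 is an integer. Definitions 1 to 3 therefore all pick x_3 = 2. Definition 4 interpolates at (n+1)p = 3.25 to get 2.25. The default definition 5 averages x_3 and x_4 to get 2.5.&lt;/P&gt;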
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The basis of such an implementation would be a general mathematical definition (cf. the existing definitions), not an example with data.&lt;/P&gt;</description>
      <pubDate>Wed, 22 Apr 2020 10:27:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Basic-Question-about-quantiles-and-proc-means/m-p/641869#M30695</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2020-04-22T10:27:51Z</dc:date>
    </item>
    <item>
      <title>Re: Basic Question about quantiles and proc means</title>
      <link>https://communities.sas.com/t5/Statistical-Procedures/Basic-Question-about-quantiles-and-proc-means/m-p/641984#M30702</link>
      <description>&lt;P&gt;&lt;SPAN&gt;&amp;gt; I understand that PROC MEANS returned 2.5 because it's (2+3)/2, but I don't understand why it is so.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;It is so because the statistic is an estimate of the quantile of the population. If you were to choose 2 or 3 instead of 2.5, you would be using a biased estimator of the quantile.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;gt; How can I get the 3, which represents the minimum of the interval containing 50% of the data in the middle of my data set?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Well, there are ways to get it, but I wouldn't advise it. In your example, you are putting one 6 into the 3rd quartile and another 6 into the 4th quartile. Tied values should not be split between groups.&amp;nbsp; For unique values, you can &lt;A href="https://blogs.sas.com/content/iml/2012/09/24/grouping-observations-based-on-quantiles.html" target="_self"&gt;use PROC RANK to split the data into quartiles&lt;/A&gt;, but &lt;A href="https://blogs.sas.com/content/iml/2014/11/05/binning-quantiles-rounded-data.html" target="_self"&gt;that method breaks down when there are duplicate values&lt;/A&gt;.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
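&lt;P&gt;The equal-size grouping from the question can be reproduced with a short Python sketch. The helper equal_groups is hypothetical, not a SAS or library function, and it assumes that N is a multiple of the number of groups. It deliberately mimics the question's approach, so it does not keep tied values together. On the example data it returns 3 and 6, as the question expects, but it also places the two 6s in different groups.&lt;/P&gt;

```python
def equal_groups(values, k=4):
    """Sort the values and cut them into k consecutive groups of equal size.

    Hypothetical helper for illustration. It assumes len(values) is a
    multiple of k and does NOT keep tied values together.
    """
    xs = sorted(values)
    n = len(xs)
    if n % k != 0:
        raise ValueError("this sketch needs N to be a multiple of k")
    size = n // k
    return [xs[i * size:(i + 1) * size] for i in range(k)]


data = [3, 8, 5, 3, 1, 6, 9, 0, 2, 4, 6, 5]  # CONT values from data Test
groups = equal_groups(data)
print(groups)  # [[0, 1, 2], [3, 3, 4], [5, 5, 6], [6, 8, 9]]

# Minimum of the 2nd group and maximum of the 3rd group, i.e. the
# (N/4 + 1)th and (3N/4)th sorted values when N is a multiple of 4
lower = min(groups[1])
upper = max(groups[2])
print(lower, upper)  # 3 6

# The tie problem: the value 6 falls into two different groups
print([i + 1 for i, g in enumerate(groups) if 6 in g])  # [3, 4]
```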
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;If you insist on pursuing this, sort the data. For N nonmissing observations, you want the (floor(N/4)+1)th and floor(3*N/4)th values (for your N=12, the 4th and 9th values, i.e., 3 and 6). But be aware that those are not the sample 0.25 and 0.75 quantiles.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 22 Apr 2020 15:06:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/Statistical-Procedures/Basic-Question-about-quantiles-and-proc-means/m-p/641984#M30702</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2020-04-22T15:06:52Z</dc:date>
    </item>
  </channel>
</rss>

