Basic Question about quantiles and proc means

Posted 04-22-2020 04:55 AM
(861 views)

Hello,

Here is a data set that I created:

```sas
data Test;
   input CONT;
   datalines;
3
8
5
3
1
6
9
0
2
4
6
5
;
run;
```

I wanted to test the quantile options of PROC MEANS on this data set.

I was expecting Q1=3 and Q3=6, but I obtained Q1=2.5 and Q3=6.

The sorted series is 0 1 2 3 3 4 5 5 6 6 8 9. I understand that PROC MEANS returned 2.5 because it's (2+3)/2, but I don't understand why it works that way.

As I see it, I have 4 groups of equal size:

- 1st group: 0 1 2
- 2nd group: 3 3 4 (where 3 is the minimum, so the number I was expecting)
- 3rd group: 5 5 6 (where 6 is the maximum, hence the 6 I was expecting, although I think it's (6+6)/2)
- 4th group: 6 8 9

So why do I get 2.5, and, most importantly, how can I get the 3, which represents the minimum of the interval containing the middle 50% of my data set?

Well, I hope I was clear 😕

Thank you 🙂

1 ACCEPTED SOLUTION


Hello @Mathis1,

PROC MEANS (and other procedures, e.g., PROC UNIVARIATE) offer *five* different definitions of quantiles: please see "Quantile and Related Statistics" or Rick Wicklin's blog post "Quantile definitions in SAS." Normally, you choose one of these five and specify it with the appropriate option (here: the QNTLDEF= option of the PROC MEANS statement) if it's not the default, which is definition 5. However, none of these five definitions yields Q1=3 for your example data. So, if you think it's worth the effort, you would need to compute the quantiles manually (perhaps in a user-defined function or macro). Examples are presented in another blog post by Rick Wicklin, "Sample quantiles: A comparison of 9 definitions," where four additional definitions are implemented using SAS/IML.

The basis of such an implementation would be a general mathematical definition (cf. the existing definitions), not an example with data.
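To see the five built-in definitions side by side, the sketch below reruns PROC MEANS once per QNTLDEF= value and stacks the resulting Q1 and Q3 into one data set. The macro name `all_qntldefs` and the data set names `_q1`-`_q5` and `QntlDefs` are hypothetical, chosen only for this illustration; the data are the 12 values from the question, read several per line with `@@`.

```sas
data Test;
   input CONT @@;
   datalines;
3 8 5 3 1 6 9 0 2 4 6 5
;
run;

/* Hypothetical helper: one PROC MEANS run per quantile definition */
%macro all_qntldefs(data=, var=);
   %do def = 1 %to 5;
      proc means data=&data qntldef=&def noprint;
         var &var;
         output out=_q&def q1=Q1 q3=Q3;
      run;

      /* Record which definition produced this row */
      data _q&def;
         set _q&def;
         Def = &def;
         keep Def Q1 Q3;
      run;
   %end;

   /* Stack the five one-row results (numbered data set list) */
   data QntlDefs;
      set _q1-_q5;
   run;
%mend all_qntldefs;

%all_qntldefs(data=Test, var=CONT)

proc print data=QntlDefs noobs;
   var Def Q1 Q3;
run;
```

Working through the documented formulas by hand for these 12 values gives Q1 = 2 under definitions 1, 2, and 3, Q1 = 2.25 under definition 4, and Q1 = 2.5 under the default definition 5, while Q3 = 6 under all five; none of them produces Q1 = 3.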

2 Replies

> I understand that PROC MEANS returned 2.5 because it's (2+3)/2, but I don't understand why it is so.

It is so because the statistic is an estimate of the quantile of the population. If you were to choose 2 or 3 instead of 2.5, you would be using a biased estimator of the quantile.
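To make the arithmetic concrete, here is how the default definition (QNTLDEF=5, "empirical distribution function with averaging") arrives at 2.5, following the formula given in the SAS documentation for the quantile definitions. In this summary, x_(i) denotes the i-th smallest of the n nonmissing values, and np is split into an integer part j and a fractional part g. Note that Q3 is also an average of two neighbors, which is why the question's (6+6)/2 guess matches.

```latex
\[
  np = j + g, \qquad
  y =
  \begin{cases}
    \tfrac{1}{2}\bigl(x_{(j)} + x_{(j+1)}\bigr) & \text{if } g = 0,\\[2pt]
    x_{(j+1)} & \text{if } g > 0.
  \end{cases}
\]
\[
  n = 12,\ p = 0.25:\quad np = 3 \ \Rightarrow\ j = 3,\ g = 0
  \ \Rightarrow\ Q_1 = \tfrac{1}{2}\bigl(x_{(3)} + x_{(4)}\bigr) = \tfrac{1}{2}(2 + 3) = 2.5
\]
\[
  n = 12,\ p = 0.75:\quad np = 9 \ \Rightarrow\ j = 9,\ g = 0
  \ \Rightarrow\ Q_3 = \tfrac{1}{2}\bigl(x_{(9)} + x_{(10)}\bigr) = \tfrac{1}{2}(6 + 6) = 6
\]
```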

> how can I get the 3, which represents the minimum of the interval containing 50% of the data in the middle of my data set.

Well, there are ways to get it, but I wouldn't advise it. In your example, you are putting one 6 into the 3rd quartile and another 6 into the 4th quartile. Tied values should not be split between groups. For unique values, you can use PROC RANK to split the data into quartiles, but that method breaks down when there are duplicate values.
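A quick way to see the tie problem is to let PROC RANK form four groups and look at where the duplicates land. This is a sketch: the output data set `Ranked` and the variable name `Quartile` are illustrative choices, and `TIES=MEAN` is spelled out explicitly so that tied values receive the same (averaged) rank.

```sas
data Test;
   input CONT @@;
   datalines;
3 8 5 3 1 6 9 0 2 4 6 5
;
run;

/* GROUPS=4 assigns group numbers 0 to 3; tied values share one rank */
proc rank data=Test out=Ranked groups=4 ties=mean;
   var CONT;
   ranks Quartile;
run;

proc sort data=Ranked;
   by CONT;
run;

proc print data=Ranked noobs;
run;
```

Because tied values receive the same rank, PROC RANK cannot split the two 6s between groups, so the four groups it produces for these 12 values cannot all have 3 members.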

If you insist on pursuing this, sort the data. For N nonmissing observations, you want to use the (floor(N/4)+1)th and ceil(3*N/4)th values; for your N=12, that is the 4th value (3) and the 9th value (6). But be aware that those are not the sample 0.25 and 0.75 quantiles.
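A minimal DATA step sketch of that recipe, assuming the goal is the bounds of the middle half when the sorted, nonmissing values are cut into four groups: it reads the value at position floor(N/4)+1 and the value at position ceil(3*N/4) directly with `SET ... POINT=`. For N = 12 these positions are 4 and 9, which pick out the 3 and 6 from the question. The data set names `Sorted` and `MiddleHalf` are hypothetical, and the step assumes at least one nonmissing value.

```sas
data Test;
   input CONT @@;
   datalines;
3 8 5 3 1 6 9 0 2 4 6 5
;
run;

/* Sort and drop missing values so that NOBS= counts only nonmissing values */
proc sort data=Test(where=(not missing(CONT))) out=Sorted;
   by CONT;
run;

data MiddleHalf;
   keep NObs LowerBound UpperBound;
   /* IF 0 THEN SET makes N available without reading a row; assumes N >= 1 */
   if 0 then set Sorted nobs=N;
   NObs = N;
   Lo = floor(N/4) + 1;     /* position of the first value in the 2nd group */
   Hi = ceil(3*N/4);        /* position of the last value in the 3rd group  */
   set Sorted point=Lo;
   LowerBound = CONT;
   set Sorted point=Hi;
   UpperBound = CONT;
   output;
   stop;                    /* POINT= never reaches end-of-file, so stop explicitly */
run;

proc print data=MiddleHalf noobs;
run;
```

Reading by position keeps the logic explicit, but as noted above, these are order statistics, not estimates of the population quartiles.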
