Hello,
Here is a data set that I created:
data Test;
input CONT ;
datalines;
3
8
5
3
1
6
9
0
2
4
6
5
;
run;
I wanted to test the quantile options of PROC MEANS on this data set.
I was expecting Q1=3 and Q3=6, but I obtained Q1=2.5 and Q3=6.
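A minimal PROC MEANS call of the kind described above might look like this. It is a sketch: no QNTLDEF= option is given, so the default quantile definition applies, and the output data set name `Test_q` is a hypothetical choice.

```sas
/* Request Q1 and Q3 for CONT; no QNTLDEF= option, so the default definition is used */
proc means data=Test n q1 q3;
  var CONT;
  /* Also save the quartiles to a data set (Test_q is a hypothetical name) */
  output out=Test_q q1=Q1 q3=Q3;
run;
```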
The sorted series is 0 1 2 3 3 4 5 5 6 6 8 9. I understand that PROC MEANS returned 2.5 because it's (2+3)/2, but I don't understand why.
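The 2.5 can be traced through SAS's default definition (QNTLDEF=5). Writing np = j + g, where j is the integer part and g the fractional part, definition 5 averages two neighbouring sorted values when g = 0 and otherwise takes the next one. Here x_(i) denotes the i-th smallest value and n = 12:

```latex
\begin{aligned}
np &= j + g, \qquad j = \lfloor np \rfloor,\quad g = np - j \\
y &= \begin{cases} \tfrac{1}{2}\left(x_{(j)} + x_{(j+1)}\right) & g = 0 \\ x_{(j+1)} & g > 0 \end{cases} \\[4pt]
Q_1 &:\ np = 12 \times 0.25 = 3 \;\Rightarrow\; j = 3,\ g = 0 \;\Rightarrow\; y = \tfrac{1}{2}\left(x_{(3)} + x_{(4)}\right) = \tfrac{1}{2}(2 + 3) = 2.5 \\
Q_3 &:\ np = 12 \times 0.75 = 9 \;\Rightarrow\; j = 9,\ g = 0 \;\Rightarrow\; y = \tfrac{1}{2}\left(x_{(9)} + x_{(10)}\right) = \tfrac{1}{2}(6 + 6) = 6
\end{aligned}
```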
For me, I have 4 groups of equal size:
- 1st group: 0 1 2
- 2nd group: 3 3 4 (where "3" is the minimum, so the number I was expecting)
- 3rd group: 5 5 6 (where 6 is the maximum, hence the "6" I was expecting, although I think it's (6+6)/2)
- 4th group: 6 8 9
So, why do I get 2.5, and, most importantly, how can I get the 3, which represents the minimum of the interval containing the middle 50% of my data?
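The four equal-sized groups described above can be reproduced purely by position. A minimal sketch, assuming there are no missing values of CONT; the names `Test_sorted`, `Test_grouped`, `Group` and `Group_stats` are hypothetical. It sorts the data, assigns each observation to a quarter by its sorted position, then reports the minimum and maximum of each group.

```sas
/* Sort so that observation position equals rank */
proc sort data=Test out=Test_sorted;
  by CONT;
run;

data Test_grouped;
  /* N = total number of observations (assumes no missing CONT) */
  set Test_sorted nobs=N;
  /* With N divisible by 4: positions 1..N/4 -> group 1, the next N/4 -> group 2, etc. */
  Group = ceil(4 * _N_ / N);
run;

/* Minimum and maximum of each positional group */
proc means data=Test_grouped n min max nway;
  class Group;
  var CONT;
  output out=Group_stats min=Min max=Max;
run;
```

Because the split is purely positional, the two 6s land in different groups (the maximum of group 3 and the minimum of group 4 are both 6), which is the tie problem raised in a later reply.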
Well, I hope I was clear 😕
Thank you 🙂
Hello @Mathis1,
PROC MEANS (and other procedures, e.g., PROC UNIVARIATE) offer five different definitions of quantiles: please see Quantile and Related Statistics or Rick Wicklin's blog post Quantile definitions in SAS. Normally, you choose one of these five and, if it's not the default (definition no. 5), specify it with the appropriate option (here: the QNTLDEF= option of the PROC MEANS statement). However, none of these five definitions would result in Q1=3 for your example data. So, if you think it's worth the effort, you would need to compute the quantiles manually (perhaps in a user-defined function or macro). Examples are presented in another blog post by Rick Wicklin, Sample quantiles: A comparison of 9 definitions, where four additional definitions are implemented using SAS/IML.
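To see the five definitions side by side on the example data, one option is a macro loop over the QNTLDEF= values 1 to 5. This is a sketch; the macro name `%compare_qntldef` and the data set names `q_def1` to `q_def5` and `q_all` are hypothetical.

```sas
%macro compare_qntldef;
  %local d;
  %do d = 1 %to 5;
    /* Quartiles of CONT under quantile definition &d */
    proc means data=Test noprint qntldef=&d;
      var CONT;
      output out=q_def&d q1=Q1 q3=Q3;
    run;

    /* Tag the one-row result with the definition number */
    data q_def&d;
      set q_def&d;
      QNTLDEF = &d;
      keep QNTLDEF Q1 Q3;
    run;
  %end;

  /* Stack the five results and print them */
  data q_all;
    set q_def1 q_def2 q_def3 q_def4 q_def5;
  run;

  proc print data=q_all noobs;
  run;
%mend compare_qntldef;

%compare_qntldef
```

Working the definitions by hand for n = 12 gives Q1 = 2 for definitions 1 to 3, 2.25 for definition 4 and 2.5 for definition 5 (Q3 = 6, except 7.5 for definition 4), so none of them yields Q1 = 3.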
The basis of such an implementation would be a general mathematical definition (cf. the existing definitions), not an example with data.
> I understand that PROC MEANS returned 2.5 because it's (2+3)/2, but I don't understand why.
It is so because the statistic is an estimate of the quantile of the population. If you were to choose 2 or 3 instead of 2.5, you would be using a biased estimator of the quantile.
> how can I get the 3, which represents the minimum of the interval containing 50% of the data in the middle of my data set.
Well, there are ways to get it, but I wouldn't advise it. In your example, you are putting one 6 into the 3rd group and the other 6 into the 4th group. Tied values should not be split between groups. When all values are unique, you can use PROC RANK to split the data into quartiles, but that method breaks down when there are duplicate values.
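For comparison, PROC RANK can assign quartile groups directly with GROUPS=4 (groups are numbered 0 to 3). A minimal sketch; the names `Test_ranked`, `Quartile_Group` and `Rank_stats` are hypothetical, and TIES=MEAN (the default) is written out so the tie handling is visible. Tied values receive the same rank and therefore the same group, so with duplicates the groups are generally no longer of equal size; inspecting `Rank_stats` shows how the groups come out for this data.

```sas
/* Assign each observation to one of 4 rank groups (numbered 0 to 3) */
proc rank data=Test out=Test_ranked groups=4 ties=mean;
  var CONT;
  ranks Quartile_Group;
run;

/* Size, minimum and maximum of each group */
proc means data=Test_ranked n min max nway;
  class Quartile_Group;
  var CONT;
  output out=Rank_stats n=N_obs min=Min max=Max;
run;
```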
If you insist on pursuing this, sort the data. For N nonmissing observations, you want to use the ceil(N/4)th and floor(3*N/4)th values. But be aware that those are not the sample 0.25 and 0.75 quantiles.
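The recipe in the previous paragraph can be sketched as a DATA step. Assumptions: missing values are dropped before sorting, and the names `Test_sorted`, `manual_q`, `N_nonmiss`, `Q_low`, `Q_high` and `Q_low_alt` are hypothetical. Taken literally with 1-based positions, ceil(N/4) and floor(3*N/4) are the 3rd and 9th sorted values for N = 12 (2 and 6); the 3 asked for in the question is the 4th value, so the sketch also computes position floor(N/4)+1 as a hypothetical alternative.

```sas
/* Drop missing values, then sort so that position equals rank */
proc sort data=Test(where=(not missing(CONT))) out=Test_sorted;
  by CONT;
run;

data manual_q(keep=N_nonmiss Q_low Q_high Q_low_alt);
  /* N = number of nonmissing observations in the sorted data */
  set Test_sorted nobs=N end=last;
  /* Keep the picked values across DATA step iterations */
  retain Q_low Q_high Q_low_alt;
  if _N_ = ceil(N / 4)      then Q_low     = CONT;  /* ceil(N/4)th value */
  if _N_ = floor(3 * N / 4) then Q_high    = CONT;  /* floor(3N/4)th value */
  if _N_ = floor(N / 4) + 1 then Q_low_alt = CONT;  /* hypothetical: first value after the lowest quarter */
  if last then do;
    N_nonmiss = N;
    output;
  end;
run;

proc print data=manual_q noobs;
run;
```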