Mathis1
Quartz | Level 8

Hello,

Here is a data set that I created : 

 

data Test;
input CONT ;
datalines;
3
8
5
3
1
6
9
0
2
4
6
5
;
run;

 

I wanted to test the quantile options of PROC MEANS on this data set.

I was expecting Q1=3 and Q3=6, but I obtained Q1=2.5 and Q3=6.

 

The sorted series is 0 1 2 3 3 4 5 5 6 6 8 9. I understand that PROC MEANS returned 2.5 because it's (2+3)/2, but I don't understand why it is computed that way.

As I see it, I have 4 groups of equal size:

- 1st group : 0 1 2

- 2nd group : 3 3 4 (where 3 is the minimum, so the number I was expecting)

- 3rd group : 5 5 6 (where 6 is the maximum, hence the 6 I was expecting, although I think the 6 returned is actually (6+6)/2)

- 4th : 6 8 9 

 

So, why do I get 2.5 and, most importantly, how can I get the 3, which represents the minimum of the interval containing the middle 50% of my data set?

 

Well, I hope I was clear 😕

 

Thank you 🙂

Accepted Solution
FreelanceReinh
Jade | Level 19

Hello @Mathis1,

 

PROC MEANS (and other procedures, e.g., PROC UNIVARIATE) offer five different definitions of quantiles: please see "Quantile and Related Statistics" or Rick Wicklin's blog post "Quantile definitions in SAS". Normally, you choose one of these five and specify it with the appropriate option (here: the QNTLDEF= option of the PROC MEANS statement) if it's not the default (which is definition no. 5). However, none of these five definitions would result in Q1=3 for your example data. So, if you think it's worth the effort, you would need to compute the quantiles manually (perhaps in a user-defined function or macro). Examples are presented in another blog post by Rick Wicklin, "Sample quantiles: A comparison of 9 definitions", where four additional definitions are implemented using SAS/IML.

 

The basis of such an implementation would be a general mathematical definition (cf. the existing definitions), not an example with data.
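
To see how the five built-in definitions behave on your data, a small sketch like the following may help. It assumes the Test data set from the question; the macro name compare_qntldef is only illustrative. Under the default definition 5, with n=12 and p=0.25, the product n*p = 3 is a whole number, so Q1 is the average of the 3rd and 4th sorted values, (2+3)/2 = 2.5; likewise Q3 is the average of the 9th and 10th sorted values, (6+6)/2 = 6.

```sas
/* Illustrative macro (hypothetical name): run PROC MEANS once per
   quantile definition so the Q1/median/Q3 results can be compared. */
%macro compare_qntldef;
  %do def = 1 %to 5;
    title "Quartiles of CONT with QNTLDEF=&def";
    proc means data=Test n q1 median q3 qntldef=&def;
      var CONT;
    run;
  %end;
  title;  /* clear the title afterwards */
%mend compare_qntldef;

%compare_qntldef
```

Each of the five outputs shows one definition; QNTLDEF=5 should reproduce the Q1=2.5 and Q3=6 you reported.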

Rick_SAS
SAS Super FREQ

> I understand that PROC MEANS returned 2.5 because it's (2+3)/2, but I don't understand why it is so.

 

It is so because the statistic is an estimate of the quantile of the population. If you were to choose 2 or 3 instead of 2.5, you would be using a biased estimator of the quantile.

 

> how can i get the 3, which represents the minimum of the interval containing 50% of the data in the middle of my data set.

 

Well, there are ways to get it, but I wouldn't advise it. In your example, you are putting one 6 into the 3rd quartile and another 6 into the 4th quartile. Tied values should not be split between groups. For unique values, you can use PROC RANK to split the data into quartiles, but that method breaks down when there are duplicate values.
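
For reference, here is a minimal sketch of the PROC RANK approach, assuming the Test data set from the question (the names Ranked and Quartile are arbitrary). GROUPS=4 assigns group numbers 0 to 3. With the default TIES=MEAN, tied values share a rank and therefore land in the same group, so the groups need not have equal size.

```sas
/* Assign each observation to one of four rank-based groups (0-3). */
proc rank data=Test groups=4 out=Ranked;
  var CONT;
  ranks Quartile;   /* new variable holding the group number */
run;

/* Count the observations per group to see whether the sizes are equal. */
proc freq data=Ranked;
  tables Quartile;
run;
```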

 

If you insist on pursuing this, sort the data. For N nonmissing observations, you want to use the (floor(N/4)+1)th and floor(3*N/4)th sorted values; for your 12 observations these are the 4th and 9th values, 3 and 6. But be aware that those are not the sample 0.25 and 0.75 quantiles.
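
A rough sketch of that sort-and-pick approach, assuming the Test data set from the question (the names Sorted, Bounds, Lower and Upper are made up for illustration). The lower position used here is floor(N/4)+1, which equals ceil(N/4) except when N is a multiple of 4; for the 12 observations above it selects the 4th sorted value, 3, rather than the 3rd, 2. The upper position is floor(3*N/4), the 9th value, 6.

```sas
/* Sort the nonmissing values in ascending order. */
proc sort data=Test out=Sorted;
  where not missing(CONT);
  by CONT;
run;

/* Pick the values at the two positions; N is the number of sorted rows. */
data Bounds(keep=Lower Upper);
  retain Lower Upper;
  set Sorted nobs=N end=Last;
  if _N_ = floor(N/4) + 1 then Lower = CONT;  /* first value of the 2nd group */
  if _N_ = floor(3*N/4)   then Upper = CONT;  /* last value of the 3rd group  */
  if Last then output;                        /* write a single summary row   */
run;

proc print data=Bounds noobs;
run;
```

For very small N (fewer than 2 nonmissing observations), floor(3*N/4) is 0 and Upper stays missing, so a real implementation would need to decide how to handle that case.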
