BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

with the following code I calculate the first quartile of a data set. in the example with sas the first quartile is 3, but in excel it is calculated at 3.5, why?

 

 

data t1;
input value;
datalines;
1
2
4
7
8
9
10
12
;
PROC MEANS data=t1 q1 ;
VAR value;
RUN;

In excel the result of the following formula is 3.5 instead:

 

=QUARTILE(A2:A9;1)

where A2: A9 are in excel the same data of the example in sas

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

As far as I understand Rick's IML code, the calculation can be done in a DATA step as follows:

data quartiles(keep=_q1-_q3);
array _v[9999] _temporary_; /* use dimension >= number of non-missing obs. in dataset T1 */
do _n=1 by 1 until(last);
  set t1(where=(n(value))) end=last;
  _v[_n]=value;
end;
array _q[3];
do _k=1 to 3;
  _p=_k/4;
  _j=floor(_n*_p + 1-_p);
  _g=_n*_p + 1-_p - _j;
  _q[_k]=(1-_g)*smallest(_j, of _v[*]) + _g*smallest(_j+1, of _v[*]);
end;
run;

This code calculates the first, second and third quartile (_q1, _q2, _q3) of the values in dataset T1 (variable VALUE) using "Definition 7," provided that <=9999 values are involved (otherwise increase the array dimension).

 

At least for your sample data the result matches the values computed by Excel 2013:

_q1    _q2     _q3

3.5    7.5    9.25

View solution in original post

5 REPLIES 5
Reeza
Super User

There isn't a single standard definition of quartiles and Excel actually uses an approximation method. Proc means has 5 different definitions you can use and specify on the PROC MEANS statement.

I personally, would not consider Excel statistics to be accurate in comparison to a statistical tool, like SAS, R or Python.

The manual, naive calculation for your example generates a 3.

2 + 4 / 2 = 3.

 


@mariopellegrini wrote:

with the following code I calculate the first quartile of a data set. in the example with sas the first quartile is 3, but in excel it is calculated at 3.5, why?

 

 

data t1;
input value;
datalines;
1
2
4
7
8
9
10
12
;
PROC MEANS data=t1 q1 ;
VAR value;
RUN;

In excel the result of the following formula is 3.5 instead:

 

=QUARTILE(A2:A9;1)

where A2: A9 are in excel the same data of the example in sas


 

ballardw
Super User

You may want to read the Proc Means documentation calculation of quantiles.

 

https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=proc&docsetTarget=n1w9gy2...

 

Basically there are different definitions possible for quantile, mostly involving differences in how the value for a "not actually encounter quantile" is treated. So Excel and the defaults of SAS are using different methods to calculate the value between 2 and 4 in your data.

Some approaches could result in 2 or 4 as values that actually occur in your data.

 

 

FreelanceReinh
Jade | Level 19

Hello @mariopellegrini,

 

The article "Sample quantiles: A comparison of 9 definitions" in Rick Wicklin's blog reviews not only the five sample quantile definitions implemented in SAS (see option QNTLDEF= in the PROC MEANS documentation), but also four additional definitions described in a 1996 journal article. Your example and a few others that I've quickly checked suggest that Excel's QUARTILE function uses one of those additional definitions: "definition 7" in the journal, "type=7" in Rick's blog and SAS/IML code, which you could use to compute the quartiles based on that definition. (I couldn't test it because SAS/IML is not included in my SAS license).

mariopellegrini
Pyrite | Level 9

Thanks FreelanceReinhard, but I don't understand how to do the calculation for example for the sample of my example using the "type = 7"

FreelanceReinh
Jade | Level 19

As far as I understand Rick's IML code, the calculation can be done in a DATA step as follows:

data quartiles(keep=_q1-_q3);
array _v[9999] _temporary_; /* use dimension >= number of non-missing obs. in dataset T1 */
do _n=1 by 1 until(last);
  set t1(where=(n(value))) end=last;
  _v[_n]=value;
end;
array _q[3];
do _k=1 to 3;
  _p=_k/4;
  _j=floor(_n*_p + 1-_p);
  _g=_n*_p + 1-_p - _j;
  _q[_k]=(1-_g)*smallest(_j, of _v[*]) + _g*smallest(_j+1, of _v[*]);
end;
run;

This code calculates the first, second and third quartile (_q1, _q2, _q3) of the values in dataset T1 (variable VALUE) using "Definition 7," provided that <=9999 values are involved (otherwise increase the array dimension).

 

At least for your sample data the result matches the values computed by Excel 2013:

_q1    _q2     _q3

3.5    7.5    9.25

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 3406 views
  • 1 like
  • 4 in conversation