BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

with the following code I calculate the first quartile of a data set. in the example with sas the first quartile is 3, but in excel it is calculated at 3.5, why?

 

 

data t1;
input value;
datalines;
1
2
4
7
8
9
10
12
;
PROC MEANS data=t1 q1 ;
VAR value;
RUN;

In excel the result of the following formula is 3.5 instead:

 

=QUARTILE(A2:A9;1)

where A2: A9 are in excel the same data of the example in sas

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

As far as I understand Rick's IML code, the calculation can be done in a DATA step as follows:

data quartiles(keep=_q1-_q3);
array _v[9999] _temporary_; /* use dimension >= number of non-missing obs. in dataset T1 */
do _n=1 by 1 until(last);
  set t1(where=(n(value))) end=last;
  _v[_n]=value;
end;
array _q[3];
do _k=1 to 3;
  _p=_k/4;
  _j=floor(_n*_p + 1-_p);
  _g=_n*_p + 1-_p - _j;
  _q[_k]=(1-_g)*smallest(_j, of _v[*]) + _g*smallest(_j+1, of _v[*]);
end;
run;

This code calculates the first, second and third quartile (_q1, _q2, _q3) of the values in dataset T1 (variable VALUE) using "Definition 7," provided that <=9999 values are involved (otherwise increase the array dimension).

 

At least for your sample data the result matches the values computed by Excel 2013:

_q1    _q2     _q3

3.5    7.5    9.25

View solution in original post

5 REPLIES 5
Reeza
Super User

There isn't a single standard definition of quartiles and Excel actually uses an approximation method. Proc means has 5 different definitions you can use and specify on the PROC MEANS statement.

I personally, would not consider Excel statistics to be accurate in comparison to a statistical tool, like SAS, R or Python.

The manual, naive calculation for your example generates a 3.

2 + 4 / 2 = 3.

 


@mariopellegrini wrote:

with the following code I calculate the first quartile of a data set. in the example with sas the first quartile is 3, but in excel it is calculated at 3.5, why?

 

 

data t1;
input value;
datalines;
1
2
4
7
8
9
10
12
;
PROC MEANS data=t1 q1 ;
VAR value;
RUN;

In excel the result of the following formula is 3.5 instead:

 

=QUARTILE(A2:A9;1)

where A2: A9 are in excel the same data of the example in sas


 

ballardw
Super User

You may want to read the Proc Means documentation calculation of quantiles.

 

https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=proc&docsetTarget=n1w9gy2...

 

Basically there are different definitions possible for quantile, mostly involving differences in how the value for a "not actually encounter quantile" is treated. So Excel and the defaults of SAS are using different methods to calculate the value between 2 and 4 in your data.

Some approaches could result in 2 or 4 as values that actually occur in your data.

 

 

FreelanceReinh
Jade | Level 19

Hello @mariopellegrini,

 

The article "Sample quantiles: A comparison of 9 definitions" in Rick Wicklin's blog reviews not only the five sample quantile definitions implemented in SAS (see option QNTLDEF= in the PROC MEANS documentation), but also four additional definitions described in a 1996 journal article. Your example and a few others that I've quickly checked suggest that Excel's QUARTILE function uses one of those additional definitions: "definition 7" in the journal, "type=7" in Rick's blog and SAS/IML code, which you could use to compute the quartiles based on that definition. (I couldn't test it because SAS/IML is not included in my SAS license).

mariopellegrini
Pyrite | Level 9

Thanks FreelanceReinhard, but I don't understand how to do the calculation for example for the sample of my example using the "type = 7"

FreelanceReinh
Jade | Level 19

As far as I understand Rick's IML code, the calculation can be done in a DATA step as follows:

data quartiles(keep=_q1-_q3);
array _v[9999] _temporary_; /* use dimension >= number of non-missing obs. in dataset T1 */
do _n=1 by 1 until(last);
  set t1(where=(n(value))) end=last;
  _v[_n]=value;
end;
array _q[3];
do _k=1 to 3;
  _p=_k/4;
  _j=floor(_n*_p + 1-_p);
  _g=_n*_p + 1-_p - _j;
  _q[_k]=(1-_g)*smallest(_j, of _v[*]) + _g*smallest(_j+1, of _v[*]);
end;
run;

This code calculates the first, second and third quartile (_q1, _q2, _q3) of the values in dataset T1 (variable VALUE) using "Definition 7," provided that <=9999 values are involved (otherwise increase the array dimension).

 

At least for your sample data the result matches the values computed by Excel 2013:

_q1    _q2     _q3

3.5    7.5    9.25

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 2891 views
  • 1 like
  • 4 in conversation