SAS Programming

DATA Step, Macro, Functions and more
BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

with the following code I calculate the first quartile of a data set. in the example with sas the first quartile is 3, but in excel it is calculated at 3.5, why?

 

 

data t1;
input value;
datalines;
1
2
4
7
8
9
10
12
;
PROC MEANS data=t1 q1 ;
VAR value;
RUN;

In excel the result of the following formula is 3.5 instead:

 

=QUARTILE(A2:A9;1)

where A2: A9 are in excel the same data of the example in sas

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

As far as I understand Rick's IML code, the calculation can be done in a DATA step as follows:

data quartiles(keep=_q1-_q3);
array _v[9999] _temporary_; /* use dimension >= number of non-missing obs. in dataset T1 */
do _n=1 by 1 until(last);
  set t1(where=(n(value))) end=last;
  _v[_n]=value;
end;
array _q[3];
do _k=1 to 3;
  _p=_k/4;
  _j=floor(_n*_p + 1-_p);
  _g=_n*_p + 1-_p - _j;
  _q[_k]=(1-_g)*smallest(_j, of _v[*]) + _g*smallest(_j+1, of _v[*]);
end;
run;

This code calculates the first, second and third quartile (_q1, _q2, _q3) of the values in dataset T1 (variable VALUE) using "Definition 7," provided that <=9999 values are involved (otherwise increase the array dimension).

 

At least for your sample data the result matches the values computed by Excel 2013:

_q1    _q2     _q3

3.5    7.5    9.25

View solution in original post

5 REPLIES 5
Reeza
Super User

There isn't a single standard definition of quartiles and Excel actually uses an approximation method. Proc means has 5 different definitions you can use and specify on the PROC MEANS statement.

I personally, would not consider Excel statistics to be accurate in comparison to a statistical tool, like SAS, R or Python.

The manual, naive calculation for your example generates a 3.

2 + 4 / 2 = 3.

 


@mariopellegrini wrote:

with the following code I calculate the first quartile of a data set. in the example with sas the first quartile is 3, but in excel it is calculated at 3.5, why?

 

 

data t1;
input value;
datalines;
1
2
4
7
8
9
10
12
;
PROC MEANS data=t1 q1 ;
VAR value;
RUN;

In excel the result of the following formula is 3.5 instead:

 

=QUARTILE(A2:A9;1)

where A2: A9 are in excel the same data of the example in sas


 

ballardw
Super User

You may want to read the Proc Means documentation calculation of quantiles.

 

https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=proc&docsetTarget=n1w9gy2...

 

Basically there are different definitions possible for quantile, mostly involving differences in how the value for a "not actually encounter quantile" is treated. So Excel and the defaults of SAS are using different methods to calculate the value between 2 and 4 in your data.

Some approaches could result in 2 or 4 as values that actually occur in your data.

 

 

FreelanceReinh
Jade | Level 19

Hello @mariopellegrini,

 

The article "Sample quantiles: A comparison of 9 definitions" in Rick Wicklin's blog reviews not only the five sample quantile definitions implemented in SAS (see option QNTLDEF= in the PROC MEANS documentation), but also four additional definitions described in a 1996 journal article. Your example and a few others that I've quickly checked suggest that Excel's QUARTILE function uses one of those additional definitions: "definition 7" in the journal, "type=7" in Rick's blog and SAS/IML code, which you could use to compute the quartiles based on that definition. (I couldn't test it because SAS/IML is not included in my SAS license).

mariopellegrini
Pyrite | Level 9

Thanks FreelanceReinhard, but I don't understand how to do the calculation for example for the sample of my example using the "type = 7"

FreelanceReinh
Jade | Level 19

As far as I understand Rick's IML code, the calculation can be done in a DATA step as follows:

data quartiles(keep=_q1-_q3);
array _v[9999] _temporary_; /* use dimension >= number of non-missing obs. in dataset T1 */
do _n=1 by 1 until(last);
  set t1(where=(n(value))) end=last;
  _v[_n]=value;
end;
array _q[3];
do _k=1 to 3;
  _p=_k/4;
  _j=floor(_n*_p + 1-_p);
  _g=_n*_p + 1-_p - _j;
  _q[_k]=(1-_g)*smallest(_j, of _v[*]) + _g*smallest(_j+1, of _v[*]);
end;
run;

This code calculates the first, second and third quartile (_q1, _q2, _q3) of the values in dataset T1 (variable VALUE) using "Definition 7," provided that <=9999 values are involved (otherwise increase the array dimension).

 

At least for your sample data the result matches the values computed by Excel 2013:

_q1    _q2     _q3

3.5    7.5    9.25

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 3987 views
  • 1 like
  • 4 in conversation