BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Calcite | Level 5

## How to replicate the PERCENTILE(array,k) results in SAS. PROC MEANS & UNIVERATE results not matching

Dear Team,

I am new to statistics field. In my current assignment, I need to replicate the manual excel work with SAS excel output. I am facing below challenge, on getting the excel percentile value in SAS output.

My dataset is having a column named Sample with below 4 values. I am trying to get the 25th and 75th percentile value for the Sample column. Presently, team is deriving the 25th percentile using Excel function PERCENTILE(A2:A5,.25), and PERCENTILE(A2:A5,.75).

Tried simple PROC MEANS and PROC UNIVERATE in SAS. Please find the code and outputs below,

Code and Outputs:

PROC PRINT DATA=DEMO;
TITLE 'Output of Demo Dataset';
RUN;

PROC MEANS DATA=WORK.DEMO P25 P75 NOPRINT;
VAR SAMPLE;
OUTPUT OUT = C_MEANS
P25    = C_P25
P75    = C_P75;
RUN;

PROC UNIVARIATE DATA=WORK.DEMO NOPRINT;
VAR SAMPLE;
OUTPUT OUT =C_UNIV
PCTLPTS = 25, 75
PCTLPRE = P_;
RUN;

Sincere Thanks in advance.

Karthik Mani.

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: How to replicate the PERCENTILE(array,k) results in SAS. PROC MEANS & UNIVERATE results not

Here you can get all 9 definition percentiles by Rick's code.

``````/*
There 9 different definitions  percentile.
Check Rick's blog:
https://blogs.sas.com/content/iml/2017/05/24/definitions-sample-quantiles.html

And Rick used IML code to calculate all 9 definitions  percentile.
P.S. SAS  only compute 1-5 definitions  percentile.

The following code is written by Rick.You can see EXCEL used the 7th percentile.
*/

data Q;
input x ;
datalines;
0.1
0.48
0.98
1.68
;
proc iml;
/* implement sample quantiles in Hyndman and Fan (1996) */
start SampQuantile(y, probs={0.25, 0.5, 0.75}, Type=2);
x = colvec(y);
p = colvec(probs);
if element(Type, {1 2 3 4 6}) then do;
SASType = {3 5 2 1 . 4};  /* convert from Type to QNTLDEF */
call qntl(q, x, p, SASType[Type]);
return q;
end;
if any(x=.) then
x = x[loc(x ^= .)]; /* remove missing values */
call sort(x);
N = nrow(x);
if Type=5 then
return quantileM(x, p, 0.5, 1/(2*N));
else if Type=7 then
return quantileM(x, p, 1-p, 0);
else if Type=8 then
return quantileM(x, p, (p+1)/3, 2/(3*N+1));
else if Type=9 then
return quantileM(x, p, (2*p+3)/8, 5/(8*N+2));
finish;

start quantileM(x, p, m, c);
N = nrow(x);       /* assume nonmissing */
j = floor(N*p + m);
g = n*p + m - j;
if all(p>=c & p<1-c) then     /* usual case */
return (1-g)#x[j] + g#x[j+1];

/* otherwise, handle extreme values of p */
/* When p < c, use x[1]. When p = 1 - c, use x[N] */
q = j(nrow(p),1,.);
idx1 = loc(p <  c);           /* set these quantiles to x[1] */
idxN = loc(p >= 1-c);         /* set these quantiles to x[N] */
idx  = loc(p > c & p < 1-c);
if ncol(idx1)>0 then   q[idx1] = x[1];
if ncol(idxN)>0 then   q[idxN] = x[N];
if ncol(idx) >0 then do;
j = j[idx]; g = g[idx];
q[idx] = (1-g)#x[j] + g#x[j+1];
end;
return q;
finish;
store module=(SampQuantile quantileM);
quit;

/* Driver SAS/IML program to compare all 9 methods  */
proc iml;
load module=(SampQuantile);
use Q; read all var "x"; close;
prob = T(1:999)/1000;                 * fine mesh of prob values;
create Pctls var {"Type" "quantile" "prob" "x"};
do t = 1 to 9;
quantile = SampQuantile(x, prob, t);
Type = j(nrow(prob), 1, t);
append;
end;
close;
quit;
proc print data=Pctls noobs;
where prob = 0.25;                 /* compare 0.25 quantile for different definitions */
run;
``````

8 REPLIES 8
Super User

## Re: How to replicate the PERCENTILE(array,k) results in SAS. PROC MEANS & UNIVERATE results not

If you read the documentation for Proc Means you will see that there are actual several different settings that related to Percentile calculations. Look at the option QNTLDEF on this page: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/proc/n1qnc9bddfvhzqn105kqitnf29cp.htm

for Methods Computing Quantile Statistics. Some of these options involve how ties are broken as

Showing pictures of spreadsheets is misleading. One, it is not actual values but a picture so unless we retype everything we can't actually use anything shown. Second we don't see the actual formula , as in actual cells selected or any other options. Third is we do not know what precision of values may actually exist for use. Consider that Excel typically defaults to two decimal places when assigning the cell as numeric. A value that displays as 0.48 might actually be 0.481 or 0.479 (or quite a range actually). So the values shown are just approximations of just what they may be. I don't know, and really don't care, which default algorithm Excel uses but it is pretty obviously different for determining quantiles.

The typical "beginning" definition of a quantile is the order or between two values (and how those values) depending on which quantile is requested.  The "median" of 4 values typically wold fall between second and third values. So one set of rules will average those two. Another set might weight based on the values, another might take one of the end points.

The P25 of 4 values would fall beween first and second, so the default SAS uses is the average , 0.29.

SAS also has a similar issue based on assigned display format. So a format that only displays two decimal places can round values. So pictures of SAS output can also be misleading.

Provide example data in the form of data step code pasted into a text box opened with the </> icon above the message window to preserve text layout, the message windows will remove white space and other things depending on code, and to differentiate code from discussion.

```data have;
input x;
datalines;
0.98
1.68
0.48
0.10
;```

I tried running all 5 definitions of QNTLDEF (or PCTLDEF, same thing different name) in SAS. I do not get the result you show for Excel using any of them.

The closest I get is using this and it isn't too far from the Excel P75. But P25 is still right out.

```proc means data=have qntldef=4 maxdec=3 p25 p75;
var x;
run;```

Which yields:

Analysis Variable : x

25th Pctl 75th Pctl
0.195 1.505

If you were creating output sets from Means to use print for display because you didn't like the number of decimals involved you might look into the MAXDEC option in proc means. That sets the maximum number of decimals to display.

Calcite | Level 5

## Re: How to replicate the PERCENTILE(array,k) results in SAS. PROC MEANS & UNIVERATE results not

Dear @ballardw,

Thanks for looking into my issue and for the possible solutions. Noted all your solutions, comments, inputs, and suggestions. However, my team is expecting the Percentile value from the SAS exactly same as Excel.

Thanks and Regards,

Karthik Mani.

Diamond | Level 26

## Re: How to replicate the PERCENTILE(array,k) results in SAS. PROC MEANS & UNIVERATE results not

@karthikmanimca wrote:

Thanks for looking into my issue and for the possible solutions. Noted all your solutions, comments, inputs, and suggestions. However, my team is expecting the Percentile value from the SAS exactly same as Excel.

Did you try the five different PROC UNIVARIATE options to see if they match what Excel produces?

Unfortunately, sometimes expecting Excel and SAS to match identically is going to lead to disappointment.

--
Paige Miller
Calcite | Level 5

## Re: How to replicate the PERCENTILE(array,k) results in SAS. PROC MEANS & UNIVERATE results not

Dear @PaigeMiller,

Thanks for the response. I am very new to Statistics and still learning. Will explore the PROC UNIVERATE procedure and its option.

Regards,

Karthik Mani.

Super User

## Re: How to replicate the PERCENTILE(array,k) results in SAS. PROC MEANS & UNIVERATE results not

``````/************

https://blogs.sas.com/content/iml/2021/07/26/compare-quantiles-sas-r-python.html
************/

proc iml;
/* By default, R (and Julia, and some Python packages) uses
Hyndman and Fan's Type=7 definition. Compute Type=7 sample quantiles.
*/
start GetQuantile7(y, probs);
x = colvec(y);
N = nrow(x);
if N=1 then return (y);  /* handle the degenerate case, N=1 */

/* remove missing values, if any */
idx = loc(x^=.);
if ncol(idx)=0 then
return (.);           /* all values are missing */
else if ncol(idx)<N then do;
x = x[idx,]; N = nrow(x);  /* remove missing */
end;

/* Main computation: Compute Type=7 sample quantile.
Estimate is a linear interpolation between x[j] and x[j+1]. */
call sort(x);
p = colvec(probs);
m = 1-p;
j = floor(N*p + m);      /* indices into sorted data values */
g = N*p + m - j;         /* 0 <= g <= 1 for interpolation */

q = j(nrow(p), 1, x[N]); /* if p=1, estimate by x[N]=max(x) */
idx = loc(p < 1);
if ncol(idx) >0 then do;
j = j[idx]; g = g[idx];
q[idx] = (1-g)#x[j] + g#x[j+1]; /* linear interpolation */
end;
return q;
finish;

/* Compare the SAS and R default definitions.
The differences between definitions are most apparent
for small samples that have large gaps between adjacent data values. */
x = {0.1 0.48 0.98 1.68 }`;
prob = {0.25, 0.75};

call qntl(SASDefaultQntl, x, prob);
RDefaultQntl = GetQuantile7(x, prob);
print prob SASDefaultQntl RDefaultQntl;
quit;``````

Calcite | Level 5

## Re: How to replicate the PERCENTILE(array,k) results in SAS. PROC MEANS & UNIVERATE results not

Dear @Ksharp,

Thank you so much for taking your time and wonderful code. I feel. we have the solution and will work for my scenario. Will keep you and every updated soon.

Regards,

Karthik Mani.

Super User

## Re: How to replicate the PERCENTILE(array,k) results in SAS. PROC MEANS & UNIVERATE results not

Here you can get all 9 definition percentiles by Rick's code.

``````/*
There 9 different definitions  percentile.
Check Rick's blog:
https://blogs.sas.com/content/iml/2017/05/24/definitions-sample-quantiles.html

And Rick used IML code to calculate all 9 definitions  percentile.
P.S. SAS  only compute 1-5 definitions  percentile.

The following code is written by Rick.You can see EXCEL used the 7th percentile.
*/

data Q;
input x ;
datalines;
0.1
0.48
0.98
1.68
;
proc iml;
/* implement sample quantiles in Hyndman and Fan (1996) */
start SampQuantile(y, probs={0.25, 0.5, 0.75}, Type=2);
x = colvec(y);
p = colvec(probs);
if element(Type, {1 2 3 4 6}) then do;
SASType = {3 5 2 1 . 4};  /* convert from Type to QNTLDEF */
call qntl(q, x, p, SASType[Type]);
return q;
end;
if any(x=.) then
x = x[loc(x ^= .)]; /* remove missing values */
call sort(x);
N = nrow(x);
if Type=5 then
return quantileM(x, p, 0.5, 1/(2*N));
else if Type=7 then
return quantileM(x, p, 1-p, 0);
else if Type=8 then
return quantileM(x, p, (p+1)/3, 2/(3*N+1));
else if Type=9 then
return quantileM(x, p, (2*p+3)/8, 5/(8*N+2));
finish;

start quantileM(x, p, m, c);
N = nrow(x);       /* assume nonmissing */
j = floor(N*p + m);
g = n*p + m - j;
if all(p>=c & p<1-c) then     /* usual case */
return (1-g)#x[j] + g#x[j+1];

/* otherwise, handle extreme values of p */
/* When p < c, use x[1]. When p = 1 - c, use x[N] */
q = j(nrow(p),1,.);
idx1 = loc(p <  c);           /* set these quantiles to x[1] */
idxN = loc(p >= 1-c);         /* set these quantiles to x[N] */
idx  = loc(p > c & p < 1-c);
if ncol(idx1)>0 then   q[idx1] = x[1];
if ncol(idxN)>0 then   q[idxN] = x[N];
if ncol(idx) >0 then do;
j = j[idx]; g = g[idx];
q[idx] = (1-g)#x[j] + g#x[j+1];
end;
return q;
finish;
store module=(SampQuantile quantileM);
quit;

/* Driver SAS/IML program to compare all 9 methods  */
proc iml;
load module=(SampQuantile);
use Q; read all var "x"; close;
prob = T(1:999)/1000;                 * fine mesh of prob values;
create Pctls var {"Type" "quantile" "prob" "x"};
do t = 1 to 9;
quantile = SampQuantile(x, prob, t);
Type = j(nrow(prob), 1, t);
append;
end;
close;
quit;
proc print data=Pctls noobs;
where prob = 0.25;                 /* compare 0.25 quantile for different definitions */
run;
``````

Calcite | Level 5

## Re: How to replicate the PERCENTILE(array,k) results in SAS. PROC MEANS & UNIVERATE results not

Dear Ksharp,

Thanks for the solution and it worked for me. Kindly keep up the great work and support for the SAS community. Looking forward to learn more from your expertise.

Yours sincerely,

Karthik Mani.

Discussion stats
• 8 replies
• 1299 views
• 11 likes
• 4 in conversation