Solved: How to replicate the PERCENTILE(array,k) results in SAS. PROC MEANS & ...

karthikmanimca · Posted 06-16-2023 03:22 AM

Dear Team,

I am new to statistics field. In my current assignment, I need to replicate the manual excel work with SAS excel output. I am facing below challenge, on getting the excel percentile value in SAS output.

My dataset is having a column named Sample with below 4 values. I am trying to get the 25th and 75th percentile value for the Sample column. Presently, team is deriving the 25th percentile using Excel function PERCENTILE(A2:A5,.25), and PERCENTILE(A2:A5,.75).

Tried simple PROC MEANS and PROC UNIVERATE in SAS. Please find the code and outputs below,

Code and Outputs:

PROC PRINT DATA=DEMO;

TITLE 'Output of Demo Dataset';

RUN;

PROC MEANS DATA=WORK.DEMO P25 P75 NOPRINT;

VAR SAMPLE;

OUTPUT OUT = C_MEANS

P25 = C_P25

P75 = C_P75;

RUN;

PROC UNIVARIATE DATA=WORK.DEMO NOPRINT;

VAR SAMPLE;

OUTPUT OUT =C_UNIV

PCTLPTS = 25, 75

PCTLPRE = P_;

RUN;

Sincere Thanks in advance.

Karthik Mani.

Ksharp · Posted 06-16-2023 07:46 AM

Here you can get all 9 definition percentiles by Rick's code.

/*
There 9 different definitions  percentile.
Check Rick's blog:
https://blogs.sas.com/content/iml/2017/05/24/definitions-sample-quantiles.html

And Rick used IML code to calculate all 9 definitions  percentile.
P.S. SAS  only compute 1-5 definitions  percentile.

The following code is written by Rick.You can see EXCEL used the 7th percentile.
*/

data Q;
input x ;
datalines;
0.1
0.48
0.98
1.68
;
proc iml;
/* implement sample quantiles in Hyndman and Fan (1996) */
start SampQuantile(y, probs={0.25, 0.5, 0.75}, Type=2);
   x = colvec(y);
   p = colvec(probs);
   if element(Type, {1 2 3 4 6}) then do;
      SASType = {3 5 2 1 . 4};  /* convert from Type to QNTLDEF */
      call qntl(q, x, p, SASType[Type]);
      return q;
   end;
   if any(x=.) then 
      x = x[loc(x ^= .)]; /* remove missing values */
   call sort(x);
   N = nrow(x);
   if Type=5 then 
      return quantileM(x, p, 0.5, 1/(2*N));
   else if Type=7 then
      return quantileM(x, p, 1-p, 0);
   else if Type=8 then
      return quantileM(x, p, (p+1)/3, 2/(3*N+1));
   else if Type=9 then
      return quantileM(x, p, (2*p+3)/8, 5/(8*N+2));
finish;

start quantileM(x, p, m, c);
   N = nrow(x);       /* assume nonmissing */
   j = floor(N*p + m);
   g = n*p + m - j;
   if all(p>=c & p<1-c) then     /* usual case */
      return (1-g)#x[j] + g#x[j+1];

   /* otherwise, handle extreme values of p */
   /* When p < c, use x[1]. When p = 1 - c, use x[N] */
   q = j(nrow(p),1,.);
   idx1 = loc(p <  c);           /* set these quantiles to x[1] */
   idxN = loc(p >= 1-c);         /* set these quantiles to x[N] */
   idx  = loc(p > c & p < 1-c);
   if ncol(idx1)>0 then   q[idx1] = x[1];
   if ncol(idxN)>0 then   q[idxN] = x[N];
   if ncol(idx) >0 then do;
      j = j[idx]; g = g[idx];
      q[idx] = (1-g)#x[j] + g#x[j+1];
   end;
   return q;
finish;
store module=(SampQuantile quantileM);
quit;

/* Driver SAS/IML program to compare all 9 methods  */
proc iml;
load module=(SampQuantile);
use Q; read all var "x"; close;       
prob = T(1:999)/1000;                 * fine mesh of prob values;
create Pctls var {"Type" "quantile" "prob" "x"};
do t = 1 to 9;
   quantile = SampQuantile(x, prob, t);
   Type = j(nrow(prob), 1, t);
   append;
end;
close;
quit;
proc print data=Pctls noobs;
   where prob = 0.25;                 /* compare 0.25 quantile for different definitions */
run;

View solution in original post

ballardw · Posted 06-16-2023 05:15 AM

If you read the documentation for Proc Means you will see that there are actual several different settings that related to Percentile calculations. Look at the option QNTLDEF on this page: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/proc/n1qnc9bddfvhzqn105kqitnf29cp.htm

and https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/proc/p0v0y1on1hbxukn0zqgsp5ky8hc0.htm#n096sxk...

for Methods Computing Quantile Statistics. Some of these options involve how ties are broken as

Showing pictures of spreadsheets is misleading. One, it is not actual values but a picture so unless we retype everything we can't actually use anything shown. Second we don't see the actual formula , as in actual cells selected or any other options. Third is we do not know what precision of values may actually exist for use. Consider that Excel typically defaults to two decimal places when assigning the cell as numeric. A value that displays as 0.48 might actually be 0.481 or 0.479 (or quite a range actually). So the values shown are just approximations of just what they may be. I don't know, and really don't care, which default algorithm Excel uses but it is pretty obviously different for determining quantiles.

The typical "beginning" definition of a quantile is the order or between two values (and how those values) depending on which quantile is requested. The "median" of 4 values typically wold fall between second and third values. So one set of rules will average those two. Another set might weight based on the values, another might take one of the end points.

The P25 of 4 values would fall beween first and second, so the default SAS uses is the average , 0.29.

SAS also has a similar issue based on assigned display format. So a format that only displays two decimal places can round values. So pictures of SAS output can also be misleading.

Provide example data in the form of data step code pasted into a text box opened with the </> icon above the message window to preserve text layout, the message windows will remove white space and other things depending on code, and to differentiate code from discussion.

data have;
   input x;
datalines;
0.98
1.68
0.48
0.10
;

I tried running all 5 definitions of QNTLDEF (or PCTLDEF, same thing different name) in SAS. I do not get the result you show for Excel using any of them.

The closest I get is using this and it isn't too far from the Excel P75. But P25 is still right out.

proc means data=have qntldef=4 maxdec=3 p25 p75;
  var x;
run;

Which yields:

Analysis Variable : x

25th Pctl	75th Pctl
0.195	1.505

If you were creating output sets from Means to use print for display because you didn't like the number of decimals involved you might look into the MAXDEC option in proc means. That sets the maximum number of decimals to display.

karthikmanimca · Posted 06-16-2023 05:28 AM

Dear @ballardw,

Thanks for looking into my issue and for the possible solutions. Noted all your solutions, comments, inputs, and suggestions. However, my team is expecting the Percentile value from the SAS exactly same as Excel.

Thanks and Regards,

Karthik Mani.

PaigeMiller · Posted 06-16-2023 05:47 AM

@karthikmanimca wrote:

Thanks for looking into my issue and for the possible solutions. Noted all your solutions, comments, inputs, and suggestions. However, my team is expecting the Percentile value from the SAS exactly same as Excel.

Did you try the five different PROC UNIVARIATE options to see if they match what Excel produces?

Unfortunately, sometimes expecting Excel and SAS to match identically is going to lead to disappointment.

--
Paige Miller

karthikmanimca · Posted 06-16-2023 09:00 AM

Dear @PaigeMiller,

Thanks for the response. I am very new to Statistics and still learning. Will explore the PROC UNIVERATE procedure and its option.

Regards,

Karthik Mani.

Ksharp · Posted 06-16-2023 07:42 AM

/************
另外一种方法（SAS和R和Excel有相同的分位数）：
https://blogs.sas.com/content/iml/2021/07/26/compare-quantiles-sas-r-python.html
************/

proc iml;
/* By default, R (and Julia, and some Python packages) uses
   Hyndman and Fan's Type=7 definition. Compute Type=7 sample quantiles.
*/
start GetQuantile7(y, probs);
   x = colvec(y);
   N = nrow(x);
   if N=1 then return (y);  /* handle the degenerate case, N=1 */
 
   /* remove missing values, if any */
   idx = loc(x^=.);
   if ncol(idx)=0 then  
      return (.);           /* all values are missing */
   else if ncol(idx)<N then do;
      x = x[idx,]; N = nrow(x);  /* remove missing */
   end;
 
   /* Main computation: Compute Type=7 sample quantile. 
      Estimate is a linear interpolation between x[j] and x[j+1]. */
   call sort(x);
   p = colvec(probs);
   m = 1-p;
   j = floor(N*p + m);      /* indices into sorted data values */
   g = N*p + m - j;         /* 0 <= g <= 1 for interpolation */
 
   q = j(nrow(p), 1, x[N]); /* if p=1, estimate by x[N]=max(x) */
   idx = loc(p < 1);
   if ncol(idx) >0 then do;
      j = j[idx]; g = g[idx];
      q[idx] = (1-g)#x[j] + g#x[j+1]; /* linear interpolation */
   end;   
   return q;
finish;
 
/* Compare the SAS and R default definitions.
   The differences between definitions are most apparent 
   for small samples that have large gaps between adjacent data values. */
x = {0.1 0.48 0.98 1.68 }`;
prob = {0.25, 0.75};
 
call qntl(SASDefaultQntl, x, prob);
RDefaultQntl = GetQuantile7(x, prob);
print prob SASDefaultQntl RDefaultQntl;
quit;

karthikmanimca · Posted 06-16-2023 09:03 AM

Dear @Ksharp,

Thank you so much for taking your time and wonderful code. I feel. we have the solution and will work for my scenario. Will keep you and every updated soon.

Regards,

Karthik Mani.

Ksharp · Posted 06-16-2023 07:46 AM

Here you can get all 9 definition percentiles by Rick's code.

/*
There 9 different definitions  percentile.
Check Rick's blog:
https://blogs.sas.com/content/iml/2017/05/24/definitions-sample-quantiles.html

And Rick used IML code to calculate all 9 definitions  percentile.
P.S. SAS  only compute 1-5 definitions  percentile.

The following code is written by Rick.You can see EXCEL used the 7th percentile.
*/

data Q;
input x ;
datalines;
0.1
0.48
0.98
1.68
;
proc iml;
/* implement sample quantiles in Hyndman and Fan (1996) */
start SampQuantile(y, probs={0.25, 0.5, 0.75}, Type=2);
   x = colvec(y);
   p = colvec(probs);
   if element(Type, {1 2 3 4 6}) then do;
      SASType = {3 5 2 1 . 4};  /* convert from Type to QNTLDEF */
      call qntl(q, x, p, SASType[Type]);
      return q;
   end;
   if any(x=.) then 
      x = x[loc(x ^= .)]; /* remove missing values */
   call sort(x);
   N = nrow(x);
   if Type=5 then 
      return quantileM(x, p, 0.5, 1/(2*N));
   else if Type=7 then
      return quantileM(x, p, 1-p, 0);
   else if Type=8 then
      return quantileM(x, p, (p+1)/3, 2/(3*N+1));
   else if Type=9 then
      return quantileM(x, p, (2*p+3)/8, 5/(8*N+2));
finish;

start quantileM(x, p, m, c);
   N = nrow(x);       /* assume nonmissing */
   j = floor(N*p + m);
   g = n*p + m - j;
   if all(p>=c & p<1-c) then     /* usual case */
      return (1-g)#x[j] + g#x[j+1];

   /* otherwise, handle extreme values of p */
   /* When p < c, use x[1]. When p = 1 - c, use x[N] */
   q = j(nrow(p),1,.);
   idx1 = loc(p <  c);           /* set these quantiles to x[1] */
   idxN = loc(p >= 1-c);         /* set these quantiles to x[N] */
   idx  = loc(p > c & p < 1-c);
   if ncol(idx1)>0 then   q[idx1] = x[1];
   if ncol(idxN)>0 then   q[idxN] = x[N];
   if ncol(idx) >0 then do;
      j = j[idx]; g = g[idx];
      q[idx] = (1-g)#x[j] + g#x[j+1];
   end;
   return q;
finish;
store module=(SampQuantile quantileM);
quit;

/* Driver SAS/IML program to compare all 9 methods  */
proc iml;
load module=(SampQuantile);
use Q; read all var "x"; close;       
prob = T(1:999)/1000;                 * fine mesh of prob values;
create Pctls var {"Type" "quantile" "prob" "x"};
do t = 1 to 9;
   quantile = SampQuantile(x, prob, t);
   Type = j(nrow(prob), 1, t);
   append;
end;
close;
quit;
proc print data=Pctls noobs;
   where prob = 0.25;                 /* compare 0.25 quantile for different definitions */
run;

karthikmanimca · Posted 06-26-2023 10:21 PM

Dear Ksharp,

Thanks for the solution and it worked for me. Kindly keep up the great work and support for the SAS community. Looking forward to learn more from your expertise.

Yours sincerely,

Karthik Mani.

How to replicate the PERCENTILE(array,k) results in SAS. PROC MEANS & UNIVERATE results not matching

Re: How to replicate the PERCENTILE(array,k) results in SAS. PROC MEANS & UNIVERATE results not

Re: How to replicate the PERCENTILE(array,k) results in SAS. PROC MEANS & UNIVERATE results not

Re: How to replicate the PERCENTILE(array,k) results in SAS. PROC MEANS & UNIVERATE results not

Re: How to replicate the PERCENTILE(array,k) results in SAS. PROC MEANS & UNIVERATE results not

Re: How to replicate the PERCENTILE(array,k) results in SAS. PROC MEANS & UNIVERATE results not

Re: How to replicate the PERCENTILE(array,k) results in SAS. PROC MEANS & UNIVERATE results not

Re: How to replicate the PERCENTILE(array,k) results in SAS. PROC MEANS & UNIVERATE results not

Re: How to replicate the PERCENTILE(array,k) results in SAS. PROC MEANS & UNIVERATE results not

Re: How to replicate the PERCENTILE(array,k) results in SAS. PROC MEANS & UNIVERATE results not

SAS Innovate 2025: Call for Content