Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Analytics
- /
- Stat Procs
- /
- How to replicate the PERCENTILE(array,k) results in SAS. PROC MEANS & ...

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

☑ This topic is **solved**.
Need further help from the community? Please
sign in and ask a **new** question.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 06-16-2023 03:22 AM
(1298 views)

Dear Team,

I am new to statistics field. In my current assignment, I need to replicate the manual excel work with SAS excel output. **I am facing below challenge, on getting the excel percentile value in SAS output.**

My dataset is having a column named **Sample **with below 4 values. I am trying to get the 25th and 75th percentile value for the **Sample** column. Presently, team is deriving the 25th percentile using Excel function PERCENTILE(A2:A5,.25), and PERCENTILE(A2:A5,.75).

Tried simple PROC MEANS and PROC UNIVERATE in SAS. Please find the code and outputs below,

**Code and Outputs:**

PROC PRINT DATA=DEMO;

TITLE 'Output of Demo Dataset';

RUN;

PROC MEANS DATA=WORK.DEMO P25 P75 NOPRINT;

VAR SAMPLE;

OUTPUT OUT = C_MEANS

P25 = C_P25

P75 = C_P75;

RUN;

PROC UNIVARIATE DATA=WORK.DEMO NOPRINT;

VAR SAMPLE;

OUTPUT OUT =C_UNIV

PCTLPTS = 25, 75

PCTLPRE = P_;

RUN;

Sincere Thanks in advance.

Karthik Mani.

1 ACCEPTED SOLUTION

Accepted Solutions

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Here you can get all 9 definition percentiles by Rick's code.

```
/*
There 9 different definitions percentile.
Check Rick's blog:
https://blogs.sas.com/content/iml/2017/05/24/definitions-sample-quantiles.html
And Rick used IML code to calculate all 9 definitions percentile.
P.S. SAS only compute 1-5 definitions percentile.
The following code is written by Rick.You can see EXCEL used the 7th percentile.
*/
data Q;
input x ;
datalines;
0.1
0.48
0.98
1.68
;
proc iml;
/* implement sample quantiles in Hyndman and Fan (1996) */
start SampQuantile(y, probs={0.25, 0.5, 0.75}, Type=2);
x = colvec(y);
p = colvec(probs);
if element(Type, {1 2 3 4 6}) then do;
SASType = {3 5 2 1 . 4}; /* convert from Type to QNTLDEF */
call qntl(q, x, p, SASType[Type]);
return q;
end;
if any(x=.) then
x = x[loc(x ^= .)]; /* remove missing values */
call sort(x);
N = nrow(x);
if Type=5 then
return quantileM(x, p, 0.5, 1/(2*N));
else if Type=7 then
return quantileM(x, p, 1-p, 0);
else if Type=8 then
return quantileM(x, p, (p+1)/3, 2/(3*N+1));
else if Type=9 then
return quantileM(x, p, (2*p+3)/8, 5/(8*N+2));
finish;
start quantileM(x, p, m, c);
N = nrow(x); /* assume nonmissing */
j = floor(N*p + m);
g = n*p + m - j;
if all(p>=c & p<1-c) then /* usual case */
return (1-g)#x[j] + g#x[j+1];
/* otherwise, handle extreme values of p */
/* When p < c, use x[1]. When p = 1 - c, use x[N] */
q = j(nrow(p),1,.);
idx1 = loc(p < c); /* set these quantiles to x[1] */
idxN = loc(p >= 1-c); /* set these quantiles to x[N] */
idx = loc(p > c & p < 1-c);
if ncol(idx1)>0 then q[idx1] = x[1];
if ncol(idxN)>0 then q[idxN] = x[N];
if ncol(idx) >0 then do;
j = j[idx]; g = g[idx];
q[idx] = (1-g)#x[j] + g#x[j+1];
end;
return q;
finish;
store module=(SampQuantile quantileM);
quit;
/* Driver SAS/IML program to compare all 9 methods */
proc iml;
load module=(SampQuantile);
use Q; read all var "x"; close;
prob = T(1:999)/1000; * fine mesh of prob values;
create Pctls var {"Type" "quantile" "prob" "x"};
do t = 1 to 9;
quantile = SampQuantile(x, prob, t);
Type = j(nrow(prob), 1, t);
append;
end;
close;
quit;
proc print data=Pctls noobs;
where prob = 0.25; /* compare 0.25 quantile for different definitions */
run;
```

8 REPLIES 8

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

If you read the documentation for Proc Means you will see that there are actual several different settings that related to Percentile calculations. Look at the option QNTLDEF on this page: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/proc/n1qnc9bddfvhzqn105kqitnf29cp.htm

for Methods Computing Quantile Statistics. Some of these options involve how ties are broken as

Showing pictures of spreadsheets is misleading. One, it is not actual values but a picture so unless we retype everything we can't actually use anything shown. Second we don't see the actual formula , as in actual cells selected or any other options. Third is we do not know what precision of values may actually exist for use. Consider that Excel typically defaults to two decimal places when assigning the cell as numeric. A value that displays as 0.48 might actually be 0.481 or 0.479 (or quite a range actually). So the values shown are just approximations of just what they may be. I don't know, and really don't care, which default algorithm Excel uses but it is pretty obviously different for determining quantiles.

The typical "beginning" definition of a quantile is the order or between two values (and how those values) depending on which quantile is requested. The "median" of 4 values typically wold fall between second and third values. So one set of rules will average those two. Another set might weight based on the values, another might take one of the end points.

The P25 of 4 values would fall beween first and second, so the default SAS uses is the average , 0.29.

SAS also has a similar issue based on assigned display format. So a format that only displays two decimal places can round values. So pictures of SAS output can also be misleading.

Provide example data in the form of data step code pasted into a text box opened with the </> icon above the message window to preserve text layout, the message windows will remove white space and other things depending on code, and to differentiate code from discussion.

data have; input x; datalines; 0.98 1.68 0.48 0.10 ;

I tried running all 5 definitions of QNTLDEF (or PCTLDEF, same thing different name) in SAS. I do not get the result you show for Excel using any of them.

The closest I get is using this and it isn't too far from the Excel P75. But P25 is still right out.

proc means data=have qntldef=4 maxdec=3 p25 p75; var x; run;

Which yields:

Analysis Variable : x | |
---|---|

25th Pctl | 75th Pctl |

0.195 | 1.505 |

If you were creating output sets from Means to use print for display because you didn't like the number of decimals involved you might look into the MAXDEC option in proc means. That sets the maximum number of decimals to display.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Dear @ballardw,

Thanks for looking into my issue and for the possible solutions. Noted all your solutions, comments, inputs, and suggestions. However, my team is expecting the Percentile value from the SAS exactly same as Excel.

Thanks and Regards,

Karthik Mani.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@karthikmanimca wrote:

Thanks for looking into my issue and for the possible solutions. Noted all your solutions, comments, inputs, and suggestions. However, my team is expecting the Percentile value from the SAS exactly same as Excel.

Did you try the five different PROC UNIVARIATE options to see if they match what Excel produces?

Unfortunately, sometimes expecting Excel and SAS to match identically is going to lead to disappointment.

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Dear @PaigeMiller,

Thanks for the response. I am very new to Statistics and still learning. Will explore the PROC UNIVERATE procedure and its option.

Regards,

Karthik Mani.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

```
/************
另外一种方法（SAS和R和Excel有相同的分位数）：
https://blogs.sas.com/content/iml/2021/07/26/compare-quantiles-sas-r-python.html
************/
proc iml;
/* By default, R (and Julia, and some Python packages) uses
Hyndman and Fan's Type=7 definition. Compute Type=7 sample quantiles.
*/
start GetQuantile7(y, probs);
x = colvec(y);
N = nrow(x);
if N=1 then return (y); /* handle the degenerate case, N=1 */
/* remove missing values, if any */
idx = loc(x^=.);
if ncol(idx)=0 then
return (.); /* all values are missing */
else if ncol(idx)<N then do;
x = x[idx,]; N = nrow(x); /* remove missing */
end;
/* Main computation: Compute Type=7 sample quantile.
Estimate is a linear interpolation between x[j] and x[j+1]. */
call sort(x);
p = colvec(probs);
m = 1-p;
j = floor(N*p + m); /* indices into sorted data values */
g = N*p + m - j; /* 0 <= g <= 1 for interpolation */
q = j(nrow(p), 1, x[N]); /* if p=1, estimate by x[N]=max(x) */
idx = loc(p < 1);
if ncol(idx) >0 then do;
j = j[idx]; g = g[idx];
q[idx] = (1-g)#x[j] + g#x[j+1]; /* linear interpolation */
end;
return q;
finish;
/* Compare the SAS and R default definitions.
The differences between definitions are most apparent
for small samples that have large gaps between adjacent data values. */
x = {0.1 0.48 0.98 1.68 }`;
prob = {0.25, 0.75};
call qntl(SASDefaultQntl, x, prob);
RDefaultQntl = GetQuantile7(x, prob);
print prob SASDefaultQntl RDefaultQntl;
quit;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Dear @Ksharp,

Thank you so much for taking your time and wonderful code. I feel. we have the solution and will work for my scenario. Will keep you and every updated soon.

Regards,

Karthik Mani.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Here you can get all 9 definition percentiles by Rick's code.

```
/*
There 9 different definitions percentile.
Check Rick's blog:
https://blogs.sas.com/content/iml/2017/05/24/definitions-sample-quantiles.html
And Rick used IML code to calculate all 9 definitions percentile.
P.S. SAS only compute 1-5 definitions percentile.
The following code is written by Rick.You can see EXCEL used the 7th percentile.
*/
data Q;
input x ;
datalines;
0.1
0.48
0.98
1.68
;
proc iml;
/* implement sample quantiles in Hyndman and Fan (1996) */
start SampQuantile(y, probs={0.25, 0.5, 0.75}, Type=2);
x = colvec(y);
p = colvec(probs);
if element(Type, {1 2 3 4 6}) then do;
SASType = {3 5 2 1 . 4}; /* convert from Type to QNTLDEF */
call qntl(q, x, p, SASType[Type]);
return q;
end;
if any(x=.) then
x = x[loc(x ^= .)]; /* remove missing values */
call sort(x);
N = nrow(x);
if Type=5 then
return quantileM(x, p, 0.5, 1/(2*N));
else if Type=7 then
return quantileM(x, p, 1-p, 0);
else if Type=8 then
return quantileM(x, p, (p+1)/3, 2/(3*N+1));
else if Type=9 then
return quantileM(x, p, (2*p+3)/8, 5/(8*N+2));
finish;
start quantileM(x, p, m, c);
N = nrow(x); /* assume nonmissing */
j = floor(N*p + m);
g = n*p + m - j;
if all(p>=c & p<1-c) then /* usual case */
return (1-g)#x[j] + g#x[j+1];
/* otherwise, handle extreme values of p */
/* When p < c, use x[1]. When p = 1 - c, use x[N] */
q = j(nrow(p),1,.);
idx1 = loc(p < c); /* set these quantiles to x[1] */
idxN = loc(p >= 1-c); /* set these quantiles to x[N] */
idx = loc(p > c & p < 1-c);
if ncol(idx1)>0 then q[idx1] = x[1];
if ncol(idxN)>0 then q[idxN] = x[N];
if ncol(idx) >0 then do;
j = j[idx]; g = g[idx];
q[idx] = (1-g)#x[j] + g#x[j+1];
end;
return q;
finish;
store module=(SampQuantile quantileM);
quit;
/* Driver SAS/IML program to compare all 9 methods */
proc iml;
load module=(SampQuantile);
use Q; read all var "x"; close;
prob = T(1:999)/1000; * fine mesh of prob values;
create Pctls var {"Type" "quantile" "prob" "x"};
do t = 1 to 9;
quantile = SampQuantile(x, prob, t);
Type = j(nrow(prob), 1, t);
append;
end;
close;
quit;
proc print data=Pctls noobs;
where prob = 0.25; /* compare 0.25 quantile for different definitions */
run;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Dear Ksharp,

Thanks for the solution and it worked for me. Kindly keep up the great work and support for the SAS community. Looking forward to learn more from your expertise.

Yours sincerely,

Karthik Mani.

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. **Registration is now open through August 30th**. Visit the SAS Hackathon homepage.

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.