Let me say it again: quantiles are statistics, which means that they estimate underlying parameters in the population (in this case, the quantiles of the population). The values you say are "correct" are merely the sample quantiles that R computes by default. There are many definitions of sample quantiles. None are more correct than the others.
The following SAS/IML program simplifies the program in my blog post and computes only the TYPE=7 definition, which is the default in R. You can run this program to obtain the sample quantiles that you want.
/* Compute the sample quantiles that R computes by default */
proc iml;
/* Define function that returns the TYPE=7 sample quantiles. For more info, see
https://blogs.sas.com/content/iml/2017/05/24/definitions-sample-quantiles.html
*/
start GetRQuantiles(y, probs);
x = colvec(y);
call sort(x);
N = nrow(x); /* assume all values are nonmissing */
p = colvec(probs);
m = 1-p;
j = floor(N*p + m);
g = N*p + m - j;
q = j(nrow(p), 1, x[N]); /* initialize to max(x) = x[N], which is the answer when p=1 */
idx = loc(p < 1);
if ncol(idx) >0 then do;
j = j[idx]; g = g[idx];
q[idx] = (1-g)#x[j] + g#x[j+1];
end;
return q;
finish;
use Q; read all var "x"; close; /* read sample into x */
p = {12.5, 25, 37.5, 50, 62.5, 75, 87.5, 100} / 100; /* define probabilities */
q = GetRQuantiles(x, p); /* sample quantiles */
print p q;
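For readers who want the formula behind the function, here is a sketch of the arithmetic written in the program's own names. The order-statistic notation $x_{(k)}$ (the k-th smallest value of the sorted sample) is shorthand introduced here and does not appear in the code.

```latex
\begin{aligned}
m &= 1 - p, \\
j &= \lfloor Np + m \rfloor, \\
g &= Np + m - j, \\
q(p) &=
\begin{cases}
(1-g)\,x_{(j)} + g\,x_{(j+1)}, & p < 1, \\
x_{(N)}, & p = 1.
\end{cases}
\end{aligned}
```

Because $Np + m = (N-1)p + 1$, the estimate is a linear interpolation between two adjacent sorted values, and $j + 1 \le N$ whenever $p < 1$ and $N \ge 2$.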
So did you create a dataset named Q with a variable named X for the IML code to read in?
data q;
input x;
cards;
19967.95
19271.69
16525.2
6885.5
3442.75
;
proc iml;
/* Define function that returns the TYPE=7 sample quantiles. For more info, see https://blogs.sas.com/content/iml/2017/05/24/definitions-sample-quantiles.html*/
start GetRQuantiles(y, probs);
x = colvec(y);
call sort(x);
N = nrow(x); /* assume all values are nonmissing */
p = colvec(probs);
m = 1-p;
j = floor(N*p + m);
g = N*p + m - j;
q = j(nrow(p), 1, x[N]); /* initialize to max(x) = x[N], which is the answer when p=1 */
idx = loc(p < 1);
if ncol(idx) >0 then do;
j = j[idx]; g = g[idx];
q[idx] = (1-g)#x[j] + g#x[j+1];
end;
return q;
finish;
use q; read all var "x"; close; /* read sample into x */
p = {12.5, 25, 37.5, 50, 62.5, 75, 87.5, 100} / 100; /* define probabilities */
q = GetRQuantiles(x, p); /* sample quantiles */
print p q;
quit;
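As a check on what the program should print, the TYPE=7 arithmetic can be worked by hand for these five values. This is a hand calculation from the statements in GetRQuantiles, not output copied from a SAS session. The sorted sample is 3442.75, 6885.5, 16525.2, 19271.69, 19967.95, so N = 5:

```latex
\begin{array}{rrrrr}
p & Np + m & j & g & q \\ \hline
0.125 & 1.5 & 1 & 0.5 & 5164.125 \\
0.250 & 2.0 & 2 & 0   & 6885.5 \\
0.375 & 2.5 & 2 & 0.5 & 11705.35 \\
0.500 & 3.0 & 3 & 0   & 16525.2 \\
0.625 & 3.5 & 3 & 0.5 & 17898.445 \\
0.750 & 4.0 & 4 & 0   & 19271.69 \\
0.875 & 4.5 & 4 & 0.5 & 19619.82 \\
1.000 & 5.0 & 5 & 0   & 19967.95
\end{array}
```

For example, at p = 0.125 the estimate is 0.5(3442.75) + 0.5(6885.5) = 5164.125. For p = 1 the program does not interpolate; it returns x[N] directly.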
For a degenerate sample that has one observation, the empirical CDF is a vertical line and all quantiles are equal to the value of the observation. For example, if the sample is {7}, then
min = 0th percentile = 7
10th percentile = 7
...
90th percentile = 7
max = 100th percentile = 7
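The same conclusion follows from the TYPE=7 arithmetic in GetRQuantiles when N = 1. This is a short derivation using the program's names; $x_{(k)}$ is shorthand for the k-th smallest sample value.

```latex
N = 1:\quad Np + m = p + (1 - p) = 1
\;\Longrightarrow\; j = 1,\quad g = 0,\quad
q(p) = (1 - 0)\,x_{(1)} + 0 \cdot x_{(2)} = x_{(1)} .
```

The weight on $x_{(2)}$ is zero, but that value does not exist in a one-observation sample. A program that evaluates x[j+1] literally therefore has to treat N = 1 as a special case.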
I am trying to complete a project, so I will let others help you. Good luck!
As Rick stated, you can use the standard SAS procedures to calculate the usual estimates of quantiles.
If you want a different definition, you can write your own code to compute it, as Rick did with the IML code that reproduces the numbers from the R function you used.
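A minimal sketch of the "standard procedure" route, assuming the data set Q with variable X from earlier in this thread. The QNTLDEF= option selects among the five built-in SAS definitions; 5 is the default. As far as I know, none of the five matches R's default (type 7), so these numbers can differ from the IML results for other samples.

```sas
/* Built-in percentile estimates for variable X in data set Q.       */
/* QNTLDEF=5 is the SAS default definition; it is not R's type 7.    */
proc means data=q min p25 median p75 max qntldef=5;
   var x;
run;
```

Changing QNTLDEF= to a value from 1 through 4 switches to one of the other built-in definitions.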
The OP has complained that the original program I posted did not handle the degenerate case of a sample that has only one observation. Here is the modification that handles N=1:
data q;
input x;
cards;
19967.95
19271.69
16525.2
6885.5
3442.75
;
proc iml;
/* Define function that returns the TYPE=7 sample quantiles. For more info, see https://blogs.sas.com/content/iml/2017/05/24/definitions-sample-quantiles.html*/
start GetRQuantiles(y, probs);
x = colvec(y);
call sort(x);
N = nrow(x); /* assume all values are nonmissing */
p = colvec(probs);
m = 1-p;
j = floor(N*p + m);
g = N*p + m - j;
q = j(nrow(p), 1, x[N]); /* initialize to max(x) = x[N], which is the answer when p=1 */
if N=1 then return q;
idx = loc(p < 1);
if ncol(idx) >0 then do;
j = j[idx]; g = g[idx];
q[idx] = (1-g)#x[j] + g#x[j+1];
end;
return q;
finish;
/* TEST the function on an example */
use q; read all var "x"; close; /* read sample into x */
p = {12.5, 25, 37.5, 50, 62.5, 75, 87.5, 100} / 100; /* define probabilities */
q = GetRQuantiles(x, p); /* get type=7 sample quantiles */
print p q;
/* for a degenerate sample (N=1), all estimates are equal to x[1] */
q = GetRQuantiles(123.45, p);
print p q;