Hi
While doing percentile calculation using proc univariate, I could not get the expected output,pls help me for the same.
PROC UNIVARIATE DATA=DATA_SET EXCLNPWGCT;
VAR VAR1;WEIGHT VAR1;
OUTPUT OUT=OUTDATASET PCTLPRE=P_PCTLPTS=12.5 25 27.5 50 62.5 75 87.5 100;
RUN;
the following are the values in the var1 column of the data_set table after excluding the zero and negative values:
19967.95
19271.69
16525.2
6885.5
3442.75
percentile
12.5
25
37.5
50
62.5
75
87.5
100
expected o/p
5164.125
6885.5
11705.35
16525.2
17898.44412
19271.68824
19619.81912
19967.95
output coming
6885.5
16525.2
16525.2
19271.68824
19271.68824
19967.95
19967.95
19967.95
Appreciate your help.Thanks
So did you create a dataset named Q with a variable named X for the IML code to read int?
data q;
input x;
cards;
19967.95
19271.69
16525.2
6885.5
3442.75
;
proc iml;
/* Define function that returns the TYPE=7 sample quantiles. For more info, see https://blogs.sas.com/content/iml/2017/05/24/definitions-sample-quantiles.html*/
start GetRQuantiles(y, probs);
x = colvec(y);
call sort(x);
N = nrow(x); /* assume all values are nonmissing */
p = colvec(probs);
m = 1-p;
j = floor(N*p + m);
g = N*p + m - j;
q = j(nrow(p), 1, x[N]); /* if p=1, x[N]=return max(x) */
idx = loc(p < 1);
if ncol(idx) >0 then do;
j = j[idx]; g = g[idx];
q[idx] = (1-g)#x[j] + g#x[j+1];
end;
return q;
finish;
use q; read all var "x"; close; /* read sample into x */
p = {12.5, 25, 37.5, 50, 62.5, 75, 87.5, 100} / 100; /* define probabilities */
q = GetRQuantiles(x, p); /* sample quantiles */
print p q;
quit;
Output
The option PCTLDEF= sets which of five different methods are used to calculate percentiles. Read the documentation for the different meanings. Or just try PCTLDEF= 1 (through 4, 5 is the default method and you don't like that) and see if any of the results match expectations.
The presence of a WEIGHT variable significantly affects the results. I have to say that seeing the same variable on a VAR and WEIGHT statement makes me queasy.
to remove the zero and negative values I used weight option ,by trying PCTLDEF option also am not getting the required output
@molla wrote:
to remove the zero and negative values I used weight option ,by trying PCTLDEF option also am not getting the required output
Be specific: which PCTLDEF options did you try?
I tried PCTLDEF = 1
I have tried with other options as well but not getting the exact output
Here is an example of what sort of thing might happen when you have "weights" based on the variable you are requesting percentiles about. This uses the same input data set with and without weights.
Please look at the result and see if one of them comes closer to what you expect.
data example; do x= 1 to 25; output; end; run; title "with weight"; proc univariate data=example EXCLNPWGT; VAR x; WEIGHT x; OUTPUT OUT=OUTDATASET1 PCTLPRE=P_ PCTLPTS=12.5 25 27.5 50 62.5 75 87.5 100; RUN; title "without weight"; proc univariate data=example EXCLNPWGT; VAR x; OUTPUT OUT=OUTDATASET2 PCTLPRE=P_ PCTLPTS=12.5 25 27.5 50 62.5 75 87.5 100; RUN; title;
This is admittedly moderately "nice" data but the purpose is to illustrate that weights and percentiles are a somewhat complex issue and using the same value for the weight is problematic at best.
Try PCTLDEF = 2 and PCTLDEF=3 and PCTLDEF=4
How do you expect it to assign distinct values for the 8 different precentiles when your input only has 5 different values?
I edited your meaningless subject line. Please use a more descriptive one next time.
The percentiles you list are Definition #7 in the Hyndman and Fan (1997) lists of nine definitions for sample quantiles. This is the default in R. SAS does not natively support Definition #7, but you can use a formula to obtain it. The article "Sample quantiles: A comparison of 9 definitions" provides the details and SAS code to compute all nine definitions.
The article contains the function definitions. Here's how you get your results:
proc iml;
/* get module from program at https://blogs.sas.com/content/iml/files/2017/05/QntlDef.txt */
load module=(SampQuantile);
use Q; read all var "x"; close;
prob = {12.5,25,37.5,50,62.5,75,87.5,100} / 100;
def = 7;
quantile = SampQuantile(x, prob, def);
print def prob quantile;
Be sure the read the end of my article to understand that these definitions are all estimates for the underlying quantiles of the data-generating distribution. None is "more correct" than any other. As the samples size gets larger, the definitions all give similar estimates.
OK, here's a simpler suggestion: use the default estimates from PROC UNIVARIATE or another SAS procedure.
SAS procedures provide five ways to estimate quantiles. There are other estimates, as discussed in my article. But all of these are valid estimates and, in practice, there is rarely a reason to prefer one estimate over the other.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.