Solved: Re: Percentiles with PROC UNIVARIATE

molla · Posted 07-09-2021 03:54 AM

Hi

While doing percentile calculation using proc univariate, I could not get the expected output,pls help me for the same.

PROC UNIVARIATE DATA=DATA_SET EXCLNPWGCT;
VAR VAR1;WEIGHT VAR1;
OUTPUT OUT=OUTDATASET PCTLPRE=P_PCTLPTS=12.5 25 27.5 50 62.5 75 87.5 100;
RUN;

the following are the values in the var1 column of the data_set table after excluding the zero and negative values:

19967.95
19271.69
16525.2
6885.5
3442.75

percentile
12.5
25
37.5
50
62.5
75
87.5
100

expected o/p
5164.125
6885.5
11705.35
16525.2
17898.44412
19271.68824
19619.81912
19967.95

output coming
6885.5
16525.2
16525.2
19271.68824
19271.68824
19967.95
19967.95
19967.95

Appreciate your help.Thanks

Tom · Posted 07-12-2021 03:42 PM

So did you create a dataset named Q with a variable named X for the IML code to read int?

data q;
  input x;
cards;
19967.95
19271.69
16525.2
6885.5
3442.75
;



proc iml;
/* Define function that returns the TYPE=7 sample quantiles. For more info, see   https://blogs.sas.com/content/iml/2017/05/24/definitions-sample-quantiles.html*/
start GetRQuantiles(y, probs);
   x = colvec(y);
   call sort(x);
   N = nrow(x);       /* assume all values are nonmissing */
   
   p = colvec(probs);
   m = 1-p;
   j = floor(N*p + m);
   g = N*p + m - j;

   q = j(nrow(p), 1, x[N]);    /* if p=1, x[N]=return max(x) */
   idx = loc(p < 1);
   if ncol(idx) >0 then do;
      j = j[idx]; g = g[idx];
      q[idx] = (1-g)#x[j] + g#x[j+1];
   end;   
   return q;
finish;

use q; read all var "x"; close;       /* read sample into x */
p = {12.5, 25, 37.5, 50, 62.5, 75, 87.5, 100} / 100;  /* define probabilities */
q = GetRQuantiles(x, p);  /* sample quantiles */
print p q;

quit;

Output

View solution in original post

ballardw · Posted 07-09-2021 04:08 AM

The option PCTLDEF= sets which of five different methods are used to calculate percentiles. Read the documentation for the different meanings. Or just try PCTLDEF= 1 (through 4, 5 is the default method and you don't like that) and see if any of the results match expectations.

The presence of a WEIGHT variable significantly affects the results. I have to say that seeing the same variable on a VAR and WEIGHT statement makes me queasy.

molla · Posted 07-09-2021 06:40 AM

to remove the zero and negative values I used weight option ,by trying PCTLDEF option also am not getting the required output

PaigeMiller · Posted 07-09-2021 07:11 AM

@molla wrote:
to remove the zero and negative values I used weight option ,by trying PCTLDEF option also am not getting the required output

Be specific: which PCTLDEF options did you try?

--
Paige Miller

molla · Posted 07-09-2021 07:59 AM

I tried PCTLDEF = 1

molla · Posted 07-09-2021 10:26 AM

I have tried with other options as well but not getting the exact output

ballardw · Posted 07-09-2021 12:50 PM

Here is an example of what sort of thing might happen when you have "weights" based on the variable you are requesting percentiles about. This uses the same input data set with and without weights.

Please look at the result and see if one of them comes closer to what you expect.

data example;
  do x= 1 to 25;
  output;
  end;
run;

title "with weight";
proc univariate data=example EXCLNPWGT;
 VAR x;
 WEIGHT x;
 OUTPUT OUT=OUTDATASET1 PCTLPRE=P_ PCTLPTS=12.5 25 27.5 50 62.5 75 87.5 100;
RUN;

title "without weight";
proc univariate data=example EXCLNPWGT;
 VAR x;
 OUTPUT OUT=OUTDATASET2 PCTLPRE=P_ PCTLPTS=12.5 25 27.5 50 62.5 75 87.5 100;
RUN;
title;

This is admittedly moderately "nice" data but the purpose is to illustrate that weights and percentiles are a somewhat complex issue and using the same value for the weight is problematic at best.

PaigeMiller · Posted 07-09-2021 10:44 AM

Try PCTLDEF = 2 and PCTLDEF=3 and PCTLDEF=4

--
Paige Miller

molla · Posted 07-12-2021 03:20 AM

tried bt not getting the accurate results

Tom · Posted 07-09-2021 10:48 AM

How do you expect it to assign distinct values for the 8 different precentiles when your input only has 5 different values?

Kurt_Bremser · Posted 07-09-2021 01:26 PM

I edited your meaningless subject line. Please use a more descriptive one next time.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Rick_SAS · Posted 07-09-2021 04:33 PM

The percentiles you list are Definition #7 in the Hyndman and Fan (1997) lists of nine definitions for sample quantiles. This is the default in R. SAS does not natively support Definition #7, but you can use a formula to obtain it. The article "Sample quantiles: A comparison of 9 definitions" provides the details and SAS code to compute all nine definitions.

The article contains the function definitions. Here's how you get your results:

proc iml;
/* get module from program at https://blogs.sas.com/content/iml/files/2017/05/QntlDef.txt */
load module=(SampQuantile);
use Q; read all var "x"; close;       
prob = {12.5,25,37.5,50,62.5,75,87.5,100} / 100;
def = 7;
quantile = SampQuantile(x, prob, def);
print def prob quantile;

Be sure the read the end of my article to understand that these definitions are all estimates for the underlying quantiles of the data-generating distribution. None is "more correct" than any other. As the samples size gets larger, the definitions all give similar estimates.

molla · Posted 07-12-2021 03:19 AM

Hi,
Its some what complex,not able to understand can we do the same in proc freq or proc mean?
If yes pls let me know how to do it

Rick_SAS · Posted 07-12-2021 06:05 AM

OK, here's a simpler suggestion: use the default estimates from PROC UNIVARIATE or another SAS procedure.

SAS procedures provide five ways to estimate quantiles. There are other estimates, as discussed in my article. But all of these are valid estimates and, in practice, there is rarely a reason to prefer one estimate over the other.

molla · Posted 07-12-2021 09:45 AM

By using the five ways of sas to estimate quantiles am not getting the accurate output,not able to understand how to implement the way which u have explained

Percentiles with PROC UNIVARIATE

Re: Percentiles with PROC UNIVARIATE

Re: SAS PROGRAMMING

Re: Percentiles with PROC UNIVARIATE

Re: SAS PROGRAMMING

Re: Percentiles with PROC UNIVARIATE

Re: Percentiles with PROC UNIVARIATE

Re: SAS PROGRAMMING

Re: SAS PROGRAMMING

Re: SAS PROGRAMMING

Re: SAS PROGRAMMING

Re: Percentiles with PROC UNIVARIATE

Re: Percentiles with PROC UNIVARIATE

Re: Percentiles with PROC UNIVARIATE

Re: Percentiles with PROC UNIVARIATE

Re: Percentiles with PROC UNIVARIATE

Ready to join fellow brilliant minds for the SAS Hackathon?