Solved: Percentile of several variables

JacAder · Posted 06-13-2020 12:46 AM

In the sample data set, BB and CC are derived from AA (in other words, BB and CC depend on AA). I am trying to compute selected percentiles of AA and, for each reported percentile of AA, also save the corresponding BB and CC values (BB and CC should always match AA). However, the code below computes the percentiles of each variable independently, so the reported percentiles of BB and CC do not match those of AA.

proc sort data=sample;
by AA;
run;

proc univariate data=sample noprint;
  var AA BB CC;
  output out=percentiles pctlpre=AA BB CC pctlpts= 1 to 5 by 1, 10 to 90 by 10, 95 to 99 by 1;
run;

proc transpose data=percentiles out=percentiles_trans;
var _all_;
run;

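/* sort by _NAME_, comparing embedded numbers numerically (AA2 before AA10) */
proc sort data=percentiles_trans SORTSEQ=LINGUISTIC(NUMERIC_COLLATION=ON);
by _NAME_;
run;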

I also tried computing the percentiles by sorting on AA and calculating each observation's percentile as 100*_n_/(total number of observations).
This way BB and CC always match AA, but because there are fewer than 100 observations, some integer percentiles never occur, so I cannot get all the percentiles I want.

proc sort data=sample;
by AA;
run;

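data sample;
set sample nobs=n;   /* n = total number of observations (47 here) */
pct=100*_n_/n;
percentile=floor(pct);
run;

/* keep the first observation of each wanted integer percentile */
data sample ;
set sample;
by percentile;
if first.percentile;
if percentile in (1,2,3,4,5,10,20,30,40,50,60,70,80,90,95,96,97,98,99);
run;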
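The row-picking idea of this second attempt can be extended to percentiles that fall between rows. A minimal sketch, assuming the empirical-distribution-function rule (the rule behind PCTLDEF=3 in PROC UNIVARIATE): for n sorted observations and a percentile p, take row n*p/100 when that is a whole number and the next row up otherwise, i.e. row ceil(n*p/100). Reading that row with POINT= returns AA, BB and CC from the same observation. The data values and the dataset name PCTL_ROWS below are hypothetical.

```sas
/* Hypothetical sample data: 47 observations, BB and CC derived from AA */
data sample;
  do i = 1 to 47;
    AA = i * 1.5;
    BB = 2 * AA;
    CC = AA + 100;
    output;
  end;
  drop i;
run;

proc sort data=sample;
  by AA;
run;

/* One output row per requested percentile p (requires 0 < p <= 100).  */
/* row = ceil(n*p/100): row n*p/100 if whole, otherwise the next row.   */
/* n comes from NOBS=, which is set at compile time, before the loop.   */
data pctl_rows;
  do p = 1 to 5, 10 to 90 by 10, 95 to 99;
    row = ceil(n * p / 100);
    set sample point=row nobs=n;  /* AA, BB and CC all come from this row */
    output;
  end;
  stop;  /* POINT= never reaches end of file, so stop explicitly */
  keep p AA BB CC;
run;
```

This avoids PROC UNIVARIATE and the merge altogether, at the cost of reproducing only this one percentile definition.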

Appreciated!

FreelanceReinh · Posted 06-13-2020 04:57 AM

Hello @JacAder,

You can omit BB and CC in your PROC UNIVARIATE step and merge the resulting percentiles_trans dataset with the sorted original dataset to add the matching BB and CC values. With only AA in the VAR statement, percentiles_trans is already in PCTLPTS= order, and therefore sorted by AA, so the SORTSEQ= sort is no longer needed:

data want;
merge percentiles_trans(rename=(col1=AA) in=p)
      sample;
by AA;
if p;
run;

Please note, however, that percentiles computed with the default percentile definition (PCTLDEF=5) are not necessarily values that occur in the input dataset: for example, by that definition the median of {8, 9} is 8.5. Such a percentile has no matching observation, so BB and CC would be missing for it in WANT. You would need to decide how to handle such cases, or switch to a percentile definition that always returns values from the input dataset (option PCTLDEF=2 or PCTLDEF=3 in the PROC UNIVARIATE statement).
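To make the difference concrete: with n sorted values x_1 <= ... <= x_n and p the percentile divided by 100, write np = j + g, where j is the integer part and g the fractional part. PCTLDEF=5 returns (x_j + x_{j+1})/2 if g = 0 and x_{j+1} if g > 0, whereas PCTLDEF=3 returns x_j if g = 0 and x_{j+1} if g > 0. For {8, 9} and the median, np = 2 * 0.5 = 1 exactly, so definition 5 gives 8.5 and definition 3 gives 8. A minimal sketch of the whole approach with PCTLDEF=3 follows; the sample values are hypothetical, and the percentile list is written out explicitly to match the one in the question.

```sas
/* Hypothetical sample data: 47 observations, BB and CC derived from AA */
data sample;
  do i = 1 to 47;
    AA = i * 1.5;
    BB = 2 * AA;
    CC = AA + 100;
    output;
  end;
  drop i;
run;

proc sort data=sample;
  by AA;
run;

/* Percentiles of AA only; PCTLDEF=3 always returns observed AA values */
proc univariate data=sample noprint pctldef=3;
  var AA;
  output out=percentiles pctlpre=AA
         pctlpts=1 2 3 4 5 10 20 30 40 50 60 70 80 90 95 96 97 98 99;
run;

/* One row per percentile (_NAME_ = AA1, AA2, ...); COL1 holds the value */
proc transpose data=percentiles out=percentiles_trans;
  var _all_;
run;

/* Attach BB and CC from the observation(s) with the same AA value.  */
/* If AA has ties in SAMPLE, one percentile can match several rows.  */
data want;
  merge percentiles_trans(rename=(col1=AA) in=p)
        sample;
  by AA;
  if p;
run;
```

When two percentiles land on the same observation (for example AA1 and AA2 with 47 observations), the merge outputs both rows and carries the same BB and CC into each.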

JacAder · Posted 06-16-2020 07:28 PM

Thank you so much for the solution!
