Obsidian | Level 7

## Percentile of several variables

In the sample data set, BB and CC are derived from AA (in other words, BB and CC depend on AA). I am trying to compute selected percentiles of AA and also save the corresponding BB and CC values at each reported percentile of AA (BB and CC should always match AA). However, the code below computes the percentiles independently, so the percentiles of BB and CC do not correspond to those of AA.

```sas
proc sort data=sample;
by AA;
run;

proc univariate data=sample noprint;
var AA BB CC;
output out=percentiles pctlpre=AA BB CC pctlpts= 1 to 5 by 1, 10 to 90 by 10, 95 to 99 by 1;
run;

proc transpose data=percentiles out=percentiles_trans;
var _all_;
run;

proc sort SORTSEQ=LINGUISTIC(NUMERIC_COLLATION=ON);
by _NAME_;
run;
```

I also tried computing the percentile by sorting on AA and calculating it as (_n_ / total number of observations).
This way, BB and CC always match AA, but because the number of observations is less than 100, I cannot find all the percentiles I want.

```sas
proc sort data=sample;
by AA;
run;

data sample;
set sample;
pct=100*_n_/47;
percentile=floor(pct);
run;

data sample ;
set sample;
by percentile;
if first.percentile;
if percentile in (1,2,3,4,5,10,20,30,40,50,60,70,80,90,95,96,97,98,99);
run;
```

Appreciated!

Accepted Solutions
Jade | Level 19

## Re: Percentile of several variables

Hello @JacAder,

You can omit BB and CC in your PROC UNIVARIATE step and merge the resulting percentiles_trans dataset (which is then already sorted properly according to the PCTLPTS= specification) with the sorted original dataset in order to add the matching BB and CC values:

```sas
data want;
merge percentiles_trans(rename=(col1=AA) in=p)
sample;
by AA;
if p;
run;
```

Please note, however, that percentiles computed with the default percentile definition (PCTLDEF=5) are not necessarily existing values from the input dataset. For example, the median of {8, 9} is 8.5 by that definition, and such a value would find no matching observation in the merge. So you would need to decide how to handle such cases or switch to a different percentile definition (using the option PCTLDEF=2 or PCTLDEF=3 in the PROC UNIVARIATE statement), which always results in values from the input dataset.
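Putting the pieces together, here is a minimal sketch of the whole workflow. It assumes the dataset and variable names from the question (sample, AA, BB, CC) and picks PCTLDEF=3 as one of the two definitions mentioned above; it has not been run against the actual data, so treat it as a starting point rather than a finished program.

```sas
/* Sort the original data by AA so that it can be merged BY AA later */
proc sort data=sample;
  by AA;
run;

/* Percentiles of AA only; PCTLDEF=3 (assumed choice) returns observed AA values */
proc univariate data=sample noprint pctldef=3;
  var AA;
  output out=percentiles pctlpre=AA
         pctlpts=1 to 5 by 1, 10 to 90 by 10, 95 to 99 by 1;
run;

/* One row per percentile: _NAME_ = AA1, AA2, ..., COL1 = percentile value */
proc transpose data=percentiles out=percentiles_trans;
  var _all_;
run;

/* Keep only percentile rows and pick up BB and CC from the matching observation */
data want;
  merge percentiles_trans(rename=(col1=AA) in=p)
        sample;
  by AA;
  if p;
run;
```

If AA contains tied values, a percentile that falls on a tie will bring in every observation sharing that AA value, so want can have more rows than requested percentiles. Likewise, with fewer than 100 observations several low (or high) percentiles may map to the same observation.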

Obsidian | Level 7

## Re: Percentile of several variables

Thank you so much for the solution!