In the sample data set, BB and CC are derived from AA (put another way, BB and CC depend on AA). I am trying to compute selected percentiles of AA and also save the corresponding BB and CC values at each reported percentile of AA (BB and CC should always match AA). However, the code below computes the percentiles of each variable independently, so the reported BB and CC percentiles do not come from the same observations as the AA percentiles.
proc sort data=sample;
by AA;
run;
proc univariate data=sample noprint;
var AA BB CC;
output out=percentiles pctlpre=AA BB CC pctlpts= 1 to 5 by 1, 10 to 90 by 10, 95 to 99 by 1;
run;
proc transpose data=percentiles out=percentiles_trans;
var _all_;
run;
proc sort SORTSEQ=LINGUISTIC(NUMERIC_COLLATION=ON);
by _NAME_;
run;
I also tried computing the percentile by sorting on AA and calculating it as ( _n_ / total number of observations ).
This way, BB and CC always match AA; but because the number of observations is less than 100, I cannot get all the percentiles I want.
proc sort data=sample;
by AA;
run;
data sample;
set sample;
pct=100*_n_/47;
percentile=floor(pct);
run;
data sample ;
set sample;
by percentile;
if first.percentile;
if percentile in (1,2,3,4,5,10,20,30,40,50,60,70,80,90,95,96,97,98,99);
run;
Any help is appreciated!
Hello @JacAder,
You can omit BB and CC in your PROC UNIVARIATE step and merge the resulting percentiles_trans dataset (which is then already sorted properly according to the PCTLPTS= specification) with the sorted original dataset in order to add the matching BB and CC values:
data want;
merge percentiles_trans(rename=(col1=AA) in=p)
sample;
by AA;
if p;
run;
Please note, however, that percentiles using the default percentile definition (PCTLDEF=5) are not necessarily existing values from the input dataset: For example, the median of {8, 9} is 8.5 by that definition. So you would need to decide how to handle such cases or switch to a different percentile definition (using the option PCTLDEF=2 or PCTLDEF=3 in the PROC UNIVARIATE statement) which always results in values from the input dataset.
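Putting the pieces together, the whole flow might look like the sketch below. It assumes PCTLDEF=3 is an acceptable percentile definition for your purposes, and the prefix P_ is only an illustrative choice (any valid prefix works). I have not run it against your data.

```sas
/* Sort the original data so it can be merged by AA later */
proc sort data=sample;
  by AA;
run;

/* Percentiles of AA only; PCTLDEF=3 returns observed AA values
   (assumption: this definition suits the analysis) */
proc univariate data=sample noprint pctldef=3;
  var AA;
  output out=percentiles pctlpre=P_
         pctlpts=1 to 5 by 1, 10 to 90 by 10, 95 to 99 by 1;
run;

/* One row per percentile: _NAME_ holds P_1, P_2, ..., COL1 holds the value */
proc transpose data=percentiles out=percentiles_trans;
  var _all_;
run;

/* Attach BB and CC from the observation with the matching AA value */
data want;
  merge percentiles_trans(rename=(col1=AA) in=p)
        sample;
  by AA;
  if p;
run;
```

With fewer than 100 observations, several percentiles can share the same AA value (for example, the 1st and 2nd percentiles may both be the minimum), and AA itself may contain ties. In those cases the MERGE pairs rows within each BY group in order, so it is worth checking the log and the output for such groups.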