JacAder
Obsidian | Level 7

In the sample data set, BB and CC are derived from AA (in other words, BB and CC depend on AA). I am trying to compute selected percentiles of AA and also save the corresponding BB and CC values at each reported percentile of AA (BB and CC should always match AA). However, the code below computes the percentiles independently, i.e., the percentiles of BB and CC do not correspond to those of AA.

proc sort data=sample;
by AA;
run;

proc univariate data=sample noprint;
  var AA BB CC;
  output out=percentiles pctlpre=AA BB CC pctlpts= 1 to 5 by 1, 10 to 90 by 10, 95 to 99 by 1;
run;

proc transpose data=percentiles out=percentiles_trans;
var _all_;
run;

proc sort SORTSEQ=LINGUISTIC(NUMERIC_COLLATION=ON);
by _NAME_;
run;

I also tried computing the percentile by sorting on AA and calculating it as (_n_ / total number of observations).
This way, BB and CC always match AA, but because there are fewer than 100 observations, I cannot find all the percentiles I want.

proc sort data=sample;
by AA;
run;

data sample;
set sample;
pct=100*_n_/47;
percentile=floor(pct);
run;

data sample ;
set sample;
by percentile;
if first.percentile;
if percentile in (1,2,3,4,5,10,20,30,40,50,60,70,80,90,95,96,97,98,99);
run;

Appreciated!

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @JacAder,

You can omit BB and CC in your PROC UNIVARIATE step and merge the resulting percentiles_trans dataset (which is then already sorted properly according to the PCTLPTS= specification) with the sorted original dataset in order to add the matching BB and CC values:

data want;
merge percentiles_trans(rename=(col1=AA) in=p)
      sample;
by AA;
if p;
run;

Please note, however, that percentiles using the default percentile definition (PCTLDEF=5) are not necessarily existing values from the input dataset: For example, the median of {8, 9} is 8.5 by that definition. So you would need to decide how to handle such cases or switch to a different percentile definition (using the option PCTLDEF=2 or PCTLDEF=3 in the PROC UNIVARIATE statement) which always results in values from the input dataset.
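
Putting the pieces together, a minimal sketch of the whole workflow might look like the following. It assumes the dataset sample and the variables AA, BB and CC from the question, keeps AA as the percentile prefix, and picks PCTLDEF=3 (one of the two definitions mentioned above) so that every reported percentile is an observed AA value and the merge on AA can find a match. Treat it as an illustration under those assumptions rather than tested code for your data.

```sas
/* Sort the original data so it can be merged BY AA later */
proc sort data=sample;
  by AA;
run;

/* Percentiles of AA only; PCTLDEF=3 returns values that occur in the data */
proc univariate data=sample noprint pctldef=3;
  var AA;
  output out=percentiles pctlpre=AA
         pctlpts=1 to 5 by 1, 10 to 90 by 10, 95 to 99 by 1;
run;

/* One row per percentile: _NAME_ holds AA1, AA2, ..., COL1 holds the value */
proc transpose data=percentiles out=percentiles_trans;
  var _all_;
run;

/* Keep only the percentile rows and pick up the matching BB and CC */
data want;
  merge percentiles_trans(rename=(col1=AA) in=p)
        sample;
  by AA;
  if p;
run;
```

With a small sample, several percentile points can resolve to the same AA value, so percentiles_trans may contain repeated AA values. The match-merge should still carry BB and CC across those rows, as long as each AA value occurs only once in sample.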

JacAder
Obsidian | Level 7
Thank you so much for the solution!
