Hello
I want to calculate following percentiles and then classify each row into a group.
I want to define groups (Ranks) based on following percentiles-
P5,P10,P20,P30,P40,P50,P60,P70,P80,P90,P95
The following code create the percentiles-P10,P20,P30,P40,P50,P60,P70,P80,P90 but no P5,P95
What is the way to add them???
proc rank data=sashelp.cars out=have groups=10;
var invoice;
ranks rank;
run;
proc sql;
create table rank_groups_Definition as
select rank,
min(invoice) as min,
max(invoice) as max
from have
group by rank
;
quit;
data rank_groups_Definition_2;
set rank_groups_Definition;
_MIN_=lag(max)+1;
IF _MIN_=. then _MIN_=0;
drop min;
rename _min_=min;
Run;
proc sql;
create table want as
select a.*,b.rank
from sashelp.cars as a
left join rank_groups_Definition_2 as b
on b.Min<=a.invoice<=b.Max
;
quit;
proc rank data=sashelp.cars out=have fraction;
var invoice;
ranks rank;
run;
proc format;
value percentile 0-<0.05="< 5 percentile"
0.05-<0.1='5-10 percentile'
0.1-<0.2='10-20 percentile'
/* I'm lazy, you type the rest */
...
;
run;
proc datasets library=work nolist;
modify have;
format rank percentile.;
run; quit;
Also, stay away from mathematical analyses in SQL steps when possible. In the long run, SQL is a poor choice for math. Just because you can do things in SQL does not mean you should do things in SQL. In this case and many others, PROC FORMAT is a better solution. Using PROC FORMAT allows you to put labels on the group that actually mean something to humans and it keeps the variable numeric (instead of 'P5' 'P10' etc.) so it will sort numerically in tables (instead of sorting alphabetically).
Split your data into 100 groups. And recode it as your wish.
proc rank data=sashelp.cars out=have groups=100;
var invoice;
ranks rank;
run;
data have;
set have;
if rank<=4 then want='P5 ';
else if rank<=9 then want='P10 ';
else if rank<=19 then want='P20 ';
.........................
I am using it on my data set and I dont get the whole percentiles that I ask .
proc rank data=temp(Where=(revenue>0)) out=want groups=100;
var revenue;
ranks rank;
run;
@Ronein wrote:
I am using it on my data set and I dont get the whole percentiles that I ask .
What does that mean?
Show the log.
If you don't have more than at least 100 non-missing non-tied values in the input set you are not going to get 100 ranks. Period.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.