- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello
I want to calculate following percentiles and then classify each row into a group.
I want to define groups (Ranks) based on following percentiles-
P5,P10,P20,P30,P40,P50,P60,P70,P80,P90,P95
The following code create the percentiles-P10,P20,P30,P40,P50,P60,P70,P80,P90 but no P5,P95
What is the way to add them???
proc rank data=sashelp.cars out=have groups=10;
var invoice;
ranks rank;
run;
proc sql;
create table rank_groups_Definition as
select rank,
min(invoice) as min,
max(invoice) as max
from have
group by rank
;
quit;
data rank_groups_Definition_2;
set rank_groups_Definition;
_MIN_=lag(max)+1;
IF _MIN_=. then _MIN_=0;
drop min;
rename _min_=min;
Run;
proc sql;
create table want as
select a.*,b.rank
from sashelp.cars as a
left join rank_groups_Definition_2 as b
on b.Min<=a.invoice<=b.Max
;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
proc rank data=sashelp.cars out=have fraction;
var invoice;
ranks rank;
run;
proc format;
value percentile 0-<0.05="< 5 percentile"
0.05-<0.1='5-10 percentile'
0.1-<0.2='10-20 percentile'
/* I'm lazy, you type the rest */
...
;
run;
proc datasets library=work nolist;
modify have;
format rank percentile.;
run; quit;
Also, stay away from mathematical analyses in SQL steps when possible. In the long run, SQL is a poor choice for math. Just because you can do things in SQL does not mean you should do things in SQL. In this case and many others, PROC FORMAT is a better solution. Using PROC FORMAT allows you to put labels on the group that actually mean something to humans and it keeps the variable numeric (instead of 'P5' 'P10' etc.) so it will sort numerically in tables (instead of sorting alphabetically).
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Split your data into 100 groups. And recode it as your wish.
proc rank data=sashelp.cars out=have groups=100;
var invoice;
ranks rank;
run;
data have;
set have;
if rank<=4 then want='P5 ';
else if rank<=9 then want='P10 ';
else if rank<=19 then want='P20 ';
.........................
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am using it on my data set and I dont get the whole percentiles that I ask .
proc rank data=temp(Where=(revenue>0)) out=want groups=100;
var revenue;
ranks rank;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Ronein wrote:
I am using it on my data set and I dont get the whole percentiles that I ask .
What does that mean?
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Show the log.
If you don't have more than at least 100 non-missing non-tied values in the input set you are not going to get 100 ranks. Period.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
proc rank data=sashelp.cars out=have groups=20;
var invoice;
ranks rank;
run;
data have;
set have;
if rank=0 then want='P5 ';
else if rank=1 then want='P10 ';
else if rank in (2 3) then want='P20 ';
.........................