BookmarkSubscribeRSS Feed
twildone
Pyrite | Level 9

Hi....I have a dataset that I am  trying to select a certain percentile of records based on their Fees for each YearMonth and within each Level. For example, from the table below, for the YearMonth = 201501 and Level=A, I would like to select and keep 93% of the records with the lowest fees and exclude the remaining 7% of the records with the highest fees. That would mean if there were 1,000 records having YearMonth = 201501 and Level =A, then I am expecting to have 930 records with YearMonth = 201501 and Level =A while the remaining 70 are excluded from the new dataset. I have sorted the dataset by YearMonth, Level and Fees. I was wondering what be the best and most efficient approach to achiev this. Thanks.

 

YearMonth Level Percentile
201501 A 0.93
201501 B 0.95
201502 A 0.92
201503 A 0.92
201503 B 0.91
201503 C 0.94
5 REPLIES 5
Reeza
Super User

Do you have the percentiles already calculated?

If not, use PROC RANK with groups=100 and from the output select the desired records. I think it would be rank<92 because ranks go from 0 to 99. 

 

 

twildone
Pyrite | Level 9

Hi Reeza,

 

Yes the percentiles are already calculated and are specific for each unique YearMonth and Level.

Reeza
Super User

If the percentiles are already calculated isn't that just a WHERE or IF statement?

 

If not for some reason, detail your issue.

ballardw
Super User

Another approach would be to use Proc Surveyselect. You can use a data set with the percentiles as a control data set but the Percentile value must be in a variable named _RATE_.

The procedure call would look something like this:

proc survey select data=have out=want 
    samprate=Percentileset;
    by Yearmonth level;
run;

where Percentileset would be the name of the data set with YearMonth, Level and _rate.

 

Both your full dataset (fees?) and Percentileset would need to be sorted by Yearmonth and level.

This will give you a random selection of records.

twildone
Pyrite | Level 9

Hi Ballardw.....unfortunately I need to select the lowest X percentile of the records for each YearMonth and Level and not a random sample....unless I am missing something from your approach.

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1427 views
  • 0 likes
  • 3 in conversation