BookmarkSubscribeRSS Feed
twildone
Pyrite | Level 9

Hi....I have a dataset that I am  trying to select a certain percentile of records based on their Fees for each YearMonth and within each Level. For example, from the table below, for the YearMonth = 201501 and Level=A, I would like to select and keep 93% of the records with the lowest fees and exclude the remaining 7% of the records with the highest fees. That would mean if there were 1,000 records having YearMonth = 201501 and Level =A, then I am expecting to have 930 records with YearMonth = 201501 and Level =A while the remaining 70 are excluded from the new dataset. I have sorted the dataset by YearMonth, Level and Fees. I was wondering what be the best and most efficient approach to achiev this. Thanks.

 

YearMonth Level Percentile
201501 A 0.93
201501 B 0.95
201502 A 0.92
201503 A 0.92
201503 B 0.91
201503 C 0.94
5 REPLIES 5
Reeza
Super User

Do you have the percentiles already calculated?

If not, use PROC RANK with groups=100 and from the output select the desired records. I think it would be rank<92 because ranks go from 0 to 99. 

 

 

twildone
Pyrite | Level 9

Hi Reeza,

 

Yes the percentiles are already calculated and are specific for each unique YearMonth and Level.

Reeza
Super User

If the percentiles are already calculated isn't that just a WHERE or IF statement?

 

If not for some reason, detail your issue.

ballardw
Super User

Another approach would be to use Proc Surveyselect. You can use a data set with the percentiles as a control data set but the Percentile value must be in a variable named _RATE_.

The procedure call would look something like this:

proc survey select data=have out=want 
    samprate=Percentileset;
    by Yearmonth level;
run;

where Percentileset would be the name of the data set with YearMonth, Level and _rate.

 

Both your full dataset (fees?) and Percentileset would need to be sorted by Yearmonth and level.

This will give you a random selection of records.

twildone
Pyrite | Level 9

Hi Ballardw.....unfortunately I need to select the lowest X percentile of the records for each YearMonth and Level and not a random sample....unless I am missing something from your approach.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1451 views
  • 0 likes
  • 3 in conversation