Hi....I have a dataset that I am trying to select a certain percentile of records based on their Fees for each YearMonth and within each Level. For example, from the table below, for the YearMonth = 201501 and Level=A, I would like to select and keep 93% of the records with the lowest fees and exclude the remaining 7% of the records with the highest fees. That would mean if there were 1,000 records having YearMonth = 201501 and Level =A, then I am expecting to have 930 records with YearMonth = 201501 and Level =A while the remaining 70 are excluded from the new dataset. I have sorted the dataset by YearMonth, Level and Fees. I was wondering what be the best and most efficient approach to achiev this. Thanks.
YearMonth | Level | Percentile |
201501 | A | 0.93 |
201501 | B | 0.95 |
201502 | A | 0.92 |
201503 | A | 0.92 |
201503 | B | 0.91 |
201503 | C | 0.94 |
Do you have the percentiles already calculated?
If not, use PROC RANK with groups=100 and from the output select the desired records. I think it would be rank<92 because ranks go from 0 to 99.
Hi Reeza,
Yes the percentiles are already calculated and are specific for each unique YearMonth and Level.
If the percentiles are already calculated isn't that just a WHERE or IF statement?
If not for some reason, detail your issue.
Another approach would be to use Proc Surveyselect. You can use a data set with the percentiles as a control data set but the Percentile value must be in a variable named _RATE_.
The procedure call would look something like this:
proc survey select data=have out=want samprate=Percentileset; by Yearmonth level; run;
where Percentileset would be the name of the data set with YearMonth, Level and _rate.
Both your full dataset (fees?) and Percentileset would need to be sorted by Yearmonth and level.
This will give you a random selection of records.
Hi Ballardw.....unfortunately I need to select the lowest X percentile of the records for each YearMonth and Level and not a random sample....unless I am missing something from your approach.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.