Hi....I have a dataset that I am trying to select a certain percentile of records based on their Fees for each YearMonth and within each Level. For example, from the table below, for the YearMonth = 201501 and Level=A, I would like to select and keep 93% of the records with the lowest fees and exclude the remaining 7% of the records with the highest fees. That would mean if there were 1,000 records having YearMonth = 201501 and Level =A, then I am expecting to have 930 records with YearMonth = 201501 and Level =A while the remaining 70 are excluded from the new dataset. I have sorted the dataset by YearMonth, Level and Fees. I was wondering what be the best and most efficient approach to achiev this. Thanks.
YearMonth | Level | Percentile |
201501 | A | 0.93 |
201501 | B | 0.95 |
201502 | A | 0.92 |
201503 | A | 0.92 |
201503 | B | 0.91 |
201503 | C | 0.94 |
Do you have the percentiles already calculated?
If not, use PROC RANK with groups=100 and from the output select the desired records. I think it would be rank<92 because ranks go from 0 to 99.
Hi Reeza,
Yes the percentiles are already calculated and are specific for each unique YearMonth and Level.
If the percentiles are already calculated isn't that just a WHERE or IF statement?
If not for some reason, detail your issue.
Another approach would be to use Proc Surveyselect. You can use a data set with the percentiles as a control data set but the Percentile value must be in a variable named _RATE_.
The procedure call would look something like this:
proc survey select data=have out=want samprate=Percentileset; by Yearmonth level; run;
where Percentileset would be the name of the data set with YearMonth, Level and _rate.
Both your full dataset (fees?) and Percentileset would need to be sorted by Yearmonth and level.
This will give you a random selection of records.
Hi Ballardw.....unfortunately I need to select the lowest X percentile of the records for each YearMonth and Level and not a random sample....unless I am missing something from your approach.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.