BookmarkSubscribeRSS Feed
twildone
Pyrite | Level 9

Hi....I have a dataset that I am  trying to select a certain percentile of records based on their Fees for each YearMonth and within each Level. For example, from the table below, for the YearMonth = 201501 and Level=A, I would like to select and keep 93% of the records with the lowest fees and exclude the remaining 7% of the records with the highest fees. That would mean if there were 1,000 records having YearMonth = 201501 and Level =A, then I am expecting to have 930 records with YearMonth = 201501 and Level =A while the remaining 70 are excluded from the new dataset. I have sorted the dataset by YearMonth, Level and Fees. I was wondering what be the best and most efficient approach to achiev this. Thanks.

 

YearMonth Level Percentile
201501 A 0.93
201501 B 0.95
201502 A 0.92
201503 A 0.92
201503 B 0.91
201503 C 0.94
5 REPLIES 5
Reeza
Super User

Do you have the percentiles already calculated?

If not, use PROC RANK with groups=100 and from the output select the desired records. I think it would be rank<92 because ranks go from 0 to 99. 

 

 

twildone
Pyrite | Level 9

Hi Reeza,

 

Yes the percentiles are already calculated and are specific for each unique YearMonth and Level.

Reeza
Super User

If the percentiles are already calculated isn't that just a WHERE or IF statement?

 

If not for some reason, detail your issue.

ballardw
Super User

Another approach would be to use Proc Surveyselect. You can use a data set with the percentiles as a control data set but the Percentile value must be in a variable named _RATE_.

The procedure call would look something like this:

proc survey select data=have out=want 
    samprate=Percentileset;
    by Yearmonth level;
run;

where Percentileset would be the name of the data set with YearMonth, Level and _rate.

 

Both your full dataset (fees?) and Percentileset would need to be sorted by Yearmonth and level.

This will give you a random selection of records.

twildone
Pyrite | Level 9

Hi Ballardw.....unfortunately I need to select the lowest X percentile of the records for each YearMonth and Level and not a random sample....unless I am missing something from your approach.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 723 views
  • 0 likes
  • 3 in conversation