Thanks in advance for the help. I have a dataset that looks like the one below, containing a large number of simulated scenario results by Group and Subject.
I am trying to calculate a modified percentile by Group or Subject. By modified I mean:
- A percentile (e.g., 99.5th) by Group or Subject across all scenarios
- An average of the 50 observations around the specified (e.g., 99.5th) percentile, to be used in place of the single data point at that percentile; or alternatively the average of the observations that fall between the 99.25th and 99.75th percentiles
How can this be achieved?
Scenarios | Groups | Subject | Results |
1 | A | Tom | 100 |
1 | A | Tim | 110 |
1 | A | Ted | 120 |
1 | B | Bob | 110 |
1 | B | Ben | 100 |
1 | B | Bill | 90 |
1 | … | … | … |
2 | A | Tom | 105 |
2 | A | Tim | 115 |
2 | A | Ted | 125 |
2 | B | Bob | 115 |
2 | B | Ben | 105 |
2 | B | Bill | 95 |
2 | … | … | … |
You may be looking for a trimmed mean. PROC UNIVARIATE computes trimmed means with the TRIMMED= option. But I think you need to consider what you mean by "within the 99.25th and 99.75th percentile". How many records do you have for any group or subject? If there are only 100, none of them fall in that range, because the top observations jump from the 99th straight to the 100th percentile.
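For reference, a minimal sketch of a trimmed-mean request, assuming the data sit in a dataset called HAVE (a hypothetical name) with the columns shown in the example table. The value 0.05 is only an illustration: it drops 5% of the observations from each tail and averages what is left, so the result is a mean of the middle of the distribution, not an average around a high percentile.

```sas
/* HAVE is a hypothetical dataset name; columns follow the example table */
proc univariate data=have trimmed=0.05;
    class Groups;               /* one analysis per group */
    var Results;
    ods select TrimmedMeans;    /* print only the trimmed-mean table */
run;
```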
Can you provide what you would expect for the output of the example data you provided?
The dataset is fairly large, with results for about 10,000 scenarios, each with 7 groups of 25,000 subjects. So the total row count would be about 10,000 x 7 x 25,000 (roughly 1.75 billion rows).
I would be calculating the percentile (99.5th) statistics for:
For the modified percentile, the output I expect is as follows, using subject Tom as an example:
If the results for Tom across all scenarios are sorted {1st [smallest], 2nd, 3rd, 4th, ... 9925th, ... 9975th, ... 10000th [largest]},
my modified 99.5th percentile result for Tom would be the average of all observations from the 9925th to the 9975th.
Is there a procedure in SAS that can calculate such a modified percentile by group?
I read up on PROC RANK. Aggregating, running PROC RANK, and then averaging with PROC MEANS should work. Thanks for the advice!
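A minimal sketch of that rank-then-average approach at the subject level, under a few assumptions: the dataset is called HAVE (a hypothetical name), the columns are named as in the example table, and every subject has 10,000 scenario results, so ranks 9925 to 9975 bracket the 99.5th percentile. The output names SORTED, RANKED, MODIFIED_PCTL, ResultRank, ModifiedP995, and NObs are made up for illustration.

```sas
/* BY-group processing requires the data to be sorted by the BY variables */
proc sort data=have out=sorted;
    by Groups Subject;
run;

/* Rank each subject's results across all scenarios (1 = smallest) */
proc rank data=sorted out=ranked;
    by Groups Subject;
    var Results;
    ranks ResultRank;
run;

/* Average the results ranked 9925 through 9975 for each subject */
proc means data=ranked noprint;
    where 9925 <= ResultRank <= 9975;
    by Groups Subject;
    var Results;
    output out=modified_pctl (drop=_type_ _freq_)
           mean=ModifiedP995 n=NObs;
run;
```

NObs shows how many observations went into each average. By default PROC RANK gives tied values the mean of their ranks, so ties near the edges of the window can change that count. For group-level figures, the results would first need to be aggregated to one row per scenario and group (for example by summing, if that is the intended aggregation) before ranking by Groups alone.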