
Find the TOP X % and the Bottom X %

Occasional Contributor
Posts: 8


Hello,

I have a dataset with one column and 5 million rows. I'm trying to understand the range of values and get a sense of the general distribution. How would I select the top X% of values and the bottom X% of values? Thanks!

Super User
Posts: 24,004

Re: Find the TOP X % and the Bottom X %

Posted in reply to JayCompany

I would start with a histogram, using PROC UNIVARIATE.

It also displays the extreme observations: by default, the five highest and the five lowest values.

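Then I would also use PROC RANK.

Rank the variable of interest with GROUPS=100, which assigns each observation a percentile group numbered 0 to 99. The bottom X% are then the observations whose group is less than X, and the top X% are those whose group is 100 - X or higher. Note how it handles tied values, though; that's one reason I prefer this method. PROC RANK accounts for ties (by default, tied values get the same rank and therefore fall in the same group), whereas some of the manual methods do not by default, so you would need extra code.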
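To see what the ranking step does arithmetically, here is a plain-Python sketch (not SAS code) of the group formula the PROC RANK documentation gives for GROUPS=k, floor(rank * k / (n + 1)), with tied values given the mean of their ranks as under the default TIES=MEAN. The function names `rank_groups` and `bottom_and_top` are hypothetical, and the sketch assumes the column has no missing values (PROC RANK leaves missing values out of n).

```python
import math


def rank_groups(values, k=100):
    """Assign each value a group 0..k-1 via floor(rank * k / (n + 1)).

    Tied values share the mean of their 1-based positions (as with
    TIES=MEAN), so a run of ties always lands in a single group.
    Assumes `values` contains no missing values.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices, smallest first
    ranks = [0.0] * n
    start = 0
    while start < n:
        # Extend `end` over the run of values tied with order[start].
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end + 2) / 2  # mean of 1-based positions start+1..end+1
        for j in range(start, end + 1):
            ranks[order[j]] = mean_rank
        start = end + 1
    return [math.floor(r * k / (n + 1)) for r in ranks]


def bottom_and_top(values, pct, k=100):
    """Split out the bottom and top `pct` groups (percent when k=100)."""
    groups = rank_groups(values, k)
    bottom = [v for v, g in zip(values, groups) if g < pct]
    top = [v for v, g in zip(values, groups) if g >= k - pct]
    return bottom, top


bottom, top = bottom_and_top(list(range(1, 21)), pct=10)
print(bottom, top)  # [1, 2] [19, 20]
```

With the 20 values 1 to 20 and pct=10, the bottom 10% is [1, 2] and the top 10% is [19, 20]. Because tied values share one rank, a cutoff never splits a run of ties, which is why a group can hold slightly more or fewer than exactly X% of the rows.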

Esteemed Advisor
Posts: 5,624

Re: Find the TOP X % and the Bottom X %

Posted in reply to JayCompany

Also, use the NEXTROBS=n option in PROC UNIVARIATE to display the n highest and n lowest observations.
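The same idea as NEXTROBS=n, listing only the n smallest and n largest values together with their row numbers, can be sketched in Python with `heapq` (an illustration, not SAS; `extreme_observations` is a hypothetical name). For small n this avoids sorting all 5 million rows. The tie ordering here is a Python detail and is not meant to match the SAS output.

```python
import heapq


def extreme_observations(values, n=5):
    """Return the n lowest and n highest (value, obs) pairs.

    `obs` is the 1-based row number. For n much smaller than the data,
    heapq avoids sorting the whole column. Ties are broken by row
    number, which is a Python detail, not SAS behavior.
    """
    lowest = heapq.nsmallest(n, ((v, i + 1) for i, v in enumerate(values)))
    highest = heapq.nlargest(n, ((v, i + 1) for i, v in enumerate(values)))
    return lowest, highest


low, high = extreme_observations([7, 3, 9, 1, 5], n=2)
print(low)   # [(1, 4), (3, 2)]
print(high)  # [(9, 3), (7, 1)]
```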

PG