JayCompany
Calcite | Level 5

Hello,

I have one column in a dataset with 5M rows of data. I'm trying to understand the range of values and figure out the general distribution. How would I select a percentage of the highest amounts and a percentage of the lowest amounts? Thanks!

2 REPLIES
Reeza
Super User

I would recommend a histogram first, using PROC UNIVARIATE.

By default it also displays the highest and lowest values, in the Extreme Observations table.
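
A minimal sketch of that first step, assuming the data set is called WORK.HAVE and the column is called AMOUNT (both names are placeholders, not from the original question):

```sas
/* Hypothetical names: WORK.HAVE is the input data set, AMOUNT the column of interest. */
proc univariate data=work.have;
   var amount;
   histogram amount;   /* plots the distribution of AMOUNT */
run;
```

Along with the histogram, the default output includes a quantiles table and the extreme observations, which is usually enough for a first look at the range.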

 

Then I would also recommend PROC RANK. 

Rank the variable of interest into 100 groups (GROUPS=100), which gives percentile ranks from 0 to 99. You can then select the lowest X% by keeping observations whose rank is below X, and the highest X% by keeping those whose rank is at least 100 - X. Note how it handles tied values, though: that is one reason I prefer this method. PROC RANK accounts for ties (see the TIES= option), whereas some manual approaches do not by default and need extra coding.
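
As a rough sketch of the ranking approach, assuming the same placeholder names (WORK.HAVE, AMOUNT) and a 5% cutoff at each end, something like the following could work. The TIES=LOW choice and the output names are illustrative assumptions, so pick whichever tie rule suits your data:

```sas
/* Hypothetical names: WORK.HAVE, AMOUNT, AMOUNT_PCTL, WORK.RANKED, WORK.TAILS. */
proc rank data=work.have out=work.ranked groups=100 ties=low;
   var amount;
   ranks amount_pctl;   /* group number 0-99; tied values share a group */
run;

data work.tails;
   set work.ranked;
   /* A missing AMOUNT gets a missing rank, which would otherwise pass "< 5". */
   where not missing(amount_pctl)
     and (amount_pctl < 5 or amount_pctl >= 95);
run;
```

Because tied values always land in the same group, the two tails may hold slightly more or fewer than exactly 5% of the rows when the data has many repeated amounts.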

PGStats
Opal | Level 21

Also, use the NEXTROBS=n option in PROC UNIVARIATE to display the n highest and n lowest observations.
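
For example, a sketch that lists the 20 lowest and 20 highest values (WORK.HAVE and AMOUNT are placeholder names, and 20 is an arbitrary choice):

```sas
/* Hypothetical names: WORK.HAVE is the input data set, AMOUNT the column of interest. */
ods select ExtremeObs;   /* show only the Extreme Observations table */
proc univariate data=work.have nextrobs=20;
   var amount;
run;
```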

PG
