Hello,
I have one column in a dataset with 5M Rows of data. I'm trying to understand the range of values and figure out the general distribution. How would I select a percentage of the highest amounts and a percentage of the lowest amounts? Thanks!
I would recommend a histogram first, using PROC UNIVARIATE.
By default it also displays the highest and lowest values, in the Extreme Observations table.
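A minimal sketch of that first step. The dataset name WORK.HAVE and the variable name AMOUNT are placeholders I made up, so substitute your own:

```sas
/* Assumed names: WORK.HAVE is your dataset, AMOUNT is the column of interest */
proc univariate data=work.have;
    var amount;
    histogram amount;   /* draws the distribution of AMOUNT */
run;
```

Along with the histogram, the default output includes the moments, the quantiles, and the extreme observations, which should already give a good feel for the range.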
Then I would also recommend PROC RANK.
Rank the variable of interest with GROUPS=100, which assigns each observation to a group numbered 0 to 99. You can then select the lowest X% by keeping the groups below X, and the highest X% by keeping the groups at or above 100 - X. Note how it handles tied values, though: that's one reason I prefer this approach. PROC RANK accounts for ties, whereas some manual methods do not by default and need extra coding.
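Roughly, the PROC RANK approach could look like the sketch below. WORK.HAVE, AMOUNT, PCT_GROUP, and the 5% cutoffs are all assumptions for illustration, not anything from your data:

```sas
/* Assumed names: WORK.HAVE (input), AMOUNT (variable to rank),
   PCT_GROUP (new variable that holds the group number) */
proc rank data=work.have out=work.ranked groups=100;
    var amount;
    ranks pct_group;   /* groups are numbered 0 through 99 */
run;

/* Split off the lowest 5% and the highest 5% (cutoffs are illustrative) */
data work.lowest work.highest;
    set work.ranked;
    /* Missing AMOUNT gets a missing rank, and missing compares as less
       than any number in SAS, so drop those rows before testing */
    if missing(pct_group) then delete;
    if pct_group < 5 then output work.lowest;
    else if pct_group >= 95 then output work.highest;
run;
```

Because tied values land in the same group, the two output datasets may hold slightly more or fewer than exactly 5% of the rows. The TIES= option on the PROC RANK statement controls how ties are ranked if the default does not suit you.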
Also, use the NEXTROBS=n option in PROC UNIVARIATE to display the n highest and n lowest observations (the default is 5).
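For example, something like the following should print only the extremes. Again, WORK.HAVE and AMOUNT are placeholder names, and 20 is an arbitrary choice for n:

```sas
/* Assumed names: WORK.HAVE is your dataset, AMOUNT is the column of interest */
ods select ExtremeObs;   /* keep only the Extreme Observations table */
proc univariate data=work.have nextrobs=20;
    var amount;
run;
```

The ODS SELECT line is optional. Without it you get the full PROC UNIVARIATE output, with the extreme observations table expanded to 20 rows on each end.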