Hello,
I have one column in a dataset with 5M Rows of data. I'm trying to understand the range of values and figure out the general distribution. How would I select a percentage of the highest amounts and a percentage of the lowest amounts? Thanks!
I would recommend starting with a histogram, using PROC UNIVARIATE.
By default it also displays the highest and lowest observations.
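A minimal sketch of that first step, assuming a dataset called WORK.HAVE with a numeric variable called AMOUNT (both names are placeholders, not from the original post):

```sas
/* WORK.HAVE and AMOUNT are hypothetical names; substitute your own. */
proc univariate data=work.have;
   var amount;
   histogram amount;   /* draws the distribution of AMOUNT */
run;
```

The default output should also include the quantiles and extreme observations tables, which give a quick read on the range.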
Then I would also recommend PROC RANK.
Rank the variable of interest into 100 groups (GROUPS=100); you can then find everything below X% by selecting all rows whose rank is less than X. Note how it handles tied values, though, which is one reason I prefer this approach: PROC RANK accounts for ties, whereas some manual methods do not by default and need extra coding.
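Something along these lines might work as a starting point. The dataset and variable names (WORK.HAVE, AMOUNT, AMOUNT_PCTL) and the 5% cutoffs are assumptions chosen for illustration only:

```sas
/* Hypothetical names: WORK.HAVE, AMOUNT, AMOUNT_PCTL. */
/* GROUPS=100 assigns group numbers 0 through 99.      */
proc rank data=work.have out=work.ranked groups=100;
   var amount;
   ranks amount_pctl;
run;

/* Split off the bottom and top 5 groups (example cutoffs only). */
data work.lowest work.highest;
   set work.ranked;
   /* Missing AMOUNT gets a missing rank, which would otherwise
      compare as less than 5, so drop those rows first. */
   if missing(amount_pctl) then delete;
   if amount_pctl < 5 then output work.lowest;
   else if amount_pctl >= 95 then output work.highest;
run;
```

Because tied values fall into the same group, the groups are not guaranteed to hold exactly 1% of the rows each. The TIES= option on the PROC RANK statement controls how ties are ranked, so it is worth checking the group counts against what you expect.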
Also, use the NEXTROBS=n option in PROC UNIVARIATE to display the n highest and n lowest observations.
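For example, assuming the same placeholder names as above (WORK.HAVE and AMOUNT) and an arbitrary n of 20, a sketch could look like this:

```sas
/* Hypothetical names: WORK.HAVE, AMOUNT. n=20 is an arbitrary choice. */
ods select ExtremeObs;   /* show only the extreme observations table */
proc univariate data=work.have nextrobs=20;
   var amount;
run;
```

The ODS SELECT line is optional; it just suppresses the rest of the PROC UNIVARIATE output so the 20 lowest and 20 highest values are easier to find.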