Hello,
I have one column in a dataset with 5M Rows of data. I'm trying to understand the range of values and figure out the general distribution. How would I select a percentage of the highest amounts and a percentage of the lowest amounts? Thanks!
I would recommend a histogram first, using PROC UNIVARIATE.
Its default output also lists the highest and lowest values of the variable.
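A minimal sketch of that first step, assuming a dataset named WORK.HAVE and a numeric variable named AMOUNT (both names are placeholders, not from the original post):

```sas
/* Assumed names: WORK.HAVE (input dataset) and AMOUNT (numeric variable). */
proc univariate data=work.have;
   var amount;
   histogram amount;   /* draws the distribution of AMOUNT */
run;
```

Besides the histogram, the printed output includes the moments, the quantiles, and the extreme observations table.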
Then I would also recommend PROC RANK.
Rank the variable of interest with GROUPS=100, which assigns each observation a rank from 0 to 99; the lowest X% are then the observations whose rank is less than X. Note how it handles tied values, though; that is one reason I prefer this approach. PROC RANK accounts for ties, whereas some manual methods do not by default and need extra coding.
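As a rough illustration of the PROC RANK approach, the sketch below pulls the bottom 5% and top 5%. The dataset WORK.HAVE, the variable AMOUNT, the output names, and the 5% cutoffs are all assumptions chosen for the example:

```sas
/* Assumed names: WORK.HAVE, AMOUNT, AMOUNT_PCT; the 5% cutoffs are illustrative. */
proc rank data=work.have out=work.ranked groups=100;
   var amount;
   ranks amount_pct;   /* group number 0-99, stored in a new variable */
run;

data work.bottom5 work.top5;
   set work.ranked;
   /* Missing AMOUNT gets a missing rank, which would otherwise pass "< 5". */
   if missing(amount_pct) then delete;
   if amount_pct < 5 then output work.bottom5;
   else if amount_pct >= 95 then output work.top5;
run;
```

Tied values receive the same rank, so the groups may not each hold exactly 1% of the rows. The TIES= option on the PROC RANK statement controls which rank tied values get.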
Also, use the NEXTROBS=n option in PROC UNIVARIATE to display the n highest and n lowest observations.
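For example, something along these lines should print only the extreme observations table with 20 rows at each end. WORK.HAVE, AMOUNT, and the value 20 are placeholders:

```sas
/* Assumed names: WORK.HAVE and AMOUNT; 20 is an arbitrary choice for n. */
ods select ExtremeObs;   /* keep only the extreme observations table */
proc univariate data=work.have nextrobs=20;
   var amount;
run;
```

Dropping the ODS SELECT statement brings back the rest of the default PROC UNIVARIATE output.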