Hi,
I am trying to create a flag for the top N values in my dataset by region.
Right now I have the total amount of sales for each account and I need to find the number of accounts that make up the top 25% and top 50% of sales for the region they are in. The output I am looking for would be the dataset with a flag on the end that says whether or no this account is part of the top 25% of sales, top 50% of sales or neither.
When I say this I do not mean that this account accounts for 25% of the sales itself, rather this account and the subsequent accounts make up 25%.
I tried sorting by region and sales which gives me a list that looks like this:
Region1 Account1 100
Region1 Account2 75
Region1 Account3 12
...
I then found the % of sales each account is responsible for in their region. Now I am stuck on aggregating the percents from top to bottom
proc summary data=have nway;
class region;
var sales;
output out=total_sales sum=total_sales;
run;
proc sort data=have;
by region descending sales;
run;
data have2;
merge have total_sales;
by region;
if first.region then cumulative_percent=0;
percent_of_sales = sales/total_sales;
cumulative_percent + percent_of_sales;
start_of_grouping = cumulative_percent - percent_of_sales;
run;
I'm trying to think my way through this idea of "Top N by Sum" and its not really clear to me exactly how this would work.
I suppose you want to sort by descending sales, as you have shown, with the largest sales account first, but then what happens if the top account sales are >25%? What happens if the top account sales is greater than 50%? What happens if an account's sales moves the sum from 48% to 54%, what group does it fit in?
Can you give a specific example, showing inputs and the desired outputs?
Hi basically whatever account puts it over the top would be classified in that group.
So if account 1 is 76% of the sales this would be the only account in top 25%, top 50% and top 75%.
If the top account is 24% and the next one is 3% they would both be considered to the top 25%.
The first step would be to get the cumulative percents ... you would sort by descending sales, divide by total sales to get non-cumulative percent, then get the cumulative percents and do a grouping.
Something like this UNTESTED CODE:
proc summary data=have;
var sales;
output out=total_sales sum=total_sales;
run;
proc sort data=have;
by descending sales;
run;
data have2;
if _n_=1 then set total_sales;
set have;
percent_of_sales = sales/total_sales;
cumulative_percent + percent_of_sales;
start_of_grouping = cumulative_percent - percent_of_sales;
run;
So, if start_of_grouping is less than 25, it is in the <25 group. If start_of_grouping is less than 50, it is in the <50 group. And so on.
Only issue is that its not doing it by region. This is the same issue I was having
proc summary data=have nway;
class region;
var sales;
output out=total_sales sum=total_sales;
run;
proc sort data=have;
by region descending sales;
run;
data have2;
merge have total_sales;
by region;
if first.region then cumulative_percent=0;
percent_of_sales = sales/total_sales;
cumulative_percent + percent_of_sales;
start_of_grouping = cumulative_percent - percent_of_sales;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.