Hi,
I am trying to create a flag for the top N values in my dataset by region.
Right now I have the total amount of sales for each account and I need to find the number of accounts that make up the top 25% and top 50% of sales for the region they are in. The output I am looking for would be the dataset with a flag on the end that says whether or no this account is part of the top 25% of sales, top 50% of sales or neither.
When I say this I do not mean that this account accounts for 25% of the sales itself, rather this account and the subsequent accounts make up 25%.
I tried sorting by region and sales which gives me a list that looks like this:
Region1 Account1 100
Region1 Account2 75
Region1 Account3 12
...
I then found the % of sales each account is responsible for in their region. Now I am stuck on aggregating the percents from top to bottom
proc summary data=have nway;
class region;
var sales;
output out=total_sales sum=total_sales;
run;
proc sort data=have;
by region descending sales;
run;
data have2;
merge have total_sales;
by region;
if first.region then cumulative_percent=0;
percent_of_sales = sales/total_sales;
cumulative_percent + percent_of_sales;
start_of_grouping = cumulative_percent - percent_of_sales;
run;
I'm trying to think my way through this idea of "Top N by Sum" and its not really clear to me exactly how this would work.
I suppose you want to sort by descending sales, as you have shown, with the largest sales account first, but then what happens if the top account sales are >25%? What happens if the top account sales is greater than 50%? What happens if an account's sales moves the sum from 48% to 54%, what group does it fit in?
Can you give a specific example, showing inputs and the desired outputs?
Hi basically whatever account puts it over the top would be classified in that group.
So if account 1 is 76% of the sales this would be the only account in top 25%, top 50% and top 75%.
If the top account is 24% and the next one is 3% they would both be considered to the top 25%.
The first step would be to get the cumulative percents ... you would sort by descending sales, divide by total sales to get non-cumulative percent, then get the cumulative percents and do a grouping.
Something like this UNTESTED CODE:
proc summary data=have;
var sales;
output out=total_sales sum=total_sales;
run;
proc sort data=have;
by descending sales;
run;
data have2;
if _n_=1 then set total_sales;
set have;
percent_of_sales = sales/total_sales;
cumulative_percent + percent_of_sales;
start_of_grouping = cumulative_percent - percent_of_sales;
run;
So, if start_of_grouping is less than 25, it is in the <25 group. If start_of_grouping is less than 50, it is in the <50 group. And so on.
Only issue is that its not doing it by region. This is the same issue I was having
proc summary data=have nway;
class region;
var sales;
output out=total_sales sum=total_sales;
run;
proc sort data=have;
by region descending sales;
run;
data have2;
merge have total_sales;
by region;
if first.region then cumulative_percent=0;
percent_of_sales = sales/total_sales;
cumulative_percent + percent_of_sales;
start_of_grouping = cumulative_percent - percent_of_sales;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.