BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
alexgouv
Obsidian | Level 7

Hi,

 

I am trying to create a flag for the top N values in my dataset by region.

 

Right now I have the total amount of sales for each account and I need to find the number of accounts that make up the top 25% and top 50% of sales for the region they are in. The output I am looking for would be the dataset with a flag on the end that says whether or no this account is part of the top 25% of sales, top 50% of sales or neither.

 

When I say this I do not mean that this account accounts for 25% of the sales itself, rather this account and the subsequent accounts make up 25%.

 

I tried sorting by region and sales which gives me a list that looks like this:

Region1 Account1 100

Region1 Account2 75

Region1 Account3 12

...

I then found the % of sales each account is responsible for in their region. Now I am stuck on aggregating the percents from top to bottom

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26
proc summary data=have nway;
    class region;
    var sales;
    output out=total_sales sum=total_sales;
run;
proc sort data=have;
    by region descending sales;
run;
data have2;
    merge have total_sales;
    by region;
    if first.region then cumulative_percent=0;
    percent_of_sales = sales/total_sales;
    cumulative_percent + percent_of_sales;
    start_of_grouping = cumulative_percent - percent_of_sales;
run;
--
Paige Miller

View solution in original post

5 REPLIES 5
PaigeMiller
Diamond | Level 26

I'm trying to think my way through this idea of "Top N by Sum" and its not really clear to me exactly how this would work.

 

I suppose you want to sort by descending sales, as you have shown, with the largest sales account first, but then what happens if the top account sales are >25%? What happens if the top account sales is greater than 50%? What happens if an account's sales moves the sum from 48% to 54%, what group does it fit in?

 

Can you give a specific example, showing inputs and the desired outputs?

--
Paige Miller
alexgouv
Obsidian | Level 7

Hi basically whatever account puts it over the top would be classified in that group. 

 

So if account 1 is 76% of the sales this would be the only account in top 25%, top 50% and top 75%. 

If the top account is 24% and the next one is 3% they would both be considered to the top 25%.

PaigeMiller
Diamond | Level 26

The first step would be to get the cumulative percents ... you would sort by descending sales, divide by total sales to get non-cumulative percent, then get the cumulative percents and do a grouping.

 

Something like this UNTESTED CODE:

 

proc summary data=have;
    var sales;
    output out=total_sales sum=total_sales;
run;
proc sort data=have;
    by descending sales;
run;
data have2;
    if _n_=1 then set total_sales;
    set have;
    percent_of_sales = sales/total_sales;
    cumulative_percent + percent_of_sales;
    start_of_grouping = cumulative_percent - percent_of_sales;
run;

So, if start_of_grouping is less than 25, it is in the <25 group. If start_of_grouping is less than 50, it is in the <50 group. And so on.

--
Paige Miller
alexgouv
Obsidian | Level 7

Only issue is that its not doing it by region. This is the same issue I was having

PaigeMiller
Diamond | Level 26
proc summary data=have nway;
    class region;
    var sales;
    output out=total_sales sum=total_sales;
run;
proc sort data=have;
    by region descending sales;
run;
data have2;
    merge have total_sales;
    by region;
    if first.region then cumulative_percent=0;
    percent_of_sales = sales/total_sales;
    cumulative_percent + percent_of_sales;
    start_of_grouping = cumulative_percent - percent_of_sales;
run;
--
Paige Miller

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1375 views
  • 1 like
  • 2 in conversation