Solved: Re: Top N by sum

alexgouv · Posted 06-27-2019 03:39 PM

Hi,

I am trying to create a flag for the top N values in my dataset by region.

Right now I have the total amount of sales for each account and I need to find the number of accounts that make up the top 25% and top 50% of sales for the region they are in. The output I am looking for would be the dataset with a flag on the end that says whether or no this account is part of the top 25% of sales, top 50% of sales or neither.

When I say this I do not mean that this account accounts for 25% of the sales itself, rather this account and the subsequent accounts make up 25%.

I tried sorting by region and sales which gives me a list that looks like this:

Region1 Account1 100

Region1 Account2 75

Region1 Account3 12

...

I then found the % of sales each account is responsible for in their region. Now I am stuck on aggregating the percents from top to bottom

PaigeMiller · Posted 06-27-2019 04:21 PM

proc summary data=have nway;
    class region;
    var sales;
    output out=total_sales sum=total_sales;
run;
proc sort data=have;
    by region descending sales;
run;
data have2;
    merge have total_sales;
    by region;
    if first.region then cumulative_percent=0;
    percent_of_sales = sales/total_sales;
    cumulative_percent + percent_of_sales;
    start_of_grouping = cumulative_percent - percent_of_sales;
run;

--
Paige Miller

View solution in original post

PaigeMiller · Posted 06-27-2019 03:44 PM

I'm trying to think my way through this idea of "Top N by Sum" and its not really clear to me exactly how this would work.

I suppose you want to sort by descending sales, as you have shown, with the largest sales account first, but then what happens if the top account sales are >25%? What happens if the top account sales is greater than 50%? What happens if an account's sales moves the sum from 48% to 54%, what group does it fit in?

Can you give a specific example, showing inputs and the desired outputs?

--
Paige Miller

alexgouv · Posted 06-27-2019 03:47 PM

Hi basically whatever account puts it over the top would be classified in that group.

So if account 1 is 76% of the sales this would be the only account in top 25%, top 50% and top 75%.

If the top account is 24% and the next one is 3% they would both be considered to the top 25%.

PaigeMiller · Posted 06-27-2019 04:00 PM

The first step would be to get the cumulative percents ... you would sort by descending sales, divide by total sales to get non-cumulative percent, then get the cumulative percents and do a grouping.

Something like this UNTESTED CODE:

proc summary data=have;
    var sales;
    output out=total_sales sum=total_sales;
run;
proc sort data=have;
    by descending sales;
run;
data have2;
    if _n_=1 then set total_sales;
    set have;
    percent_of_sales = sales/total_sales;
    cumulative_percent + percent_of_sales;
    start_of_grouping = cumulative_percent - percent_of_sales;
run;

So, if start_of_grouping is less than 25, it is in the <25 group. If start_of_grouping is less than 50, it is in the <50 group. And so on.

--
Paige Miller

alexgouv · Posted 06-27-2019 04:19 PM

Only issue is that its not doing it by region. This is the same issue I was having

PaigeMiller · Posted 06-27-2019 04:21 PM

proc summary data=have nway;
    class region;
    var sales;
    output out=total_sales sum=total_sales;
run;
proc sort data=have;
    by region descending sales;
run;
data have2;
    merge have total_sales;
    by region;
    if first.region then cumulative_percent=0;
    percent_of_sales = sales/total_sales;
    cumulative_percent + percent_of_sales;
    start_of_grouping = cumulative_percent - percent_of_sales;
run;

--
Paige Miller

Top N by sum

Re: Top N by sum

Re: Top N by sum

Re: Top N by sum

Re: Top N by sum

Re: Top N by sum

Re: Top N by sum

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away