BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
alexgouv
Obsidian | Level 7

Hi,

 

I am trying to create a flag for the top N values in my dataset by region.

 

Right now I have the total amount of sales for each account and I need to find the number of accounts that make up the top 25% and top 50% of sales for the region they are in. The output I am looking for would be the dataset with a flag on the end that says whether or no this account is part of the top 25% of sales, top 50% of sales or neither.

 

When I say this I do not mean that this account accounts for 25% of the sales itself, rather this account and the subsequent accounts make up 25%.

 

I tried sorting by region and sales which gives me a list that looks like this:

Region1 Account1 100

Region1 Account2 75

Region1 Account3 12

...

I then found the % of sales each account is responsible for in their region. Now I am stuck on aggregating the percents from top to bottom

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26
proc summary data=have nway;
    class region;
    var sales;
    output out=total_sales sum=total_sales;
run;
proc sort data=have;
    by region descending sales;
run;
data have2;
    merge have total_sales;
    by region;
    if first.region then cumulative_percent=0;
    percent_of_sales = sales/total_sales;
    cumulative_percent + percent_of_sales;
    start_of_grouping = cumulative_percent - percent_of_sales;
run;
--
Paige Miller

View solution in original post

5 REPLIES 5
PaigeMiller
Diamond | Level 26

I'm trying to think my way through this idea of "Top N by Sum" and its not really clear to me exactly how this would work.

 

I suppose you want to sort by descending sales, as you have shown, with the largest sales account first, but then what happens if the top account sales are >25%? What happens if the top account sales is greater than 50%? What happens if an account's sales moves the sum from 48% to 54%, what group does it fit in?

 

Can you give a specific example, showing inputs and the desired outputs?

--
Paige Miller
alexgouv
Obsidian | Level 7

Hi basically whatever account puts it over the top would be classified in that group. 

 

So if account 1 is 76% of the sales this would be the only account in top 25%, top 50% and top 75%. 

If the top account is 24% and the next one is 3% they would both be considered to the top 25%.

PaigeMiller
Diamond | Level 26

The first step would be to get the cumulative percents ... you would sort by descending sales, divide by total sales to get non-cumulative percent, then get the cumulative percents and do a grouping.

 

Something like this UNTESTED CODE:

 

proc summary data=have;
    var sales;
    output out=total_sales sum=total_sales;
run;
proc sort data=have;
    by descending sales;
run;
data have2;
    if _n_=1 then set total_sales;
    set have;
    percent_of_sales = sales/total_sales;
    cumulative_percent + percent_of_sales;
    start_of_grouping = cumulative_percent - percent_of_sales;
run;

So, if start_of_grouping is less than 25, it is in the <25 group. If start_of_grouping is less than 50, it is in the <50 group. And so on.

--
Paige Miller
alexgouv
Obsidian | Level 7

Only issue is that its not doing it by region. This is the same issue I was having

PaigeMiller
Diamond | Level 26
proc summary data=have nway;
    class region;
    var sales;
    output out=total_sales sum=total_sales;
run;
proc sort data=have;
    by region descending sales;
run;
data have2;
    merge have total_sales;
    by region;
    if first.region then cumulative_percent=0;
    percent_of_sales = sales/total_sales;
    cumulative_percent + percent_of_sales;
    start_of_grouping = cumulative_percent - percent_of_sales;
run;
--
Paige Miller

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1217 views
  • 1 like
  • 2 in conversation