Hi,
I am stuck on a problem where I need to find how many accounts make up the top 25% of sales by region.
Right now I have the data in this format:
Region Account % of Regional Sales Running total of % of Regional Sales Count
New York "Name" 12.5 12.5 1
New York "Name2" 10 22.5 2
...
There are multiple different regions.
What I need to do is figure out which observation pushes the running total of % above that threshold of 25% and then output the count variable for that observation. IE I am trying to find out how many accounts it took to get 25% of the sales for the New York region. Then the same for each subsequent region.
Thanks!
Hi @alexgouv
If your dataset is sorted on Region + SalesPct descending, and count is in ascending order within the region, then the following code will do the work;
data have;
Region='New York'; Account='Name1'; Salespct=12.5; RunningTotal=12.5; count=1; output;
Region='New York'; Account='Name2'; Salespct=10; RunningTotal=22.5; count=2; output;
Region='New York'; Account='Name3'; Salespct=20; RunningTotal=42.5; count=3; output;
Region='New York'; Account='Name4'; Salespct=17.5; RunningTotal=60; count=4; output;
Region='Omaha'; Account='Name2'; Salespct=7.5; RunningTotal=7.5; count=1; output;
Region='Omaha'; Account='Name2'; Salespct=20.5; RunningTotal=27.5; count=2; output;
Region='Omaha'; Account='Name2'; Salespct=12.5; RunningTotal=40; count=2; output;
run;
data want (drop=Salespct RunningTotal) ; set have; by region;
if RunningTotal >= 25 and lag(RunningTotal) < 25 then output;
run;
- and next time please supply code to create an input data set. Often the better part of the time used to answer a question is taken up by preparing some data to use for testing.
Hi @alexgouv
If your dataset is sorted on Region + SalesPct descending, and count is in ascending order within the region, then the following code will do the work;
data have;
Region='New York'; Account='Name1'; Salespct=12.5; RunningTotal=12.5; count=1; output;
Region='New York'; Account='Name2'; Salespct=10; RunningTotal=22.5; count=2; output;
Region='New York'; Account='Name3'; Salespct=20; RunningTotal=42.5; count=3; output;
Region='New York'; Account='Name4'; Salespct=17.5; RunningTotal=60; count=4; output;
Region='Omaha'; Account='Name2'; Salespct=7.5; RunningTotal=7.5; count=1; output;
Region='Omaha'; Account='Name2'; Salespct=20.5; RunningTotal=27.5; count=2; output;
Region='Omaha'; Account='Name2'; Salespct=12.5; RunningTotal=40; count=2; output;
run;
data want (drop=Salespct RunningTotal) ; set have; by region;
if RunningTotal >= 25 and lag(RunningTotal) < 25 then output;
run;
- and next time please supply code to create an input data set. Often the better part of the time used to answer a question is taken up by preparing some data to use for testing.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.