Hi,
I am stuck on a problem where I need to find how many accounts make up the top 25% of sales by region.
Right now I have the data in this format:
Region Account % of Regional Sales Running total of % of Regional Sales Count
New York "Name" 12.5 12.5 1
New York "Name2" 10 22.5 2
...
There are multiple different regions.
What I need to do is figure out which observation pushes the running total of % above that threshold of 25% and then output the count variable for that observation. IE I am trying to find out how many accounts it took to get 25% of the sales for the New York region. Then the same for each subsequent region.
Thanks!
Hi @alexgouv
If your dataset is sorted on Region + SalesPct descending, and count is in ascending order within the region, then the following code will do the work;
data have;
Region='New York'; Account='Name1'; Salespct=12.5; RunningTotal=12.5; count=1; output;
Region='New York'; Account='Name2'; Salespct=10; RunningTotal=22.5; count=2; output;
Region='New York'; Account='Name3'; Salespct=20; RunningTotal=42.5; count=3; output;
Region='New York'; Account='Name4'; Salespct=17.5; RunningTotal=60; count=4; output;
Region='Omaha'; Account='Name2'; Salespct=7.5; RunningTotal=7.5; count=1; output;
Region='Omaha'; Account='Name2'; Salespct=20.5; RunningTotal=27.5; count=2; output;
Region='Omaha'; Account='Name2'; Salespct=12.5; RunningTotal=40; count=2; output;
run;
data want (drop=Salespct RunningTotal) ; set have; by region;
if RunningTotal >= 25 and lag(RunningTotal) < 25 then output;
run;
- and next time please supply code to create an input data set. Often the better part of the time used to answer a question is taken up by preparing some data to use for testing.
Hi @alexgouv
If your dataset is sorted on Region + SalesPct descending, and count is in ascending order within the region, then the following code will do the work;
data have;
Region='New York'; Account='Name1'; Salespct=12.5; RunningTotal=12.5; count=1; output;
Region='New York'; Account='Name2'; Salespct=10; RunningTotal=22.5; count=2; output;
Region='New York'; Account='Name3'; Salespct=20; RunningTotal=42.5; count=3; output;
Region='New York'; Account='Name4'; Salespct=17.5; RunningTotal=60; count=4; output;
Region='Omaha'; Account='Name2'; Salespct=7.5; RunningTotal=7.5; count=1; output;
Region='Omaha'; Account='Name2'; Salespct=20.5; RunningTotal=27.5; count=2; output;
Region='Omaha'; Account='Name2'; Salespct=12.5; RunningTotal=40; count=2; output;
run;
data want (drop=Salespct RunningTotal) ; set have; by region;
if RunningTotal >= 25 and lag(RunningTotal) < 25 then output;
run;
- and next time please supply code to create an input data set. Often the better part of the time used to answer a question is taken up by preparing some data to use for testing.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.