BookmarkSubscribeRSS Feed
_SAS_
Obsidian | Level 7

Hi all,

 

I have the following situation.

 

I have some huge underlying data set - imagine it like orders.

 

RowID          Product          Subproduct          Counterparty          Amount

1                   A                    M                          X                            10

2                   A                    M                          Y                             5

3                   B                    M                          Z                             7

.... and so on.

 

I need to generate a report something like this:

 

Category          Amount

1                      amount1 (is a sum of category 1.1 and 1.2)

1.1                   amount1_1 (is a sum of category 1.1.1 and 1.1.2)

1.1.1                amount1_1_1

1.1.2                amount1_1_2

1.2                   amount1_2 (equals amount from underlying category 1.2.1)

1.2.1                amount1_2_1

 

 

I read in the specs of the report from an XLS where I specify the where conditions for the lowest level conditions (meaning the Categories 1.1.1 and 1.1.2) and for the higher level conditions the category IDs of the sub-sums.

For example, for category 1.1.1. i read in "Product in (A,B) and Counterparty not in (Y,Z) and for category 1.1 i read in that is a sum of 1.1.1 and 1.1.2. 

 

The XLS is something like this:

 

Category             Type             Content

Category 1          Sum             Category 1.1, Category 1.2

Category 1.1       Sum             Category 1.1.1, Category 1.1.2

Category 1.1.1    Formula       Product = A and Subproduct = X

Category 1.1.2    Formula       Product in (A,B) and Counterparty in (X,Z)

Category 1.2       Sum             Category 1.2.1

Category 1.2.1    Forumula      Product = B

.... and so on

 

I use a macro to go through the underlying data set and assign on a separate column the lowest level category where in the specs table the type is Formula (e.g 1.1.1 and 1.1.2). In case a row is assigned to more categories, I duplicate it and output it with each subcategory.

 

The output of the macro would be in a data set like this

 

RowID          Category

1                  1.1.1

1                  1.1.2

2                  No category

3                  1.1.2

3                  1.2.1

 

The report i need to produce should be at the end 

 

Category                Amount

Category 1             34

Category 1.1          27

Category 1.1.1       10

Category 1.1.2       17

Category 1.2          7

Category 1.2.1       7

 

Now my question - what would be the most efficient way of processing all the sums-ups and creatng the report I need considering that the underlying data set is quite huge and the report specification (the number of categories) is also quite big. I don't want to do this hardcoded, I want to be driven by the XLS read-in.

 

The most stupid way I could think about is to update the macro so that whenever I find a subcategory and I output the row (e.g. 1.1.1), I also read in which upper categories the row is included (in this particular case 1.1) and ooutput the row again, then repeat and output the row also for 1. This will surely increase the volume of data but then would be easier to just join these two data sets and sum the amounts group by category ID. 

 

Do you have any other ideas?

 

Thanks 🙂

4 REPLIES 4
Shmuel
Garnet | Level 18

SAS is very fast while dealing with SAS tables;

 

You can import the excel file into SAS;

Add your categories and subcategories as new variables accroding to the definitions

and finally create the report using PROC TABULATE;

 

If you have dificulties with the coding the forum will help you.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Size of data is pretty irrelevant, it will just take longer to run if there is more data - this shouldn't be a problem.  Why do you have logic in an Excel file?  That to me sounds like the biggest problem here.  Why not simply transfer the logic from that file into a programmed SAS format?  As for your logic, I am afraid I don't see how that matches your data, there is no "category" variable given, so what is 1 or 1.1 etc.

As for the question, the most efficient method of processing data is to use SAS procedures, be that means, freq, or any of the other thousands of stats procedures given.  Structure your data in a good way, then use by groups and such like to arrive at the results you need.  I would suggest firstly, dropping all the macro, and the Excel file, and working closely with Base SAS (which is the programming language) to get a decent program working first before you try to stretch it to include other items.

Patrick
Opal | Level 21

I would hope that each row in your underlying data set is assigned to exactly one lowest level category (so it can't be in two). 

 

If so then I believe what you're in the end dealing with is a fact table (that's your "underlying data" and a dimension table - your categories). So what you need to build up now are the hierarchies.

 

The ugly thing is that you're just getting the logic for creating the dimension and that this is in Excel. You can try to process that programatically - but if the Excel is not very clean then I fear this will become a manual excercise. 

I assume someone had already a similar report. If you can then try and get hold of the actual dimensional data as well.

 

What I would be doing once you've got the dimensions: Take your fact table and add as many columns as you've got hierarchies (level1, level2, level3,....).

Level1 would be the top level containing "1,2,3", level2 would be "1.1, 1.2, 2.1,..."

Once you're there you can use Proc Tabulate, Proc Report or Proc Means have them do the job for you.

ballardw
Super User

Provide an example input data set and the desired result.

This

Category                Amount

Category 1             34

Category 1.1          27

Category 1.1.1       10

Category 1.1.2       17

Category 1.2          7

Category 1.2.1       7

without input basically tells us next to nothing and the example:

RowID          Product          Subproduct          Counterparty          Amount

1                   A                    M                          X                            10

2                   A                    M                          Y                             5

3                   B                    M                          Z                             7

has nothing related to your "category" listings. Categories shoud be defined in terms of variables and values in the actual data.

 

The instructions at https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn your input data set (or an example small enough to demostrate your issue) into data step code that can be pasted to the forum or attached as a TXT file.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 979 views
  • 0 likes
  • 5 in conversation