I have a category variable with age groups, such as 0-10, 11-20, 21-30, 31-, and want to graphically present the change in percent frequency of each class between two dates. Does anyone have an idea of how I can do this within the Report Designer in SAS Visual Analytics? The desired result (which should be presented graphically somehow) is something like this.
The first three columns are percent frequency by date and the last two are growth in percent frequency.
Age group | 2016-01-01 | 2016-06-01 | 2017-01-01 | 2016-01-01 - 2016-06-01 | 2016-01-01 - 2017-01-01 |
0-10 | 30% | 20% | 20% | -33% | -33% |
11-20 | 20% | 30% | 20% | 50% | 0% |
21-30 | 20% | 30% | 40% | 50% | 100% |
31- | 30% | 20% | 20% | -33% | -33% |
Tot | 100% | 100% | 100% |
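(The growth columns are the relative change in percent frequency between two dates; for the 0-10 group, for example, the first growth column is (20% - 30%) / 30% = -33%.)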
These figures should be calculated from data that looks something like this:
Date | ID | Age group |
2016-01-01 | 1 | 0-10 |
2016-01-01 | 2 | 0-10 |
2016-01-01 | 3 | 0-10 |
2016-01-01 | 4 | 11-20 |
2016-01-01 | 5 | 11-20 |
2016-01-01 | 6 | 21-30 |
2016-01-01 | 7 | 21-30 |
2016-01-01 | 8 | 31- |
2016-01-01 | 9 | 31- |
2016-01-01 | 10 | 31- |
2016-06-01 | 1 | 0-10 |
2016-06-01 | 2 | 0-10 |
2016-06-01 | 3 | 11-20 |
2016-06-01 | 4 | 11-20 |
2016-06-01 | 5 | 11-20 |
2016-06-01 | 6 | 21-30 |
2016-06-01 | 7 | 21-30 |
2016-06-01 | 8 | 21-30 |
2016-06-01 | 9 | 31- |
2016-06-01 | 10 | 31- |
2017-01-01 | 1 | 0-10 |
2017-01-01 | 2 | 0-10 |
2017-01-01 | 3 | 11-20 |
2017-01-01 | 4 | 11-20 |
2017-01-01 | 5 | 21-30 |
2017-01-01 | 6 | 21-30 |
2017-01-01 | 7 | 21-30 |
2017-01-01 | 8 | 21-30 |
2017-01-01 | 9 | 31- |
2017-01-01 | 10 | 31- |
Hey Kroken,
Here is an alternative approach for your consideration:
I fully understand the benefit of loading data into memory and all that, but sometimes working with purpose-built data can yield a lot more benefit without demanding tons of resources.
Build your base percent-frequency aggregated table first, then load it into the LASR server; from there, your calculated columns should be more straightforward.
Here is sample code to build the base percent-frequency aggregated table:
DATA have;
LENGTH Date 8
ID 4
Age_group $5;
INPUT Date :yymmdd10. ID Age_group;
FORMAT date yymmdd10.;
DATALINES;
2016-01-01 1 0-10
2016-01-01 2 0-10
2016-01-01 3 0-10
2016-01-01 4 11-20
2016-01-01 5 11-20
2016-01-01 6 21-30
2016-01-01 7 21-30
2016-01-01 8 31-
2016-01-01 9 31-
2016-01-01 10 31-
2016-06-01 1 0-10
2016-06-01 2 0-10
2016-06-01 3 11-20
2016-06-01 4 11-20
2016-06-01 5 11-20
2016-06-01 6 21-30
2016-06-01 7 21-30
2016-06-01 8 21-30
2016-06-01 9 31-
2016-06-01 10 31-
2017-01-01 1 0-10
2017-01-01 2 0-10
2017-01-01 3 11-20
2017-01-01 4 11-20
2017-01-01 5 21-30
2017-01-01 6 21-30
2017-01-01 7 21-30
2017-01-01 8 21-30
2017-01-01 9 31-
2017-01-01 10 31-
;
RUN;
PROC TABULATE DATA=WORK.HAVE OUT=WORK.summary;
CLASS Date / ORDER=UNFORMATTED MISSING;
CLASS Age_group / ORDER=UNFORMATTED MISSING;
TABLE
/* Row Dimension */
Age_group={LABEL=''},
/* Column Dimension */
Date={LABEL=''}*ColPctN={LABEL=''}*F=BEST4.
/*Table Options */
/ BOX={LABEL='Age Group'} ;
RUN;
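Note that PROC TRANSPOSE below uses a BY statement, which expects its input sorted by Age_group. If the order of the TABULATE output doesn't already match (it depends on the table definition), a quick sort fixes it:
PROC SORT DATA=WORK.summary;
BY Age_group;
RUN;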
PROC TRANSPOSE DATA=WORK.SUMMARY OUT=WORK.WANT(DROP=_NAME_);
BY Age_group;
ID Date;
IDLABEL Date;
VAR PctN_10;
RUN;
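From here you could load WORK.WANT into the LASR server and define the growth columns as calculated items in the Report Designer, or compute them in Base SAS first. Here is a minimal sketch of the latter, assuming PROC TRANSPOSE named the transposed columns _2016_01_01, _2016_06_01, and _2017_01_01 (the exact names depend on the formatted date values, so check your log):
DATA WORK.want_growth;
SET WORK.want;
/* Growth = relative change in percent frequency between two dates */
growth_2016_06 = (_2016_06_01 - _2016_01_01) / _2016_01_01;
growth_2017_01 = (_2017_01_01 - _2016_01_01) / _2016_01_01;
FORMAT growth_2016_06 growth_2017_01 percent8.1;
RUN;
And a sketch of the load into LASR via the SASIOLA engine; the HOST=, PORT=, and TAG= values here are placeholders for your environment's settings:
LIBNAME lasrlib SASIOLA HOST="lasr.example.com" PORT=10010 TAG=vapublic;
DATA lasrlib.age_growth;
SET WORK.want_growth;
RUN;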
Hope this helps,
Ahmed
Hi Ahmed,
Thanks for your reply. Are you an experienced user of VA? I am very new to the tool, and I get the feeling that it lacks the flexibility I was hoping for. It seems that I very often get stuck when I try to build a graphical solution (I am only working in the Report Designer so far) that requires data to be processed over more than two or three dimensions (or "steps"). For this example, my initial intention was to load the actual age variable into the LASR server, make custom categories, calculate the percent frequency, and then the development of each category. But it seems it has to be done outside of VA (as you suggest). Other attempts to work with the development of grouped data have also failed, and I have been advised to create purpose-built data outside VA.
It seems to me that I should change my mindset and start working with smaller datasets of purpose-built data instead of trying to work with a large data mart containing raw data: for example, calculating the growth per age group outside VA, adding at most one or maybe two additional dimensions (such as date), and using different source tables for my reports. It kind of removes the benefits I thought I would find in this tool, but it seems to be the best way to avoid getting stuck and frustrated.
Do you think this is a better approach whenever I want to present something that requires processing raw data more than two or three times within the Report Designer?
Kind regards,
Kroken
Hi Kroken,
Here is my personal opinion, and I'm by no means an advanced user of this tool, though I wish I were.
I can't disagree with you about the lack of functionality, but then again, which tool or piece of software doesn't suffer from this? That's why R&D teams and software developers keep coming up with updates and enhancements in every new release; it's just the nature of software applications. This is by no means a justification, it's a fact, and it's why you see customers favor one vendor over another: because of what the tool can offer out of the box.
When it comes to report design and development, all BI tool vendors agree that pre-aggregated data saves resources and speeds the delivery of results and visuals to the end user. Their tools might be able to calculate custom aggregates on the fly within the report, via custom calculations and the like, but when you start overloading or repeating those custom calculations across multiple reports and data sources, performance takes a hit. Users start to notice delays in response times, and that leads to frustration, despite all the query caching and parallel-query capability that may be in place!
When you watch or listen to their experts talking about techniques to improve query and report performance, they all advise pushing the processing down to the database or using pre-aggregated data to minimize run-time processing.
The other thing to consider: most, if not all, of these BI tools provide a data preparation tool of some sort for exactly that purpose.
As a report designer, you would probably start with a sketch of the report on a piece of paper, then start putting the visual objects together and assembling the final report. During that process, you would discover:
- Report Objective / Focus Area
- Exact data requirements (details, aggregates, ratios, differences, etc.)
- Visual Objects (native, custom built)
- Annotations
- Links
All I can say is: don't get discouraged. No report happens in just a few seconds, even the ones you see in demos; they are all rehearsed 😉
Check this blog for reference: How to design an infographic about volcanic eruptions using SAS Visual Analytics
For additional related blogs, just go to http://blogs.sas.com/content/ and search for "Visual Analytics".
Hope this helps,
Ahmed
Thank you for your tips, time, and insight, Ahmed! Much appreciated. I'm going small!
Kind regards,
Kroken
Hi Kroken,
I came across this paper and I thought it might be of interest to you.
Designing for Performance – Best Practices for SAS® Visual Analytics Reports
Regards,
Ahmed