BookmarkSubscribeRSS Feed

Making VA reports more dynamic: category, measure, and group variables can be selected by the user

Started ‎06-08-2017 by
Modified ‎06-08-2017 by
Views 5,093

 

In SAS VA, once a report is created, users cannot change what variables are displayed on graphs[see comment in bold at the bottom of this article], what formats are used, what measures are displayed or how stacked bars are split. It is all static and the only change allowed is to subset the data, which can be a shock for users accustomed to slicing and dicing in a spreadsheet’s pivot table.

 

Thankfully it doesn’t have to be like this and things can be improved with a little prior data preparation.

 

In the screenshot below, which reports on counts and percentages, users can choose to have weekly or monthly aggregations (with matching date formatting) and they can split the histogram bars by a subpopulation of their choice (region or type or no split). As expected the subpopulations can also be used to filter data in a multi-choice manner. Regardless of the choices made, the correct counts and the correct percentages with respect to the total population considered are displayed. In this report, the percentages are calculated on the fly by VA.

 

va1.png

 

To obtain this flexibility, the data has to be pre-summarised for all crossings (nway), and then summarised again for each subpopulation split. This means that the size of the data set can increase considerably compared to the original detail data.

This procedure defeats the main purpose of VA’s, which is to load detail data in memory, and to derive aggregations on the fly as the user navigates the report. However, until VA can do simple frequency reports like this one without pre-aggregation, this method is very useful to provide flexibility to users without having to create numerous static reports, so I thought I’d share the method here.

 

The preparation code below does the following:
- Generate sample data
- Summarise for all crossings
- Summarise again for each subpopulation (two subpopulations here)
- Append all the data together.

- Shrink the table width (not shown here) 

 

I will not comment each step of the code. It is not very long and the steps are few and simple, so just run it and see what each step does.

 

Once the data is loaded in VA, the last thing left to do is to calculate the percentages. This is done by creating the aggregated measure:

 

Pct_freq  = Sum [_ByGroup_] ('Count'n) / Sum [_ByGroup_] ('TOTAL'n)

 

The method described here, where various ways to slice the data are appended into a table and then filtered in VA, can be extended to add any form of navigation to your reports. For example one of our reports allows users to see either counts or percentages on the same bar chart. Now if users could also change the chart type…

 

You’ll also note that the filters were added in the main reporting area because the section prompt area can only accept a very limited number of objects such as controls or containers. Let’s hope this restriction is lifted soon. I hope VA will mature and allow such simple reports without prior summarisation.

 

As usual, data preparation is the most important part of reporting work.

 

Happy coding!

 

 

%********************************************************************************;
%****************** Create fake report data *************************************;
%********************************************************************************;
data DASHBOARD_DATA;
  length APP_TYPE REGION $24;
  format DATE date9.;
  do APP_TYPE='a','b','c';
    do REGION='akl ','wai','wgt';
      APPL_NB=100+ranuni(0)*100;
      do I=1 to APPL_NB;  
        DATE=int(ranuni(0)*800+19000);
        output;
      end;
    end;
  end;
run;

%********************************************************************************;
%****************** Prepare to summarise by week and by month *******************;
%********************************************************************************;
proc format ;
  value DATES (multilabel) low-high=[monyy5.]
                           low-high=[weeku6.];
run;

%********************************************************************************;
%****************** Create a first summary to derive counts at lowest level *****;
%********************************************************************************;
proc summary data=DASHBOARD_DATA 
             nway;
  class DATE /mlf;
  class APP_TYPE REGION;
  format DATE dates.;
  output out=DASHBOARD_SUM(drop=_TYPE_  rename=(DATE=DATEC) );
run;           

%********************************************************************************;
%****************** Create summary for splitting bars by APP_TYPE ***************;
%********************************************************************************;
proc sql;
%* Create all possible crossings so the percentage denominator in VA 
   uses the correct total size of the subpop even if a crossing misses some data.
    (eg the total is 5 APP_TYPEs even if there are only 4 APP_TYPEs for REGION=1 ) ;
  create table CLASSDATA as
  select *
  from (select unique DATEC, REGION from DASHBOARD_SUM )  
     , (select unique APP_TYPE      from DASHBOARD_SUM )  
  order by DATEC, REGION, APP_TYPE  ;
quit;
%* Derive totals (denominator) for all APP_TYPEs for each crossing; 
proc summary data= DASHBOARD_SUM(rename =( _FREQ_=FREQ) )
             classdata=CLASSDATA;
  class DATEC REGION APP_TYPE;
  types DATEC*REGION*APP_TYPE
        DATEC*REGION;
  freq FREQ;
  output out=APP_TYPE_SUM(drop=_TYPE_);
run; 
%* Add totals to detail data and create navigation values;
data APP_TYPE_MEASURE;
  merge APP_TYPE_SUM(where=( missing(APP_TYPE))  rename=(_FREQ_=TOTAL) )
        APP_TYPE_SUM(where=(^missing(APP_TYPE)) );
  by DATEC REGION  ;
  SEL_GROUP    = 'By Application Type'; 
  GROUP        = APP_TYPE;   
  SEL_REGION   = REGION;
  SEL_APP_TYPE = APP_TYPE;
run;
   
%********************************************************************************;
%****************** Create summary for splitting bars by REGION *****************;
%********************************************************************************;
proc sql;
%* Create all possible crossings so the percentage denominator in VA 
   uses the correct total size of the subpop even if a crossing misses some data.
    (eg the total is 5 REGIONs even if there are only 4 REGIONs for TYPE=1 ) ;
  create table CLASSDATA as
  select *
  from (select unique DATEC, APP_TYPE from DASHBOARD_SUM)  
     , (select unique REGION          from DASHBOARD_SUM)  
  order by DATEC, APP_TYPE, REGION  ;
quit;
%* Derive totals (denominator) for all REGIONs for each crossing;
proc summary data= DASHBOARD_SUM(rename = (_FREQ_=FREQ))
             classdata=CLASSDATA;
  class DATEC APP_TYPE REGION ;
  types DATEC*APP_TYPE*REGION
        DATEC*APP_TYPE;
  freq FREQ;
  output out=REGION_SUM(drop=_TYPE_);
run;
%* Add totals to detail data and create navigation values;
data REGION_MEASURE;
  merge REGION_SUM(where=( missing(REGION)) rename=(_FREQ_=TOTAL) )
        REGION_SUM(where=(^missing(REGION)) );
  by DATEC APP_TYPE  ;
  SEL_GROUP    = 'By Region'; 
  GROUP        = REGION;     
  SEL_REGION   = REGION;
  SEL_APP_TYPE = APP_TYPE;           
run;

%********************************************************************************;
%****************** Create navigation values for when bars are not split ********;
%********************************************************************************;
data ALL_MEASURE;
  set DASHBOARD_SUM;
  SEL_GROUP   = ' No Split';    
  GROUP       = ' No Split'; 
  TOTAL       = _FREQ_;
  SEL_REGION  = REGION;
  SEL_APP_TYPE= APP_TYPE;                
run;

%********************************************************************************;
%****************** Append all summaries together and add date navigation *******;
%********************************************************************************;
data VA_DASHBOARD;     
  length DATEC SEL_GROUP GROUP SEL_REGION SEL_APP_TYPE $24;
  format DATE date9.;
  set ALL_MEASURE
      REGION_MEASURE
      APP_TYPE_MEASURE;
  SEL_DATE=ifc(index(DATEC,'W'), 'By Week', 'By Month');
  DATE    =ifn(index(DATEC,'W'), input(DATEC,?? weeku6.), input(DATEC,?? monyy5.));
  DATEC   =ifc(index(DATEC,'W'), nldate(DATE,'%Y.%m.%d'), nldate(DATE,'%Y.%m'));
  label _FREQ_='Count' DATEC='Date';
  keep SEL_: GROUP _FREQ_ TOTAL DATEC DATE;
run;

 

This article is a copy of this thread: https://communities.sas.com/t5/SAS-Visual-Analytics/Making-VA-reports-more-dynamic-category-measure-... . I decided to push it as a library article after being asked about this topic a few times.

 

Be sure to read the comments in the original thread as the discussion contains useful additional information.

Comments

I've just tried it and loved it! Very useful and interactive approach! 

Thanks for sharing. 🙂

Version history
Last update:
‎06-08-2017 10:06 PM
Updated by:
Contributors

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Labels
Article Tags