BookmarkSubscribeRSS Feed
ChrisNZ
Tourmaline | Level 20

 

SAS VA can be frustrating for users.

 

Once a report is created, users cannot change what variables are displayed on graphs, what formats are used, what measures are displayed or how stacked bars are split. It is all static and the only change allowed is to subset the data, which can be a shock for users accustomed to slicing and dicing in a spreadsheet’s pivot table.

 

Thankfully it doesn’t have to be like this and things can be improved with a little prior data preparation.

 

In the screenshot below, which reports on counts and percentages, users can choose to have weekly or monthly aggregations (with matching date formatting) and they can split the histogram bars by a subpopulation of their choice (region or type or no split). As expected the subpopulations can also be used to filter data in a multi-choice manner. Regardless of the choices made, the correct counts and the correct percentages with respect to the total population considered are displayed. In this report, the percentages are calculated on the fly by VA.

 

va1.png

 

To obtain this flexibility, the data has to be pre-summarised for all crossings (nway), and then summarised again for each subpopulation split. This means that the size of the data set can increase considerably compared to the original detail data.

This procedure defeats the main purpose of VA’s, which is to load detail data in memory, and to derive aggregations on the fly as the user navigates the report. However, until VA can do simple frequency reports like this one without pre-aggregation, this method is very useful to provide flexibility to users without having to create numerous static reports, so I thought I’d share the method here.

 

The preparation code below does the following:
- Generate sample data
- Summarise for all crossings
- Summarise again for each subpopulation (two subpopulations here)
- Append all the data together.

- Shrink the table width (not shown here) 

 

I will not comment each step of the code. It is not very long and the steps are few and simple, so just run it and see what each step does.

 

Once the data is loaded in VA, the last thing left to do is to calculate the percentages. This is done by creating the aggregated measure:

 

Pct_freq  = Sum [_ByGroup_] ('Count'n) / Sum [_ByGroup_] ('TOTAL'n)

 

The method described here, where various ways to slice the data are appended into a table and then filtered in VA, can be extended to add any form of navigation to your reports. For example one of our reports allows users to see either counts or percentages on the same bar chart. Now if users could also change the chart type…

 

You’ll also note that the filters were added in the main reporting area because the section prompt area can only accept a very limited number of objects such as controls or containers. Let’s hope this restriction is lifted soon. I hope VA will mature and allow such simple reports without prior summarisation.

 

As usual, data preparation is the most important part of reporting work.

 

Happy coding!

 

 

The data preparation code (the colours are weird):

 

%********************************************************************************;
%****************** Create fake report data *************************************;
%********************************************************************************;
data DASHBOARD_DATA;
  length APP_TYPE REGION $24;
  format DATE date9.;
  do APP_TYPE='a','b','c';
    do REGION='akl ','wai','wgt';
      APPL_NB=100+ranuni(0)*100;
      do I=1 to APPL_NB;  
        DATE=int(ranuni(0)*800+19000);
        output;
      end;
    end;
  end;
run;

%********************************************************************************;
%****************** Prepare to summarise by week and by month *******************;
%********************************************************************************;
proc format ;
  value DATES (multilabel) low-high=[monyy5.]
                           low-high=[weeku6.];
run;

%********************************************************************************;
%****************** Create a first summary to derive counts at lowest level *****;
%********************************************************************************;
proc summary data=DASHBOARD_DATA 
             nway;
  class DATE /mlf;
  class APP_TYPE REGION;
  format DATE dates.;
  output out=DASHBOARD_SUM(drop=_TYPE_  rename=(DATE=DATEC) );
run;           

%********************************************************************************;
%****************** Create summary for splitting bars by APP_TYPE ***************;
%********************************************************************************;
proc sql;
%* Create all possible crossings so the percentage denominator in VA 
   uses the correct total size of the subpop even if a crossing misses some data.
    (eg the total is 5 APP_TYPEs even if there are only 4 APP_TYPEs for REGION=1 ) ;
  create table CLASSDATA as
  select *
  from (select unique DATEC, REGION from DASHBOARD_SUM )  
     , (select unique APP_TYPE      from DASHBOARD_SUM )  
  order by DATEC, REGION, APP_TYPE  ;
quit;
%* Derive totals (denominator) for all APP_TYPEs for each crossing; 
proc summary data= DASHBOARD_SUM(rename =( _FREQ_=FREQ) )
             classdata=CLASSDATA;
  class DATEC REGION APP_TYPE;
  types DATEC*REGION*APP_TYPE
        DATEC*REGION;
  freq FREQ;
  output out=APP_TYPE_SUM(drop=_TYPE_);
run; 
%* Add totals to detail data and create navigation values;
data APP_TYPE_MEASURE;
  merge APP_TYPE_SUM(where=( missing(APP_TYPE))  rename=(_FREQ_=TOTAL) )
        APP_TYPE_SUM(where=(^missing(APP_TYPE)) );
  by DATEC REGION  ;
  SEL_GROUP    = 'By Application Type'; 
  GROUP        = APP_TYPE;   
  SEL_REGION   = REGION;
  SEL_APP_TYPE = APP_TYPE;
run;
   
%********************************************************************************;
%****************** Create summary for splitting bars by REGION *****************;
%********************************************************************************;
proc sql;
%* Create all possible crossings so the percentage denominator in VA 
   uses the correct total size of the subpop even if a crossing misses some data.
    (eg the total is 5 REGIONs even if there are only 4 REGIONs for TYPE=1 ) ;
  create table CLASSDATA as
  select *
  from (select unique DATEC, APP_TYPE from DASHBOARD_SUM)  
     , (select unique REGION          from DASHBOARD_SUM)  
  order by DATEC, APP_TYPE, REGION  ;
quit;
%* Derive totals (denominator) for all REGIONs for each crossing;
proc summary data= DASHBOARD_SUM(rename = (_FREQ_=FREQ))
             classdata=CLASSDATA;
  class DATEC APP_TYPE REGION ;
  types DATEC*APP_TYPE*REGION
        DATEC*APP_TYPE;
  freq FREQ;
  output out=REGION_SUM(drop=_TYPE_);
run;
%* Add totals to detail data and create navigation values;
data REGION_MEASURE;
  merge REGION_SUM(where=( missing(REGION)) rename=(_FREQ_=TOTAL) )
        REGION_SUM(where=(^missing(REGION)) );
  by DATEC APP_TYPE  ;
  SEL_GROUP    = 'By Region'; 
  GROUP        = REGION;     
  SEL_REGION   = REGION;
  SEL_APP_TYPE = APP_TYPE;           
run;

%********************************************************************************;
%****************** Create navigation values for when bars are not split ********;
%********************************************************************************;
data ALL_MEASURE;
  set DASHBOARD_SUM;
  SEL_GROUP   = ' No Split';    
  GROUP       = ' No Split'; 
  TOTAL       = _FREQ_;
  SEL_REGION  = REGION;
  SEL_APP_TYPE= APP_TYPE;                
run;

%********************************************************************************;
%****************** Append all summaries together and add date navigation *******;
%********************************************************************************;
data VA_DASHBOARD;     
  length DATEC SEL_GROUP GROUP SEL_REGION SEL_APP_TYPE $24;
  format DATE date9.;
  set ALL_MEASURE
      REGION_MEASURE
      APP_TYPE_MEASURE;
  SEL_DATE=ifc(index(DATEC,'W'), 'By Week', 'By Month');
  DATE    =ifn(index(DATEC,'W'), input(DATEC,?? weeku6.), input(DATEC,?? monyy5.));
  DATEC   =ifc(index(DATEC,'W'), nldate(DATE,'%Y.%m.%d'), nldate(DATE,'%Y.%m'));
  label _FREQ_='Count' DATEC='Date';
  keep SEL_: GROUP _FREQ_ TOTAL DATEC DATE;
run;

 

 

Unrelated to VA, but still a great book:

http://www.amazon.com/High-Performance-SAS-Coding-Christian-Graffeuille/dp/1512397490

(and I am totally unbiaised!)

9 REPLIES 9
jplarios
Quartz | Level 8

Thank you for the post; interesting! 

but should not SAS VA not care about how the data is structured? 

For example:

A measure should be a measure and a non-measure ( attribute, etc) should be a non-measure and if a user wants to see the measure in a certain way on a visual, the visual should allow for it ; likewise for its context.

ChrisNZ
Tourmaline | Level 20

VA will always care how data is structured.

But if I understand your point correctly, you are right:  if a user wants to see the measure in a certain way on a visual, the visual should allow for it. We shouldn't have to twist its arm.

ChrisNZ
Tourmaline | Level 20

 

The overall design is described at the top and every step is commented. I am unsure what more you need. Any particular point you struggle with?

Run the example and see what each step does on the data.

Frank_Boekamp
Quartz | Level 8

Hi Chris,

 

A quote from you: "Once a report is created, users cannot change what variables are displayed on graphs, what formats are used, what measures are displayed or how stacked bars are split".

 

The variables and measures that are displayed on graphs can be selected by the report viewer. It's a bit of a trick but you can use parameters (from a drop down list box) to select the variables and measures that are displayed. The formats however can not be changed dynamically in SAS 7.3 to my best knowledge...

 

Cheers,

Frank

ChrisNZ
Tourmaline | Level 20

@Frank_Boekamp Thanks for this Frank. I only know of parameters used to replace data (i.e. values), not metadata (i.e. variable names). Can you point me to a resource where this is shown?

Lodewijk
Obsidian | Level 7

@ChrisNZ Frank is right, but it is a bit of a trick. I outlined the solution in this topic: Change Stack Grouping based on dropdown choice It requires you to load a separate reference table, that you use to populate the dropdowns. I use this technique frequently to create more dynamic reports.

 

You can expand the reference table to a two column table, to hold more options per control element. Like:

 

Control            Value
dropdown1          option 1
dropdown1          option 2
dropdown2          option a
dropdown2          option b

Assign Value to the dropdowns and filter them on the control column. Using the custom sort for Value you can display the values in the right order in the dropdowns.

 

You can even use the controls in the report filter section in VA as controls for all sections/tabs of the report, since they do not filter anything. The reference table and the data table are not linked in anyway, just two separate datasources.

yhuang
Fluorite | Level 6

How do you use this data preparation codes in VA?

 

Thanks

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Tips for filtering data sources in SAS Visual Analytics

See how to use one filter for multiple data sources by mapping your data from SAS’ Alexandria McCall.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 6079 views
  • 8 likes
  • 6 in conversation