BookmarkSubscribeRSS Feed
Patrick
Opal | Level 21

Using VA 7.5 under SAS 9.4 I'm looking for a solution that avoids double counting (sums) when aggregating data.

 

What I've got is data like below where object A appears under two different Lev3 categories.

data have;
  infile datalines truncover dsd;
  input (lev1 lev2 lev3 object) ($) size;
  datalines;
L1,L2.1,L3.1,A,5
L1,L2.1,L3.1,b,3
L1,L2.1,L3.2,A,5
L1,L2.1,L3.2,c,1
;

What I want is a VA report where I can drill down into the hierarchies. If I look at Lev3 detail then I want to see the size of object A as in the data, but if I look at Lev2 then I want the aggregated size value to only include object A once.

 

 

To illustrate what I'm after here how I could do this with a SQL

%macro aggregation(categories);
  proc sql;
    select 
      &categories.,
      sum(size) as sum_size
    from 
      (select distinct &categories., object,size from have)
    group by &categories
    ;
  quit;
%mend;

%aggregation(%str(lev1, lev2, lev3));
%aggregation(%str(lev1, lev2));

Patrick_0-1674348972251.png

For the 2nd report above the size of object A is only used once.

 

Is there any way how I can achieve the same dynamically for a VA 7.5 report?

  • I would like a single base table loaded into VA memory
  • There will be intraday delta loads (append) that can add "duplicates" so it needs to be fully dynamic
  • The volumes aren't that high (around 100000 rows) so I'm not too concerned about query performance

I'm not very versed with VA so hoping that there is a simple solution to this (like some on-the-fly weight or the like).

 

To provide some background about the real data:

  • What I've got is a SAS solution where each instance (run) creates data that's kept in an instance specific location.
  • The solution moves the physical data to a pool area and keeps symbolic links pointing to the data in the instance area.
  • If the solution works out that physical data already exists (previous instance and no change to input data) then it only creates a symbolic link in the instance area to the already existing physical data in the pool area.
    • And that's where these "duplicates" come from. 
    • size in above sample is the file size of the physical data. I want to show them when looking at instance detail level but want to count them only once if showing the sums on solution level (I've got in reality 4 hierarchy levels).
1 REPLY 1
SASKiwi
PROC Star

I'm using exactly the same version of SAS and VA you are.

 

My approach to doing data preparation for VA is to do it all in Base SAS creating a SAS dataset on disk that then is loaded as is into LASR memory so that the LASR table is an exact copy of the SAS dataset. There are other ways of preparing data to load into VA, but since I already know Base SAS I find that works best for me. I also use SAS code to load the data into VA. The big advantage of this is reloading VA data is just a matter of rerunning your code. We have scheduled SAS jobs that prepare data and load it into VA automatically. I've built a SAS macro for loading VA tables to simplify the process.

 

Regarding summarizing data in VA, you typically load detail-level data then use VA reports to aggregate. For tabular summary reports, the Crosstab object works well. Think of it like a simplified PROC REPORT that contains GROUP or ACROSS columns. If you stack the GROUP columns then you can drill down levels by clicking on them. You can also define calculated items in VA to go in the report - these are analogous to PROC REPORT calculations.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Tips for filtering data sources in SAS Visual Analytics

See how to use one filter for multiple data sources by mapping your data from SAS’ Alexandria McCall.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 1 reply
  • 236 views
  • 0 likes
  • 2 in conversation