BookmarkSubscribeRSS Feed

VA Report Example: Distinct Count If – Run multiple queries at once

Started ‎03-29-2022 by
Modified ‎03-29-2022 by
Views 1,106

A recent report requirement needed the ratio of the Distinct Count of IDs by subjecting the numerator and denominator to different criteria conditions (similar to a Where or Case clause). We can perform this calculation in one expression, there by essentially running two subqueries at once.

 

Just to make the calculation more dynamic, I also parameterized the criteria conditions so that a report viewer can enter values and make various multi-value selections to drive the Ratio being analyzed.

 

Sample Report

 

Here is my final sample report. The criteria conditions are driven by the Text Input Control Object for MinOrderTotalParameter and the List Control Object for multi-select ProductLineParameter.

 

In pseudocode, the Ratio expression we create is:

 

Count Distinct OrderIDs IF ( ( Order Total >= MinOrderTotalParameter ) AND (Product Line In ProductLineParameter ) ) / Count Distinct OrderIDs IF ( Order Total >= MinOrderTotalParameter )

01_SampleReport.png

Select any image to see a larger version.
Mobile users: To view the images, select the "Full" version at the bottom of the page.

 

Parameters

 

First create your parameters. MinOrderTotalParameter: we want the user to type in the numeric value for the minimum Order Total to be used for the criteria condition, we will need to create a numeric parameter. ProductLineParameter: we want the user to be able to select multiple Product Lines therefore we must be sure to select the box to allow for Multiple values. From the Data pane, use the + New data item menu and select Parameter.

 

02_NewParameter-194x300.png

 

03_NumericAndCharParameters.png

 

Ratio Expression

 

When working with a ratio comprised of complex expressions for both the numerator and denominator, it’s best to first separate the expressions to be sure you are getting the expected results.

 

In order to evaluate each row against our parameterized criteria we will use the IF...ELSE statement. While there are lots of other options that you may be familiar with such as CASE statements, WHERE clauses, or the COUNTIF expression, there’s usually a good way to program something similar. And if you ever run into a roadblock, don’t hesitate to reach out!

 

Numerator Here is the numerator’s pseudocode:

 

Count Distinct OrderIDs IF ( ( Order Total >= MinOrderTotalParameter ) AND (Product Line In ProductLineParameter ) )

 

Here’s is how we will build this expression in SAS Visual Analytics with a few notes.

 

  • Distinct: Per SAS Documentation: calculates the number of distinct values in a data item. By default, if the data item contains missing values, then “missing” increases the distinct count by one. This behavior can be changed by an administrator.
  • _ ByGroup_: this aggregation context means that the expression will be Grouped By the Roles assigned to the visual. The other aggregation context _ ForAll_ would evaluate the expression for the entire data source regardless of assigned Roles to the visual.
  • IF…ELSE: IF the parameterized criteria is met, then we will return the Order ID to be used in the Distinct count, Else we will return . aka Missing.

 

Text:

Distinct [_ByGroup_] ((
IF ( ( 'Order Total'n >= 'MinOrderTotalParameter'p ) AND ('Product Line'n In 'ProductLineParameter'p ) )
RETURN 'Order ID'n
ELSE '.' ))

Visual:

 

04_Numerator.png

 

Denominator Here is the denominator’s pseudocode. Since I already described the different operators above, I won’t repeat.

 

Count Distinct OrderIDs IF ( Order Total >= MinOrderTotalParameter )

 

Text:

 

Distinct [_ByGroup_] ((
IF ( 'Order Total'n >= 'MinOrderTotalParameter'p )
RETURN 'Order ID'n
ELSE '.' ))

 

Visual:

 

05_Denominator.png

 

 

Verify

 

Now comes the most critical step in every expression, verify you are getting the expected results. And notice here that I didn’t say the correct answer, the expression will always return the correct answer for your data, but this what you expected? Do you need to take into account factors that are special with your data? Are you handling zeros and missings like you want? Does your expression need to be case insensitive? There are many additional operators that may need to be added to your expression so that you are getting your expected results.

 

In order to help me verify my results, I broke down the expression by one more step, to see how each row was classified by the parameterized criteria. That is, I took off the Distinct Count wrapper. This left me with a Raw Numerator and Raw Denominator I could use in a List Table object to verify the underlying values being counted.

 

06_RawNumAndRawDenom.png

 

Here is the way I verified this expression. I added part of the parameterized criteria to the Crosstab Object, namely Product Line. Then I looked at the returned Distinct Counts for both the numerator, denominator and resulting ratio.

 

To further investigate the underlying rows and how they were being categorized before being counted, I created the raw numerator and raw denominator as explained above. Then I added those values to individual List Table Objects and toggled the Detail Data option on and off while comparing the Distinct Count numbers.

 

As you can see, and if you remember per the SAS Documentation, the Distinct operator calculates the number of distinct values in a data item. By default, if the data item contains missing values, then “missing” increases the distinct count by one. This behavior can be changed by an administrator.

 

We can see how the value “missing” is being counted as a distinct value. If you as a company wanted to change this behavior you can make that change using SAS Environment Manager modifying the sas.reportdata.properties.ignoreMissingValuesInCountDistinct. Or we can easily offset this behavior by subtracting one from both the numerator and denominator.

 

07_VerifyResults.png

 

This leaves us with the final Ratio Offset expression where we subtract one from both the numerator and denominator.

 

Text:

 

( Distinct [_ByGroup_] ((IF ( ( 'Order Total'n >= 'MinOrderTotalParameter'p ) AND ('Product Line'n In 'ProductLineParameter'p ) )
RETURN 'Order ID'n
ELSE '.' )) - 1 ) / ( Distinct [_ByGroup_] ((IF ( 'Order Total'n >= 'MinOrderTotalParameter'p )
RETURN 'Order ID'n
ELSE '.' )) - 1 )

 

Visual:

 

08_RatioOffset.png

 

Remember to verify that this is the behavior we expect.

 

09_VerifyRatioOffset.png

 

Conclusion

 

If you’ve made it to the end of this example, the two key points I’d like you to remember are:

  1. Regardless of how you’re used to coding, whether it be with CASE statements, WHERE clauses, or COUNTIF expressions, there is usually a way to do something comparable in SAS Visual Analytics.
  2. Verify your expressions. Take the time to breakdown the pieces and make sure that the returned results are what you are expecting.

 

Other VA Report Examples

 

Find more articles from SAS Global Enablement and Learning here.

Version history
Last update:
‎03-29-2022 04:31 PM
Updated by:
Contributors

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags