Not all numbers are created equal. In a business setting, certain numbers may be available to the entire organization, while others are sensitive enough that you want to limit access. How can you mask sensitive data so you don’t inadvertently release private information to a wider audience? The answer: data suppression.
Why Data Suppression?
This article is not about security, or HR confidentiality policies, or anything like that. It’s about how we limit the indirect disclosure of sensitive information in reports and dashboards.
A company wants to safeguard sensitive information like salaries and make sure they don’t disclose how much each person makes. That is a given. But they also want to make sure they don’t provide a way for people to indirectly figure out the salary of an individual. That is where data suppression comes in.
In this simple example, we have 30 employees working in 8 different departments. Each employee is assigned to one department and has an annual salary. The detail data has one record per employee as seen in this list table. (Note - I have attached an Excel file with these records so you can try this yourself.)
I also created a crosstab report from the same data source, which includes department, frequency and salary. Since a crosstab aggregates each group, frequency is essentially the count of employees per department. I sorted on the frequency column so the department with the fewest employees is at the top.
The above report is a typical tabular report. You can quickly view salary totals by department. But as I mentioned earlier, salary information is sensitive, and we wouldn’t want to inadvertently disclose how much certain employees earn. In this case, because there is only one person assigned to the Executive department, if you know the name of that person, this report essentially tells you how much they make. That would not be our intention, but that is the result of a report like this. It might not even be that hard to figure out how much each person makes in departments with only 2 or 3 people.
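To see why small groups are the problem, the aggregation behind the crosstab can be sketched in a few lines of Python. This is only an illustration outside SAS Visual Analytics: the sample rows, every department name other than Executive, and the function names `crosstab` and `sorted_by_frequency` are hypothetical, and the figures are not taken from the attached data.

```python
from collections import defaultdict

# Hypothetical detail rows (department, annual salary); not the attached data set.
EMPLOYEES = [
    ("Executive", 250000),
    ("Legal", 120000),
    ("Legal", 110000),
    ("Sales", 60000),
    ("Sales", 65000),
    ("Sales", 70000),
    ("Sales", 55000),
    ("Sales", 62000),
]


def crosstab(rows):
    """Aggregate detail rows into {department: (frequency, salary_total)}."""
    counts = defaultdict(int)
    totals = defaultdict(int)
    for department, salary in rows:
        counts[department] += 1
        totals[department] += salary
    return {d: (counts[d], totals[d]) for d in counts}


def sorted_by_frequency(table):
    """Return (department, frequency, total) tuples, smallest group first."""
    return sorted(
        ((d, n, t) for d, (n, t) in table.items()),
        key=lambda row: row[1],
    )


table = crosstab(EMPLOYEES)
for department, frequency, total in sorted_by_frequency(table):
    print(f"{department:<10} {frequency:>3} {total:>10,}")
```

When a group's frequency is 1, its total is simply that one person's salary, which is exactly the indirect disclosure we want to avoid.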
Creating a New Data Suppression Calculation
What if there were a way to keep the same report, with the totals, but mask or suppress the value for departments that have fewer than some minimum number of employees? That is data suppression. Let’s explore how it works.
Right-click Salary and select “New Calculation”.
Select Data Suppression as the Type, and keep the default value of 5 for “Suppress data if count less than”. Click “OK” to create the calculation.
Drag the new aggregated measure “Salary (Data suppression)” into your crosstab. Your new report should now look like this:
For every department with fewer than 5 employees, the salary values are suppressed and shown as an asterisk ( * ). Even with the values suppressed, the total still represents all values for that measure, whether they are displayed or not. That’s all there is to it: a simple yet powerful 1-click calculation to create your data suppression. [Note - if this were the final report, I would not include the original Salary column, since it shows all salaries unsuppressed. It is only there for reference while we explore this functionality.]
If you want to change the threshold value, just edit the calculation and change the upper bound from 5 to your new value.
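The suppression rule described above, including the adjustable threshold, can be sketched as follows. This is a minimal Python illustration of the idea, not the SAS Visual Analytics implementation: the function name `suppress_small_groups`, the sample figures, and the department names other than Executive are assumptions made for the example, and the built-in calculation may apply further logic that this sketch does not model.

```python
def suppress_small_groups(table, threshold=5, mask="*"):
    """Mask totals of groups with fewer than `threshold` members.

    `table` maps group name -> (count, total). Returns (display, grand_total),
    where the grand total still includes the masked groups.
    """
    display = {}
    grand_total = 0
    for group, (count, total) in table.items():
        grand_total += total  # every group counts toward the total
        display[group] = mask if count < threshold else total
    return display, grand_total


# Hypothetical department aggregates: (employee count, salary total).
departments = {
    "Executive": (1, 250000),
    "Legal": (2, 230000),
    "Sales": (5, 312000),
}

display, grand_total = suppress_small_groups(departments)
for department, value in display.items():
    print(f"{department:<10} {value}")
print(f"{'Total':<10} {grand_total}")
```

Calling `suppress_small_groups(departments, threshold=2)` would mask only the Executive row, which mirrors editing the calculation to use a different threshold.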
Even though this was created using the new data suppression 1-click calculation, there is more to it than you might think.
As always, you have the option to create a new calculation from scratch. Once you drag in the “suppress” operator your formula would look like this.
The bottom line - data suppression gives report authors an easy way to suppress (or mask) certain data from report consumers, helping to keep sensitive data private. Suppress is one of the new aggregated (advanced) operators, and it is documented in the VA 8.2 user guide. If you have access to SAS Visual Analytics 8.2, feel free to give this a try. The simple data set used in this example is attached to help get you started. Happy visualizing!