We’re smarter together. Learn from this collection of community knowledge and add your expertise.

SAS Federation Server for GDPR - Data Masking

by SAS Employee audrey 3 weeks ago - edited 3 weeks ago by Community Manager (1,339 Views)

In order to comply with Articles 25, 32 and 34 of the GDPR regulation, organisations need to be able to protect Personal Data by applying security patterns including pseudonymisation, anonymization and encryption.

 

Data Masking is a way of hiding sensitive data or personal identifiable information (PII) inside data sources. A personal identifiable information (PII) is anything that could be used to make a distinction between two persons, for example a passport ID. The goal of Data Masking is to hide the original value, while maintaining integrity, for the users that don’t have permission to access it.

 

The function used for masking is SYSCAT.DM.MASK. It can be used with various rule types – which we will describe below – and arguments. Here’s the syntax:

SYSCAT.DM.MASK( ‘rule-type’, PII [, rule-arguments])

 

Hash

The HASH rule hashes a value into a fixed-length hash digest or HMAC string.  It is deterministic, not reversible, and the results might be unique if the inputs are unique.

It needs an algorithm in input, which can be MD5 or SHA256 (requires DataFlux Secure).

 

SELECT "FULL_NAME",
SYSCAT.DM.MASK('HASH', "FULL_NAME", 'ALG', 'MD5', 'CASE', 'U', 'KEY', 'MyPersonalKey')
FROM "MDM_PDP_INDIVIDUAL_TT"

 

1.png

 

 

 

SELECT "FULL_NAME",
SYSCAT.DM.MASK('HASH', "FULL_NAME", 'ALG', 'SHA256', 'CASE', 'U', 'KEY', 'MyPersonalKey')
FROM "MDM_PDP_INDIVIDUAL_TT"

 

 2.png

 

MD5 is a 128-bit algorithm, SHA256 is a 256-bit algorithm – it is better to use the latter for best security.

 

Tranc

The TRANC rule masks the values in a column by transliterating characters from the input string: the mapped result containing ‘many-to-1’ character transliterations will ensure an inverse transliteration does not determine the original value.

 

SELECT distinct("FULL_NAME"),
SYSCAT.DM.MASK('TRANC', "FULL_NAME", 'FROM', 'ABELOS', 'TO', '483105') AS TRANC
FROM "MDM_PDP_INDIVIDUAL_TT"

 

 3.png

 

Random rules

There are 3 different random rules, which are used respectively for numeric, date and string inputs.

 

The RANDOM rules masks the values in a numeric column which results in a uniformly distributed pseudo-random number. It requires at least a VARY argument (+/-variance of the amount) or a couple MIN/MAX arguments.

 

/* anonymizing income using random with variance +/- 5000 */
SELECT "INCOME",
SYSCAT.DM.MASK('RANDOM', "INCOME", 'VARY', 5000) as RANDOM
FROM "CUSTOMERS"

 4.png

 

/* anonymizing income using random with values between 80000 and 100000 */
SELECT "INCOME",
SYSCAT.DM.MASK('RANDOM', "INCOME", 'MIN', 80000, 'MAX',100000) as RANDOM
FROM "CUSTOMERS"

  5.png

 

The RANDATE rule masks the values in a date column by replacing them with pseudo-random date values. It accepts MIN/MAX or VARY, which should be used with a UNITS argument (Day, Week, Month, Year, etc).

 

/* anonymizing Date of Birth with RANDATE +/- 5 years and formatting dates */
SELECT put("DOB", DTDATE.) as DOB,
put(SYSCAT.DM.MASK('RANDATE', "DOB", 'VARY', 5, 'UNITS','YEAR'), DTDATE.) as RANDOM_DOB
FROM "CONTACTS_FR"

 6.png

 

The RANSTR rule masks the values in a column by replacing with random strings. Strings are generated by an algorithm that uses characters from the source string in the generation process, adding padding characters if necessary.

 

The RANDIG rule masks the numeric values in a column by replacing digits with strings of random digits. This is an alias of RANSTR with some specific arguments (BASE, CASE).

 

Encrypt/Decrypt

 

ENCRYPT masks the values in a column by encrypting or encoding a single value using a symmetric key cipher or simple encoding algorithm. To achieve truly encrypted results, a KEY argument must be specified, or a default ENCRYPT_KEY pre-configured.

 

ENCRYPT is a replacement of HASH when uniqueness, reversal or decryption are required.

 

select "PWD",
SYSCAT.DM.MASK('ENCRYPT', "PWD", 'ALG', 'AES/FIPS', 'KEY', 'MyEncryptionKey', 'DETERMINISTIC', 'true' ) as ENCRYPTED_PWD
FROM "USERS";

 7.png

 

SELECT ENCRYPTED_PWD,
SYSCAT.DM.MASK('DECRYPT', "ENCRYPTED_PWD", 'ALG', 'AES/FIPS', 'KEY', 'MyEncryptionKey') as DECRYPTED_PWD
FROM TEMP

 

 8.png

 

Contributors
Your turn
Sign In!

Want to write an article? Sign in with your profile.


Looking for the Ask the Expert series? Find it in its new home: communities.sas.com/askexpert.