In order to comply with Articles 25, 32 and 34 of the GDPR regulation, organisations need to be able to protect Personal Data by applying security patterns including pseudonymisation, anonymization and encryption.
Data Masking is a way of hiding sensitive data or personal identifiable information (PII) inside data sources. A personal identifiable information (PII) is anything that could be used to make a distinction between two persons, for example a passport ID. The goal of Data Masking is to hide the original value, while maintaining integrity, for the users that don’t have permission to access it.
The function used for masking is SYSCAT.DM.MASK. It can be used with various rule types – which we will describe below – and arguments. Here’s the syntax:
SYSCAT.DM.MASK( ‘rule-type’, PII [, rule-arguments])
The HASH rule hashes a value into a fixed-length hash digest or HMAC string. It is deterministic, not reversible, and the results might be unique if the inputs are unique.
It needs an algorithm in input, which can be MD5 or SHA256 (requires DataFlux Secure).
SYSCAT.DM.MASK('HASH', "FULL_NAME", 'ALG', 'MD5', 'CASE', 'U', 'KEY', 'MyPersonalKey')
SYSCAT.DM.MASK('HASH', "FULL_NAME", 'ALG', 'SHA256', 'CASE', 'U', 'KEY', 'MyPersonalKey')
MD5 is a 128-bit algorithm, SHA256 is a 256-bit algorithm – it is better to use the latter for best security.
The TRANC rule masks the values in a column by transliterating characters from the input string: the mapped result containing ‘many-to-1’ character transliterations will ensure an inverse transliteration does not determine the original value.
SYSCAT.DM.MASK('TRANC', "FULL_NAME", 'FROM', 'ABELOS', 'TO', '483105') AS TRANC
There are 3 different random rules, which are used respectively for numeric, date and string inputs.
The RANDOM rules masks the values in a numeric column which results in a uniformly distributed pseudo-random number. It requires at least a VARY argument (+/-variance of the amount) or a couple MIN/MAX arguments.
/* anonymizing income using random with variance +/- 5000 */
SYSCAT.DM.MASK('RANDOM', "INCOME", 'VARY', 5000) as RANDOM
/* anonymizing income using random with values between 80000 and 100000 */
SYSCAT.DM.MASK('RANDOM', "INCOME", 'MIN', 80000, 'MAX',100000) as RANDOM
The RANDATE rule masks the values in a date column by replacing them with pseudo-random date values. It accepts MIN/MAX or VARY, which should be used with a UNITS argument (Day, Week, Month, Year, etc).
/* anonymizing Date of Birth with RANDATE +/- 5 years and formatting dates */
SELECT put("DOB", DTDATE.) as DOB,
put(SYSCAT.DM.MASK('RANDATE', "DOB", 'VARY', 5, 'UNITS','YEAR'), DTDATE.) as RANDOM_DOB
The RANSTR rule masks the values in a column by replacing with random strings. Strings are generated by an algorithm that uses characters from the source string in the generation process, adding padding characters if necessary.
The RANDIG rule masks the numeric values in a column by replacing digits with strings of random digits. This is an alias of RANSTR with some specific arguments (BASE, CASE).
ENCRYPT masks the values in a column by encrypting or encoding a single value using a symmetric key cipher or simple encoding algorithm. To achieve truly encrypted results, a KEY argument must be specified, or a default ENCRYPT_KEY pre-configured.
ENCRYPT is a replacement of HASH when uniqueness, reversal or decryption are required.
SYSCAT.DM.MASK('ENCRYPT', "PWD", 'ALG', 'AES/FIPS', 'KEY', 'MyEncryptionKey', 'DETERMINISTIC', 'true' ) as ENCRYPTED_PWD
SYSCAT.DM.MASK('DECRYPT', "ENCRYPTED_PWD", 'ALG', 'AES/FIPS', 'KEY', 'MyEncryptionKey') as DECRYPTED_PWD