12-27-2017 06:52 AM
12-27-2017 07:02 AM
For bulk work I would suggest to take the C road. Have a look at the MODULE() set of functions/call routines in association with the SASCBTBL specification. This would allow for small to large volume decryption in datastep and SQL alike. Java would be my second choice. REST is also possible but does not scale well.
12-27-2017 04:18 PM
Thank you for your answer. The bulk work will be performed by the ETL when the data are exported to the data warehouse. So no bulk need in SAS.
But if I understand correctly the same approach can be used to dynamically decrypt the result of the SQL queries. I will have a look at this.
Thanks and regards
12-27-2017 09:14 AM
12-27-2017 04:21 PM
You are right. Only few PII need to be encrypted, and this should not impact the analytics. But there are loads of SQL queries that are used by the end users to generate reports on their customers that are going to be impacted. These are the ones which will need to be decrypted.
The tokenisation tool used is very business oriented, and is very strong at anonymising the end user reports.
The objective is to find the best integration with SAS.
12-27-2017 05:30 PM - edited 12-27-2017 05:53 PM
With your feedbacks I started digging more precisely into SAS capabilities and also found the SAS Enterprise Guide Addin Task capability
I was wondering if it would be worth considering a SAS Guide Com Add-In as well.
This add-in could get the results of the SQL queries, and decrypt them dynamically. This would be initiated on the client side.
Do you think this would be do-able?
Or even, would it be possible to extend the behaviour of the query builder?
12-27-2017 07:38 PM - edited 12-27-2017 07:56 PM
I'm very much with @LinusH on this one. Also that SAS would likely be able to interface with the existing API's - especially REST - these API's appear very much built for transactional processes which is not like SAS is normally used.
SAS EG is just the client and could for example send code for execution to the server which then queries the database, enriches and aggregates the data and creates a report which gets written somewhere to a network location or directly sent via email attachment. That's why decryption must happen on the server side.
How I see this you need to solve two main challenges here:
1. How can you make the data available for SAS queries so that they remains useful?
2. What do you need to do so that also SAS processes are compliant?
As for 1
I can understand that the DW guys want to store the data at rest encrypted. What they "should" be doing is provide views which decrypt the data and then control who's got access to the decrypted columns in these views.
As for 2
That's in my mind going to be the bigger challenge. It's very simple and common to replicate/transform data to SAS tables for processing and though whatever decryption path gets chosen, you'll end up with non-decrypted sensitive data in - eventually permanent - SAS files.
If these sensitive fields are just used for reporting - i.e. as classification variables - on a aggregated level then such reports can also get developed on development data. You would still need a business process for hand-over into production where these reports can be run against real decrypted data; and "someone" would need to create the development data if they don't exist already.
A bigger challenge would be if such sensitive fields are also used for analytics. There the modelers need the real unencrypted data and to comply with GDPR/PDP requirements you might end up with the need for a brand new locked down SAS environment for model development and testing (i.e. one where you can't send emails from, where you can't write to shared drives, where also the EG client is in an environment like Cytrix which you can lock-down, etc..). If this is what you need to solve then may-be contact SAS directly and get some of their consulting time as this must be due to GDPR compliance requirements something not only your company has to address.
12-28-2017 03:13 AM
I understand your points. It is very clear.
The objective is also to make tis DW GDPR compliant, and the encryption/decryption solution is fit for purpose. The GDPR aspects you mention are also part of the project. It is also true that the solution is more transaction oriented.
But this is how SAS is being used at the moment in the company, for the project we are implementing.
The users are given access to SAS Enterprise Guide to execute SQL queries they write themselves. This is the use case. So the use is transactional. The decryption must happen on a transaction basis, when data are returned encrypted to the users.
(PS: having the data encrypted in the DW is a requirement. So having predefined decrypted view is not an option)
I agree that it is better that decryption happens on the server side.
In this case, do you think it is possible to have a generic approach based on modules, these modules calling the encryption/decryption APIs? I mean generic, because in this use case, every SQL queries are in scope, because they potentially embark sensitive data.
12-28-2017 06:03 AM
I am currently involved in a similar project. We do not deal with encrypting as much as tokenization or protection of data items as we call it. Data items can be protected and unprotected in many ways. Sometimes unprotection needs to be done only partially, eg. only first n digits of a social security number, part of a postal code or whatever. Everything on a need-to-know basis. So leaving everything up to the data warehouse is not always an option, especially when SAS plays an important role in the data logistics in and out of that dw. We have a dw in Teradata but data in SAS are also rather wide spread and they do not escape GDPR or similar rules.
Another aspect that hits home severely is data quality. One can live with having a SSN in mixed formats (num versus char) until protection/encryption comes in and values are no longer match and relationships between tables get lost. Solving these issues at the root can take years. Having the ability for SAS to participate in this protection game is VERY important. Having the API's encrypt()/decrypt()/protect()/unprotect() functions available in SAS datastep and SQL can be very helpful. Hence my advice to surface that API using mentioned tooling.
The plethora of challenges that arises from data quality and authorizations make it too complicated to be addressed by just whipping up a couple of views and get on with it.