06-08-2017 05:37 AM
I would like to ask which is the best technique to encrypt observations in a dataset. For example, currently I am using SHA256 to hash the variable Model in the dataset Cars. However I cannot reverse the Model after it is hashed.
Is there any other efficient way to do the encryption on the observations?
06-08-2017 06:20 AM
Well, from my understanding, MD5/SHA are almost like signatures - there are a formula there to ensure the data is the same, be it after tranporting or something like that. It is not an encryption device.
Now it depends on what your trying to achieve here. If you just want to make the data unreadable, a simple formula can do that, then only the person knowing the formula can decipher it. Even something simple like:
data decodes; input letter $ code; datalines; A 6 B 2 C 8 D 13 E 26 F 1 G 5 ; run; data _null_; set decodes end=last; if _n_=1 then call execute('data want; set sashelp.cars;'); call execute(cats('make=tranwrd(make,"',letter,'","',put(code,best.),'");')); if last then call execute('run;'); run;
Can encrypt a column and only the person who had the decodes dataset would be able to read it - ok, its probably a simple problem to run an algorithm over that particular version to decode it, but this is just an example, you could have a whole matrix of coding.
06-08-2017 06:47 AM
Actually, MD5/SHA256 does encrypt as well, but it isn't super secure.
But you say "observation". Do you mean all observation, whole observations? Or just specific variables/columns?
And you can encrypt whole data sets if you wish, and it can be encrypted by the user/programs that knows the key.
Other options including creating your own algorithms, like
Or take look at SAS Federation Server, which has standard functionality for data masking.
06-08-2017 06:57 AM
Thank you both for your replies!
The reason that I need encryption is to make data unreadable to users (the data that we need to encrypt are phone numbers). So, in a dataset with several columns (Customer Name, Address, Several KPIs) there are also specific columns with phone numbers that we need to make unreadable.
Currently, as I have aforementioned, I use hash functions (SHA256) and I was wondering if there is any similar function to do the encryption, and ideally the decipher, easily.
I have found several algorithms to do so, but I asked also here so as to have a second opinion and if there is something more specific!
Thanks in advance!
06-08-2017 05:42 PM
By masking I mean you would replace the phone numbers with something like X's. If these fields are required for joining, why can't you do your joining first, then mask or remove afterwards?
Perhaps I'm not understanding your issues correctly - I just think there may be an easier way than encryption.
06-09-2017 04:12 AM
If you encrypt/encode your columns, you can store a translation table in a safe place.
Another option would be to use surrogate keys, but that requires a little more coding to get in place.
Yet another option is to metabound your library, and then create a view(s) pointing to the original tables. And the view can leave out the sensitive data.
Or, move to a data store that support columns level authorization, such as SPD Server or any(?) external RDBMS.
06-08-2017 08:35 AM
Here is a very simple example.
%let offset=1; data _null_; original='you think you know me'; put original=; length encrypt $ 40; do i=1 to lengthn(original); substr(encrypt,i,1)=byte(rank(char(original,i))+&offset); end; put encrypt=; length decrypt $ 40; do i=1 to lengthn(encrypt); substr(decrypt,i,1)=byte(rank(char(encrypt,i))-&offset); end; put decrypt=; run;