Desktop productivity for business analysts and programmers

Data observation Encryption

Reply
New Contributor
Posts: 3

Data observation Encryption

Hello!

 

I would like to ask which is the best technique to encrypt observations in a dataset. For example, currently I am using SHA256 to hash the variable Model in the dataset Cars. However I cannot reverse the Model after it is hashed.

 

Is there any other efficient way to do the encryption on the observations?

 

 CarsDataset.png

Esteemed Advisor
Esteemed Advisor
Posts: 7,253

Re: Data observation Encryption

Well, from my understanding, MD5/SHA are almost like signatures - there are a formula there to ensure the data is the same, be it after tranporting or something like that.  It is not an encryption device.

Now it depends on what your trying to achieve here.  If you just want to make the data unreadable, a simple formula can do that, then only the person knowing the formula can decipher it.  Even something simple like:

data decodes;
  input letter $ code;
datalines;
A 6
B 2
C 8
D 13
E 26
F 1
G 5
;
run;

data _null_;
  set decodes end=last;
  if _n_=1 then call execute('data want; set sashelp.cars;');
  call execute(cats('make=tranwrd(make,"',letter,'","',put(code,best.),'");'));
  if last then call execute('run;');
run;

Can encrypt a column and only the person who had the decodes dataset would be able to read it - ok, its probably a simple problem to run an algorithm over that particular version to decode it, but this is just an example, you could have a whole matrix of coding.

Esteemed Advisor
Posts: 5,202

Re: Data observation Encryption

Actually, MD5/SHA256 does encrypt as  well, but it isn't super secure.

But you say "observation". Do you mean all observation, whole observations? Or just specific variables/columns?

And you can encrypt whole data sets if you wish, and it can be encrypted by the user/programs that knows the key.

Other options including creating your own algorithms, like

http://support.sas.com/resources/papers/proceedings10/108-2010.pdf

Or take look at SAS Federation Server, which has standard functionality for data masking.

Data never sleeps
New Contributor
Posts: 3

Re: Data observation Encryption

Thank you both for your replies!

 

The reason that I need encryption is to make data unreadable to users (the data that we need to encrypt are phone numbers). So, in a dataset with several columns (Customer Name, Address, Several KPIs) there are also specific columns with phone numbers that we need to make unreadable. 

 

Currently, as I have aforementioned, I use hash functions (SHA256) and I was wondering if there is any similar function to do the encryption, and ideally the decipher, easily.

 

I have found several algorithms to do so, but I asked also here so as to have a second opinion and if there is something more specific!

Thanks in advance!

 

Respected Advisor
Posts: 3,069

Re: Data observation Encryption

This maybe a dumb question but why cant you just remove or mask the phone numbers from the columns rather than encrypting them?

New Contributor
Posts: 3

Re: Data observation Encryption

The reason that I cannot remove the phone numbers is that we need to join several tables on the specific fields. What do you mean by masking the phone numbers?
Respected Advisor
Posts: 3,069

Re: Data observation Encryption

By masking I mean you would replace the phone numbers with something like X's. If these fields are required for joining, why can't you do your joining first, then mask or remove afterwards?

 

Perhaps I'm not understanding your issues correctly - I just think there may be an easier way than encryption. 

 

 

Esteemed Advisor
Posts: 5,202

Re: Data observation Encryption

If you encrypt/encode your columns, you can store a translation table in a safe place.

Another option would be to use surrogate keys, but that requires a little more coding to get in place.

Yet another option is to metabound your library, and then create a view(s) pointing to the original tables. And the view can leave out the sensitive data.

Or, move to a data store that support columns level authorization, such as SPD Server or any(?) external RDBMS.

Data never sleeps
Grand Advisor
Posts: 9,596

Re: Data observation Encryption

Here is a very simple example.

 

%let offset=1;

data _null_;
original='you think you know me';
put original=;

length encrypt $ 40;
do i=1 to lengthn(original);
 substr(encrypt,i,1)=byte(rank(char(original,i))+&offset);
end;
put encrypt=;

length decrypt $ 40;
do i=1 to lengthn(encrypt);
 substr(decrypt,i,1)=byte(rank(char(encrypt,i))-&offset);
end;
put decrypt=;
run;
Ask a Question
Discussion stats
  • 8 replies
  • 155 views
  • 2 likes
  • 5 in conversation