BookmarkSubscribeRSS Feed
Menfragiadakis
Calcite | Level 5

Hello!

 

I would like to ask which is the best technique to encrypt observations in a dataset. For example, currently I am using SHA256 to hash the variable Model in the dataset Cars. However I cannot reverse the Model after it is hashed.

 

Is there any other efficient way to do the encryption on the observations?

 

 CarsDataset.png

8 REPLIES 8
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, from my understanding, MD5/SHA are almost like signatures - there are a formula there to ensure the data is the same, be it after tranporting or something like that.  It is not an encryption device.

Now it depends on what your trying to achieve here.  If you just want to make the data unreadable, a simple formula can do that, then only the person knowing the formula can decipher it.  Even something simple like:

data decodes;
  input letter $ code;
datalines;
A 6
B 2
C 8
D 13
E 26
F 1
G 5
;
run;

data _null_;
  set decodes end=last;
  if _n_=1 then call execute('data want; set sashelp.cars;');
  call execute(cats('make=tranwrd(make,"',letter,'","',put(code,best.),'");'));
  if last then call execute('run;');
run;

Can encrypt a column and only the person who had the decodes dataset would be able to read it - ok, its probably a simple problem to run an algorithm over that particular version to decode it, but this is just an example, you could have a whole matrix of coding.

LinusH
Tourmaline | Level 20

Actually, MD5/SHA256 does encrypt as  well, but it isn't super secure.

But you say "observation". Do you mean all observation, whole observations? Or just specific variables/columns?

And you can encrypt whole data sets if you wish, and it can be encrypted by the user/programs that knows the key.

Other options including creating your own algorithms, like

http://support.sas.com/resources/papers/proceedings10/108-2010.pdf

Or take look at SAS Federation Server, which has standard functionality for data masking.

Data never sleeps
Menfragiadakis
Calcite | Level 5

Thank you both for your replies!

 

The reason that I need encryption is to make data unreadable to users (the data that we need to encrypt are phone numbers). So, in a dataset with several columns (Customer Name, Address, Several KPIs) there are also specific columns with phone numbers that we need to make unreadable. 

 

Currently, as I have aforementioned, I use hash functions (SHA256) and I was wondering if there is any similar function to do the encryption, and ideally the decipher, easily.

 

I have found several algorithms to do so, but I asked also here so as to have a second opinion and if there is something more specific!

Thanks in advance!

 

SASKiwi
PROC Star

This maybe a dumb question but why cant you just remove or mask the phone numbers from the columns rather than encrypting them?

Menfragiadakis
Calcite | Level 5
The reason that I cannot remove the phone numbers is that we need to join several tables on the specific fields. What do you mean by masking the phone numbers?
SASKiwi
PROC Star

By masking I mean you would replace the phone numbers with something like X's. If these fields are required for joining, why can't you do your joining first, then mask or remove afterwards?

 

Perhaps I'm not understanding your issues correctly - I just think there may be an easier way than encryption. 

 

 

LinusH
Tourmaline | Level 20

If you encrypt/encode your columns, you can store a translation table in a safe place.

Another option would be to use surrogate keys, but that requires a little more coding to get in place.

Yet another option is to metabound your library, and then create a view(s) pointing to the original tables. And the view can leave out the sensitive data.

Or, move to a data store that support columns level authorization, such as SPD Server or any(?) external RDBMS.

Data never sleeps
Ksharp
Super User

Here is a very simple example.

 

%let offset=1;

data _null_;
original='you think you know me';
put original=;

length encrypt $ 40;
do i=1 to lengthn(original);
 substr(encrypt,i,1)=byte(rank(char(original,i))+&offset);
end;
put encrypt=;

length decrypt $ 40;
do i=1 to lengthn(encrypt);
 substr(decrypt,i,1)=byte(rank(char(encrypt,i))-&offset);
end;
put decrypt=;
run;

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

Creating Custom Steps in SAS Studio

Check out this tutorial series to learn how to build your own steps in SAS Studio.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2280 views
  • 2 likes
  • 5 in conversation