BookmarkSubscribeRSS Feed
Menfragiadakis
Calcite | Level 5

Hello!

 

I would like to ask which is the best technique to encrypt observations in a dataset. For example, currently I am using SHA256 to hash the variable Model in the dataset Cars. However I cannot reverse the Model after it is hashed.

 

Is there any other efficient way to do the encryption on the observations?

 

 CarsDataset.png

8 REPLIES 8
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, from my understanding, MD5/SHA are almost like signatures - there are a formula there to ensure the data is the same, be it after tranporting or something like that.  It is not an encryption device.

Now it depends on what your trying to achieve here.  If you just want to make the data unreadable, a simple formula can do that, then only the person knowing the formula can decipher it.  Even something simple like:

data decodes;
  input letter $ code;
datalines;
A 6
B 2
C 8
D 13
E 26
F 1
G 5
;
run;

data _null_;
  set decodes end=last;
  if _n_=1 then call execute('data want; set sashelp.cars;');
  call execute(cats('make=tranwrd(make,"',letter,'","',put(code,best.),'");'));
  if last then call execute('run;');
run;

Can encrypt a column and only the person who had the decodes dataset would be able to read it - ok, its probably a simple problem to run an algorithm over that particular version to decode it, but this is just an example, you could have a whole matrix of coding.

LinusH
Tourmaline | Level 20

Actually, MD5/SHA256 does encrypt as  well, but it isn't super secure.

But you say "observation". Do you mean all observation, whole observations? Or just specific variables/columns?

And you can encrypt whole data sets if you wish, and it can be encrypted by the user/programs that knows the key.

Other options including creating your own algorithms, like

http://support.sas.com/resources/papers/proceedings10/108-2010.pdf

Or take look at SAS Federation Server, which has standard functionality for data masking.

Data never sleeps
Menfragiadakis
Calcite | Level 5

Thank you both for your replies!

 

The reason that I need encryption is to make data unreadable to users (the data that we need to encrypt are phone numbers). So, in a dataset with several columns (Customer Name, Address, Several KPIs) there are also specific columns with phone numbers that we need to make unreadable. 

 

Currently, as I have aforementioned, I use hash functions (SHA256) and I was wondering if there is any similar function to do the encryption, and ideally the decipher, easily.

 

I have found several algorithms to do so, but I asked also here so as to have a second opinion and if there is something more specific!

Thanks in advance!

 

SASKiwi
PROC Star

This maybe a dumb question but why cant you just remove or mask the phone numbers from the columns rather than encrypting them?

Menfragiadakis
Calcite | Level 5
The reason that I cannot remove the phone numbers is that we need to join several tables on the specific fields. What do you mean by masking the phone numbers?
SASKiwi
PROC Star

By masking I mean you would replace the phone numbers with something like X's. If these fields are required for joining, why can't you do your joining first, then mask or remove afterwards?

 

Perhaps I'm not understanding your issues correctly - I just think there may be an easier way than encryption. 

 

 

LinusH
Tourmaline | Level 20

If you encrypt/encode your columns, you can store a translation table in a safe place.

Another option would be to use surrogate keys, but that requires a little more coding to get in place.

Yet another option is to metabound your library, and then create a view(s) pointing to the original tables. And the view can leave out the sensitive data.

Or, move to a data store that support columns level authorization, such as SPD Server or any(?) external RDBMS.

Data never sleeps
Ksharp
Super User

Here is a very simple example.

 

%let offset=1;

data _null_;
original='you think you know me';
put original=;

length encrypt $ 40;
do i=1 to lengthn(original);
 substr(encrypt,i,1)=byte(rank(char(original,i))+&offset);
end;
put encrypt=;

length decrypt $ 40;
do i=1 to lengthn(encrypt);
 substr(decrypt,i,1)=byte(rank(char(encrypt,i))-&offset);
end;
put decrypt=;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1035 views
  • 2 likes
  • 5 in conversation