BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Ronein
Meteorite | Level 14
Hello
I have a dataset which contains a field for identifying a customer ( cust_id).
I need to mask the cust_id (before I send it to other company).
Then the Other company will send me back the data set (With extra columns) and I need to create a new column with the original customer ID.
May anyone show code how to mask the ID column and then how to find the original customer ID value?
Some notes-
1- customer id is unique number for each row that cannot repeat.
2- The new masked customer id should also be unique value for each row
3- The target is to create a new column called masked_cust_id and then have the ability to calculate original cust_id from masked_cust_id

Data have;
input cust_ID;
cards;
33948398
54897733
98376333
;
run;
1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
Data have;
input cust_ID $;

length encode decode $ 200;
encode=put(cust_ID,$hex64.);
decode=input(encode,$hex64.);

cards;
33948398
54897733
98376333
;
run;
proc print;run;

Ksharp_0-1689594299570.png

 

View solution in original post

22 REPLIES 22
Tom
Super User Tom
Super User

Just make a new dataset that has the two id's. 

The masked id can probably just be generated sequentially.

data masked_cust_id ;
  set have;
  by cust_id;
  if first.cust_id;
  masked_cust_id + 1 ;
  keep cust_id masked_cust_id;
run;

 

Then when you want to send out the actual data merge the two and drop the real id. 

data for_export ;
  merge masked_cust_id real_data(in=in2);
  by cust_id;
  if in2;
  drop cust_id;
run;

And when you get back data merge again and add back the real id.

data want ;
  merge masked_cust_id for_import(in=in2);
  by masked_cust_id;
  if in2;
run;

 

 

Ronein
Meteorite | Level 14

Sorry, you didnt create data sets-Real_Data   and  For_Import

Ronein
Meteorite | Level 14
Thank you for the reply. However I forgot to mention that let's assume that it is forbidden by the rules to create such backup data set that convert fake customer id to real customer id. I would like to find other solution that convert real customer id to fake customer id and then can convert it back from fake customer id to real customer id,thanks
Tom
Super User Tom
Super User

@Ronein wrote:
Thank you for the reply. However I forgot to mention that let's assume that it is forbidden by the rules to create such backup data set that convert fake customer id to real customer id. I would like to find other solution that convert real customer id to fake customer id and then can convert it back from fake customer id to real customer id,thanks

Then you are probably out of luck.  You would need to create some type of algorithm for encrypting the values that you need to be able to reverse.  That means someone could break it and convert the values back to the original ids.

 

It is probably safer to store the mapping in a secure location than to use a breakable algorithm.

Ronein
Meteorite | Level 14
May you show an example for such alogirthm in sas code?
SASKiwi
Opal | Level 21

Here is how we do it using the SAS MD5 function. This produces a unique and reproducable key that can't be (easily) unencrypted so you need to keep a table of the original and encrypted keys to match back to.

data want;
  Cust_ID = '12345678';
  Cust_ID_Encrypted = put(md5(Cust_ID),$hex16.);
  put _all_;
run;

 

Ronein
Meteorite | Level 14

Okay , however when I get back the data set then I want to have the ability to add back the real cust_ID.

How can I do it?

You didn't add this step here and it is critical step

Data tbl_have;
input cust_ID X Y Z;
cards;
33948398 10 20 30
54897733 11 12 13
98376333 40 60 80
;
run;

data want;
set tbl_have;
Cust_ID_Encrypted = put(md5(Cust_ID),$hex16.);
run;

 

 

SASKiwi
Opal | Level 21

@Ronein  - When they send back their data with Cust_ID_Encrypted, just match it back to the Want dataset which has both the encrypted and unencrypted customer IDs.  

Kurt_Bremser
Super User

You either need to keep

- a translation dataset

or

- the encryption/decryption code and the key

in a safe location. Both methods therefore provide the same level of security. Since a randomly created key mapping kept in a dataset is much easier to implement, use this method.

Ronein
Meteorite | Level 14

I see that SHA256 and MD5 are for  encryption without decryption.

I am looking for function that can do botyh encryption and decryption  (So when I reciever the data set back I will be able to add the original customer ID)

 

 

Kurt_Bremser
Super User

SAS does not provide a two-way encryption/decryption function like AES, so you would have to roll your own for an implementation of one such (reasonably secure) algorithm.

 

Using the RAND function and keeping a mapping dataset in a safe place (read: the location of the original data) works, is easy to implement, and does not violate data security/protection.

Ronein
Meteorite | Level 14

Thanks,

1- I suspect that using RAND function to generate fake customer ID  can lead to duplicate problem.

Please remember that Customer ID should be unique value for each customer .

Then, using sequence number as fake customer number can be better.

What do you think?

2-I understand that in SAS there is no built in function that provide a two-way encryption/decryption.

My question is maybe anyone created such algorithm via SAS code ?

 

 

 

 

Kurt_Bremser
Super User

Using a sequence weakens your encryption, as the original order becomes part of the encryption algorithm.

It also forces you to keep a mapping dataset, as the order may (will) change when new keys are inserted in your source data.

To prevent duplicates, you use the following method:

  • create the empty mapping dataset with two variables (original key, new key)
  • in the DATA step, read this dataset into two hash objects; one uses the original key as key, and the new key as data, the other uses the new key as key
  • for every observation, check if a new key is already defined in hash #1
  • if not found, create a new key with RAND in a DO UNTIL no match is found in hash #2; use the ADD method for both hashes
  • write to the output, omitting the original key
  • at the end of the step, save the contents of hash #1 back to the mapping dataset

When you receive data back, use the mapping dataset in a hash with new key as key and original key as data.

 

To your question #2: I seriously doubt anyone has gone to the length of creating a reliable two-way encryption method, and if, they'll want money for their work.

Ksharp
Super User
Data have;
input cust_ID $;

length encode decode $ 200;
encode=put(cust_ID,$hex64.);
decode=input(encode,$hex64.);

cards;
33948398
54897733
98376333
;
run;
proc print;run;

Ksharp_0-1689594299570.png

 

SAS INNOVATE 2024

Innovate_SAS_Blue.png

Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.

If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website. 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Get the $99 certification deal.jpg

 

 

Back in the Classroom!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 22 replies
  • 710 views
  • 11 likes
  • 8 in conversation