Just make a new dataset that has the two id's.
The masked id can probably just be generated sequentially.
data masked_cust_id ; set have; by cust_id; if first.cust_id; masked_cust_id + 1 ; keep cust_id masked_cust_id; run;
Then when you want to send out the actual data merge the two and drop the real id.
data for_export ; merge masked_cust_id real_data(in=in2); by cust_id; if in2; drop cust_id; run;
And when you get back data merge again and add back the real id.
data want ; merge masked_cust_id for_import(in=in2); by masked_cust_id; if in2; run;
Thank you for the reply. However I forgot to mention that let's assume that it is forbidden by the rules to create such backup data set that convert fake customer id to real customer id. I would like to find other solution that convert real customer id to fake customer id and then can convert it back from fake customer id to real customer id,thanks
Then you are probably out of luck. You would need to create some type of algorithm for encrypting the values that you need to be able to reverse. That means someone could break it and convert the values back to the original ids.
It is probably safer to store the mapping in a secure location than to use a breakable algorithm.
Here is how we do it using the SAS MD5 function. This produces a unique and reproducable key that can't be (easily) unencrypted so you need to keep a table of the original and encrypted keys to match back to.
data want; Cust_ID = '12345678'; Cust_ID_Encrypted = put(md5(Cust_ID),$hex16.); put _all_; run;
Okay , however when I get back the data set then I want to have the ability to add back the real cust_ID.
How can I do it?
You didn't add this step here and it is critical step
Data tbl_have; input cust_ID X Y Z; cards; 33948398 10 20 30 54897733 11 12 13 98376333 40 60 80 ; run; data want; set tbl_have; Cust_ID_Encrypted = put(md5(Cust_ID),$hex16.); run;
You either need to keep
- a translation dataset
- the encryption/decryption code and the key
in a safe location. Both methods therefore provide the same level of security. Since a randomly created key mapping kept in a dataset is much easier to implement, use this method.
SAS does not provide a two-way encryption/decryption function like AES, so you would have to roll your own for an implementation of one such (reasonably secure) algorithm.
Using the RAND function and keeping a mapping dataset in a safe place (read: the location of the original data) works, is easy to implement, and does not violate data security/protection.
1- I suspect that using RAND function to generate fake customer ID can lead to duplicate problem.
Please remember that Customer ID should be unique value for each customer .
Then, using sequence number as fake customer number can be better.
What do you think?
2-I understand that in SAS there is no built in function that provide a two-way encryption/decryption.
My question is maybe anyone created such algorithm via SAS code ?
Using a sequence weakens your encryption, as the original order becomes part of the encryption algorithm.
It also forces you to keep a mapping dataset, as the order may (will) change when new keys are inserted in your source data.
To prevent duplicates, you use the following method:
When you receive data back, use the mapping dataset in a hash with new key as key and original key as data.
To your question #2: I seriously doubt anyone has gone to the length of creating a reliable two-way encryption method, and if, they'll want money for their work.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.