BookmarkSubscribeRSS Feed
Sheeba
Lapis Lazuli | Level 10

Hi,

 

I am trying to use sas to do the data masking from a similar dataset

 

Name                          ID

John Til                       12345
Mike Sam                    12346
Henry Ken                   12348

Robert pat                    12350

 

I am trying to mask the variable name. I am able to use translate and substr functions and convert this to an encrypted value . the encrypted value is random string of characters.

 

Is it possible to use any function to encrypt it and  make it a meaningful name? is it possible to do it if we have a table /dataset with different values for names ?

 

Name                       ID

Jack                       12345
Brian                     12346
Jill                  12348

Tom               12350

 

Regards,

Sheeba

8 REPLIES 8
SASKiwi
PROC Star

I don't follow why you want to substitute real names for fictitious ones. What do you hope to achieve by doing this? Your names are already identified by your ID variable. If you don't want your data to contain real names then why not just drop the name variable? 

Sheeba
Lapis Lazuli | Level 10

Hi SasKiwi,

 

Thanks for the response.

 

We have some datasets which store confidential which is to be accessed only by certain roles. Right now we have different tables for different users. But we are trying to see if this is a possible option

 

Regards,

Sheeba

SASKiwi
PROC Star

@Sheeba  - You should look at views on the same table. Create a view excluding customer names and set permissions so that users can only see the view.

Sheeba
Lapis Lazuli | Level 10

Hi SASKiwi,

 

Sure..I will definitely look for that option....

 

Also just for my knowledge sake, do you know any encryption algorithms/techniques within sas which can be used to encrypt variable names and replace it with a meaningful entity (like name with fake name, ID with a fake ID)  by reading from table or from an external db?

 

Appreciate your response.

 

Regards,

Sheeba

SASKiwi
PROC Star

@Sheeba  - You can use hashing techniques to transform character strings into shortened  and untranslatable version of the original: https://documentation.sas.com/?docsetId=lefunctionsref&docsetTarget=n05ptq6zr5amxkn18mjkyvbkjjos.htm...

 

This is useful in obfuscating sensitive data like bank account numbers and credit card numbers. The hashes still act as consistent data keys but can't be easily "de-hashed".

 

Encryption is more usually done at a higher level like across a table, a folder, or storage device.

 

Personally I don't see the point of faking data, when techniques like hashing are so much easier.

Sheeba
Lapis Lazuli | Level 10

Hi SASKiwi,

 

Thanks for the details...I will study more about hashing and ways to de- hash it .  the fact that it can't be easily de-hashed makes it stronger compared to the encryption using translate functions, 1 byte to 2 byte swap.

 

yes, faking the data doesn't seem to be a good option compared to view creation and Hashing.

 

Thanks a lot for the details. I will study about this.

Regards,

Sheeba

Kurt_Bremser
Super User

If you need to consistently replace an item with another "readable" item, I would do the following:

  • get (or create) a very large list of items that fit (e.g. names)
  • shuffle that randomly, and add an empty variable for your original values
  • use a hash object to store it in a data step
  • do a lookup to see if you already have a match
  • if not, retrieve the next empty one, and set your original value into the hash
  • at the end of the step output the hash back into the lookup table, so you can reuse it for the next time

Example code looks like this:

data have;
input Name &$10. ID $;
datalines;
John Til    12345
Mike Sam    12346
Henry Ken   12348
Robert pat  12350
;

data lookup;
input newname $10.;
length name $10;
datalines;
Curt
David
Max
Sally
Jill
Tom
Brian
Jack
;

data want;
set have end=done;
if _n_ = 1
then do;
  length newname $10;
  /* create hash with already eisting translations */
  declare hash lookup1(dataset:"lookup (where=(name ne ''))");
  lookup1.definekey('name');
  lookup1.definedata('name','newname');
  lookup1.definedone();
  /* create hash with unused translations */
  declare hash lookup2(dataset:"lookup (where=(name = ''))",multidata:'yes');
  lookup2.definekey('name');
  lookup2.definedata('name','newname');
  lookup2.definedone();
  call missing(newname); * avoids "uninitialized" NOTE;
end;
if lookup1.find() = 0
then name = newname; * already existing translation;
else do;
  _name = name; * necessary, because the following statement will overwrite name with a missing value;
  rc = lookup2.find(key:''); * find next empty;
  rc = lookup2.removedup(); * remove from hash;
  name = _name;
  rc = lookup1.add(); * add to existing translations;
  name = newname; * set name for output;
end;
if done
then do;
  /* write both lookups out and concatenate */
  lookup1.output(dataset:'lookup1');
  lookup2.output(dataset:'lookup2');
  call execute('data lookup; set lookup1 lookup2; run;');
end;
drop rc _name newname;
run;

proc print data=want noobs;
run;

proc print data=lookup noobs;
run;

Results:

(want)

Name      ID

Curt     12345
David    12346
Max      12348
Sally    12350

(lookup)

   Name       newname

Mike Sam       David 
Henry Ken      Max   
John Til       Curt  
Robert pat     Sally 
               Jill  
               Tom   
               Brian 
               Jack  

 

Sheeba
Lapis Lazuli | Level 10

Hi KurtBremser,

 

Thank you so much for the details.

 

I will try this approach.

 

Thanks again,

Regards,

Sheeba

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 6389 views
  • 4 likes
  • 3 in conversation