ari
Quartz | Level 8

Is there a way to create anonymized IDs in SAS that also accounts for new IDs added later?

For example:

Original data:

ID     Anonymized ID
1      204
2      206
3      208

New data:

ID     Anonymized ID
1      204
2      206
3      208
4      201
5      203

That is, keep the anonymized IDs already created for the original IDs and anonymize only the newly added IDs.
Shmuel
Garnet | Level 18

There are some points to clarify:

- Is ID numeric or character?

- For any N observations there may be an infinite number of possible new IDs, which means you must define:

  (1) A range in which to look for IDs

  (2) The rules for creating anonymized IDs (either sequential or random) and the number of new IDs to add.

I don't know of any general tool for this, but you can create a dataset with the full range of values and pick the new IDs from the values that do not yet exist in your original table.
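A minimal SAS sketch of this range-based idea, assuming character IDs (as in the other replies), the range 200-999 suggested by the example, and an existing table named LOOKUP with columns id and anon; the sample rows are the example from the question and the seed 2024 is arbitrary. It builds every candidate value, removes the values already in use, shuffles the rest, and gives one free value to each ID that does not have one yet.

```sas
/* Sample data from the question: existing translations and new ids */
data lookup;
  length id $8 anon $3;
  input id $ anon $;
  datalines;
1 204
2 206
3 208
;

data new;
  length id $8;
  input id $;
  datalines;
1
2
3
4
5
;

/* 1. every candidate value in the assumed range 200-999 */
data pool;
  length anon $3;
  do n = 200 to 999;
    anon = put(n, z3.);
    output;
  end;
  drop n;
run;

proc sql;
  /* 2. ids that do not have an anonymized value yet */
  create table new_ids as
    select distinct id from new
    where id not in (select id from lookup);
  /* 3. candidate values that are not in use yet */
  create table free as
    select anon from pool
    where anon not in (select anon from lookup);
quit;

/* 4. shuffle the free values (the seed 2024 is arbitrary) */
data free_shuffled;
  if _n_ = 1 then call streaminit(2024);
  set free;
  r = rand("uniform");
run;

proc sort data=free_shuffled;
  by r;
run;

/* 5. give each new id one free value (one-to-one merge, no BY), then extend LOOKUP */
data added;
  merge new_ids (in=a) free_shuffled (keep=anon);
  if a;
  if missing(anon) then put "WARNING: no free value left for " id=;
run;

proc append base=lookup data=added;
run;
```

Because the free values are drawn without replacement, two IDs cannot receive the same anonymized value, and the existing rows of LOOKUP are never changed. Once the range is used up, the WARNING line flags the IDs left without a value.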

LinusH
Tourmaline | Level 20

If you need to revert to the original ID, you need a lookup table that contains the translation.

With such a table, you can use a surrogate key (sequence number) approach.
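A sketch of the sequence-number approach, assuming a lookup table named KEYMAP (a hypothetical name) with columns id and seq. Each ID not yet in the table receives the next unused number, and numbers already handed out never change.

```sas
/* Hypothetical lookup table KEYMAP: id -> sequence number */
data keymap;
  length id $8;
  input id $ seq;
  datalines;
1 1
2 2
3 3
;

data new;
  length id $8;
  input id $;
  datalines;
1
2
3
4
5
;

/* highest sequence number handed out so far (0 if KEYMAP is empty) */
proc sql noprint;
  select coalesce(max(seq), 0) into :maxseq trimmed from keymap;
quit;

proc sql;
  /* ids that are not in KEYMAP yet */
  create table new_ids as
    select distinct id from new
    where id not in (select id from keymap)
    order by id;
quit;

/* number the new ids consecutively after the current maximum */
data added;
  set new_ids;
  seq = &maxseq + _n_;
run;

/* existing numbers stay untouched; only new rows are appended */
proc append base=keymap data=added;
run;
```

Sequential numbers are easy to audit, but they reveal the order in which IDs were added; a random or hashed value avoids that.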

 

A simpler approach is to hash the key, optionally with a secret salt (a constant that you concatenate to the key before hashing).

Hashes are practically impossible to reverse.

There is a theoretical possibility of duplicates (hash collisions), but the risk is extremely low.

How to go about this depends on the requirements of your anonymization.
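A sketch of the salted-hash approach using the SAS SHA256 function (SAS 9.4M1 and later); the macro variable SALT and its value are placeholders. Because the same salt and ID always produce the same value, re-running the step reproduces the anonymized IDs without a stored lookup table. With a small ID range the salt must stay secret: without it, anyone could hash every possible ID and compare the results.

```sas
/* Placeholder salt: in practice keep it out of the program source */
%let salt = replace-with-a-long-secret-string;

data new;
  length id $8;
  input id $;
  datalines;
1
2
3
4
5
;

data hashed;
  set new;
  length anon $64;
  /* same salt + same id always yields the same 64-character hex value */
  anon = put(sha256(cats("&salt", id)), $hex64.);
run;
```

To match data that comes back from a third party, you can recompute anon for your own IDs with the same salt and join on anon.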

Kurt_Bremser
Super User

Here is how to use hash objects to create and maintain a lookup table:

data old;
input id $;
datalines;
1
2
3
;

data new;
input id $;
datalines;
1
2
3
4
5
;

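/* set up the original lookup table (run this step only once) */

data enc;
set old end=done;
length anon $3;
if _n_ = 1
then do;
  /* l: translation of each original id to its anonymized value */
  declare hash l (ordered:"a");
  l.definekey("id");
  l.definedata("id","anon");
  l.definedone();
  /* k: anonymized values already in use, to prevent duplicates */
  declare hash k ();
  k.definekey("anon");
  k.definedone();
end;
/* draw random 3-digit values until an unused one turns up */
anon = put(rand("integer",200,999),z3.);
do while (k.check() = 0);
  anon = put(rand("integer",200,999),z3.);
end;
rc = l.add();
rc = k.add();
/* after the last observation, write the translations to LOOKUP */
if done then l.output(dataset:"lookup");
drop rc;
run;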

/* encode a new table, and update the lookup table */
data enc_new;
set new end=done;
length anon $3;
if _n_ = 1
then do;
  /* load the existing translations from LOOKUP */
  declare hash l (dataset:"lookup",ordered:"a");
  l.definekey("id");
  l.definedata("id","anon");
  l.definedone();
  /* load the anonymized values already in use */
  declare hash k (dataset:"lookup (keep=anon)");
  k.definekey("anon");
  k.definedone();
end;
/* known id: find() retrieves its anon; unknown id: assign a new one */
if l.find() ne 0
then do;
  anon = put(rand("integer",200,999),z3.);
  do while (k.check() = 0);
    anon = put(rand("integer",200,999),z3.);
  end;
  rc = l.add();
  rc = k.add();
end;
/* write the updated translations back to LOOKUP */
if done then l.output(dataset:"lookup");
drop rc;
run;
sasuser_sk
Quartz | Level 8

Hello KurtBremser:

I used your code to generate an anonymized ID for my customer ID. Every time I run your code, it generates a different anon ID for the same customer ID. I am sending data to a third party and want to anonymize my ID, and when they send the data back to me (with additional information) I want to join it back to my ID. So I want a fixed anon ID for each ID (like a fixed seed). Does your code implement the logic I want to build? If not, is there another way to anonymize my ID and join the data back after receiving it from the third party? Thank you for your help. As a beginner, I am still learning to read and understand your code.

I only changed the SET statement to read "My Data File" and changed anon to "Customer_ID".

Kurt_Bremser
Super User

Note that the first data step will always create a new lookup table, so it must be run only once.

After the lookup table is created, you must always use the second step for updating.
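One way to enforce this is to never run the first step at all: create an empty lookup table when none exists yet, and use the second step for every batch, including the first. This is a sketch; the macro name ENSURE_LOOKUP is hypothetical, and the column lengths assume character IDs read with `input id $;` (length 8). Keep in mind that the WORK library is deleted when the SAS session ends, so a lookup table that must survive until a third party sends data back belongs in a permanent library.

```sas
/* Create an empty lookup table only if it does not exist yet */
%macro ensure_lookup(table=work.lookup);
  %if %sysfunc(exist(&table)) = 0 %then %do;
    data &table;
      length id $8 anon $3;
      call missing(id, anon);  /* avoids "uninitialized" notes */
      stop;                    /* write the structure, no rows */
    run;
  %end;
  %else %put NOTE: &table already exists and was left unchanged.;
%mend ensure_lookup;

%ensure_lookup()
```

Because an existing table is left alone, calling the macro again cannot overwrite translations that were already handed out.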

sasuser_sk
Quartz | Level 8

So the second data step is what assigns anon IDs to additional IDs over time?

Kurt_Bremser
Super User
Yes. It first loads the lookup table into the hash, so it can use existing anonymizations, and writes it back out after it has been updated during the step.
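A sketch of the round trip described earlier in the thread: the third party returns the anonymized value together with new columns, and a join on anon recovers the original ID. The table RETURNED and its column score (with made-up values) are hypothetical; the lookup rows reuse the example from the question.

```sas
/* Lookup rows from the example in the question */
data lookup;
  length id $8 anon $3;
  input id $ anon $;
  datalines;
1 204
2 206
3 208
4 201
5 203
;

/* Hypothetical file returned by the third party */
data returned;
  length anon $3;
  input anon $ score;
  datalines;
201 0.7
204 0.4
208 0.9
;

/* translate anon back to the original id */
proc sql;
  create table rejoined as
    select l.id, r.*
    from returned as r
    left join lookup as l
      on r.anon = l.anon
    order by l.id;
quit;
```

A missing id in the result means the returned anon value is not in the lookup table, which is worth checking before further processing.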
sasuser_sk
Quartz | Level 8

Thank you! Very helpful.
