ChrisNZ
Tourmaline | Level 20

@PGStats Exactly. Salting is a must, and there is never a reason not to.

hashman
Ammonite | Level 13

Hey guys @PGStats and @ChrisNZ:

The scenario in which I see the topic of this thread being relevant is:

  • I have a file with ID and DATA
  • I want to send the file to a client with the ID encrypted as AnonID, as I don't want the client to see the real IDs
  • After the file, processed in some way by the client, is sent back to me, I want to be able to pair AnonIDs I got back with the real IDs on my original file
  • I can do it by either (a) keeping the ID*AnonID xref or (b) not keeping it but instead regenerating AnonIDs from the IDs on the original file and matching them up with what I got back
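Option (b) above can be sketched roughly as follows. This is only an illustration under assumed names: the data sets `original`, `regenerated`, `returned` and `matched`, the variable `Result`, and the macro variable `secret` are all hypothetical, and the `returned` data set is simulated here rather than received from a client.

```sas
%let secret = ChangeMe2024;   /* hypothetical password, kept by the file owner only */

/* Hypothetical original file with ID and DATA */
data original;
  input ID $ DATA;
  datalines;
A001 10
A002 20
A003 30
;
run;

/* Regenerate AnonID from ID exactly as it was done before the file was sent */
data regenerated;
  set original;
  length AnonID $32;
  AnonID = put(md5(cats("&secret", ID)), $hex32.);   /* 16-byte digest shown as 32 hex characters */
run;

/* Stand-in for the file the client sends back: AnonID plus some client result */
data returned;
  set regenerated(keep=AnonID);
  Result = _N_ * 100;
run;

/* Pair the returned AnonIDs with the real IDs */
proc sql;
  create table matched as
  select r.ID, r.DATA, c.Result
  from regenerated as r
       inner join returned as c
       on r.AnonID = c.AnonID;
quit;
```

With option (a), the `regenerated` data set would simply be stored as the ID*AnonID cross-reference instead of being rebuilt.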

Of course, adding a password to the ID before MD5-ing it costs nothing ... but pray tell me, wise guys, why I should do it under the above scenario? I own the file with the original real IDs. What will adding the password protect, and from whom?

 

Kind regards

Paul D.       

PGStats
Opal | Level 21

Adding a password will make it difficult for your client to identify the real IDs when he/she has access to a superset of the IDs (i.e. your sampling frame). Take for example, a sample of students from a university campus. Without a password, it wouldn't be difficult for your client to feed all university student IDs to MD5 and identify the sampled students.
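The attack described here can be sketched in a few lines. This is a toy illustration, not anyone's actual data: the 1000-ID sampling frame, the sample of every 100th student, the data set names and the `secret` macro variable are all assumptions made for the example.

```sas
%let secret = ChangeMe2024;   /* hypothetical password known only to the data owner */

/* Sampling frame: every student ID the client could plausibly obtain */
data frame;
  do i = 1 to 1000;
    ID = put(i, z6.);
    output;
  end;
  drop i;
run;

/* The sample actually sent, hashed without and with the password */
data sent;
  set frame;
  if mod(_N_, 100) = 0;        /* keep every 100th student: 10 sampled IDs */
  length Plain Salted $32;
  Plain  = put(md5(ID), $hex32.);
  Salted = put(md5(cats("&secret", ID)), $hex32.);
  drop ID;
run;

/* Client-side attack: hash the whole frame and look the values up */
data guesses;
  set frame;
  length Guess $32;
  Guess = put(md5(ID), $hex32.);
run;

proc sql;
  select count(*) as PlainHits
    from sent as s, guesses as g
    where s.Plain = g.Guess;
  select count(*) as SaltedHits
    from sent as s, guesses as g
    where s.Salted = g.Guess;
quit;
```

In this sketch the unsalted hashes should all be matched (10 hits), while the salted ones should produce none, because the client cannot reproduce the hash without knowing the password.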

PG
hashman
Ammonite | Level 13

@PGStats:

Understood. Sure, this little (and basically free) precaution wouldn't hurt in such a case.

s_lassen
Meteorite | Level 14

Using a random number generator to get the anonymous IDs is one possibility. This is one of the few times I would use the deprecated RANUNI function, or rather the corresponding CALL routine. One of the reasons it is deprecated is that, when you use it to generate random integers, the seed values (which lie between 1 and 2^31-2) do not repeat until the generator has gone through all the possible values. That is not good in a random number generator, but ideal for your purpose, unless you have more than 2^31 (about 2 billion) subjects, and there aren't that many students in the world yet.

 

Here is an example:

data have;
  set sashelp.class;
  id=_N_;
run;

data anonymous(drop=name id) translate_table(keep=id NewId);
  retain NewID 33; /* starting seed: any positive integer less than 2^31-1 will do here */
  set have;
  call ranuni(NewID,_N_);
run;

I used the _N_ variable as the "output" in the CALL RANUNI routine, as we are not interested in the "real" output at all. It is the seed (NewID) we want: a pseudo-random integer that is guaranteed not to repeat itself for a very long time.
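As a follow-up sketch, the translate_table data set can be used to check that no NewID repeats and to put the real IDs back on a returned file. The `returned` and `restored` data sets and the `Score` variable are hypothetical stand-ins; the code assumes the two data steps above have already been run.

```sas
/* Stand-in for the file returned by the client (hypothetical) */
data returned;
  set anonymous(keep=NewID);
  Score = _N_;
run;

proc sql;
  /* Lists any NewID that occurs more than once; expected to return no rows */
  select NewID, count(*) as n
    from translate_table
    group by NewID
    having count(*) > 1;

  /* Attach the real id to each returned record */
  create table restored as
  select t.id, r.*
  from returned as r
       inner join translate_table as t
       on r.NewID = t.NewID;
quit;
```

Unlike the hashing approach, this one requires keeping translate_table, since the NewID values cannot be regenerated from the IDs alone unless the starting seed and the row order are preserved.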
