ChrisNZ
Tourmaline | Level 20

@PGStats Exactly. Salting is a must, and there is never a reason not to.

hashman
Ammonite | Level 13

Hey guys @PGStats and @ChrisNZ:

The scenario in which I see the topic of this thread being relevant is:

  • I have a file with ID and DATA
  • I want to send the file to a client with the ID encrypted as AnonID, as I don't want the client to see the real IDs
  • After the file, processed in some way by the client, is sent back to me, I want to be able to pair AnonIDs I got back with the real IDs on my original file
  • I can do it by either (a) keeping the ID*AnonID xref or (b) not keeping it but instead regenerating AnonIDs from the IDs on the original file and matching them up with what I got back
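To make option (b) concrete, here is a minimal sketch. It assumes the AnonID is the hex form of an MD5 of the ID; the table names to_client and from_client and the score column are made up for illustration, and sashelp.class stands in for the real file.

```sas
/* Original file with the real IDs */
data have;
  set sashelp.class;
  id = _N_;
run;

/* File sent to the client: the real ID and name are dropped */
data to_client(drop=id name);
  set have;
  length AnonID $32;
  AnonID = put(md5(cats(id)), $hex32.); /* 16-byte digest written as 32 hex characters */
run;

/* Stand-in for the file the client sends back (hypothetical processing) */
data from_client(keep=AnonID score);
  set to_client;
  score = height / weight;
run;

/* Option (b): no xref kept; regenerate AnonID from the original IDs and match */
proc sql;
  create table matched as
  select h.id, h.name, r.score
  from have as h
       inner join from_client as r
       on put(md5(cats(h.id)), $hex32.) = r.AnonID;
quit;
```

Option (a) would simply keep id and AnonID in a small lookup table and join on AnonID instead of recomputing the hash.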

Of course, adding a password to the ID before MD5-ing it costs nothing ... but pray tell me, wise guys, why I should do it under the above scenario? I own the file with the original real IDs. What will adding the password protect, and from whom?

Kind regards

Paul D.       

PGStats
Opal | Level 21

Adding a password will make it difficult for your client to identify the real IDs when he/she has access to a superset of the IDs (i.e. your sampling frame). Take, for example, a sample of students from a university campus. Without a password, it wouldn't be difficult for your client to feed all university student IDs to MD5 and identify the sampled students.
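A rough sketch of that attack, and of how a password (salt) blocks it. Everything here is an assumption for illustration: sashelp.class plays the sampling frame, the first five rows play the sample, and the salt value is hypothetical.

```sas
%let salt = S3cretPhrase; /* hypothetical secret, known only to the data owner */

/* Sampling frame: every ID the client could plausibly know */
data frame;
  set sashelp.class;
  id = _N_;
run;

/* The sample sent out, with both kinds of AnonID side by side for comparison */
data sample(keep=AnonID_plain AnonID_salted);
  set frame(obs=5);
  length AnonID_plain AnonID_salted $32;
  AnonID_plain  = put(md5(cats(id)), $hex32.);          /* no password */
  AnonID_salted = put(md5(cats("&salt", id)), $hex32.); /* password prepended */
run;

/* The client's attack: hash the whole frame, without knowing any password */
data guesses(keep=id guess);
  set frame;
  length guess $32;
  guess = put(md5(cats(id)), $hex32.);
run;

proc sql;
  /* Unsalted AnonIDs: each sampled ID should be matched by a guess */
  select count(*) as recovered_plain
  from sample as s inner join guesses as g
       on s.AnonID_plain = g.guess;

  /* Salted AnonIDs: no match is expected without the salt */
  select count(*) as recovered_salted
  from sample as s inner join guesses as g
       on s.AnonID_salted = g.guess;
quit;
```

The data owner can still regenerate the salted AnonIDs at any time, since he/she knows the salt; the client cannot.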

PG
hashman
Ammonite | Level 13

@PGStats:

Understood. Sure, this little (and basically free) precaution wouldn't hurt in such a case.

s_lassen
Meteorite | Level 14

Using a random number generator to get the anonymous IDs is one possibility. This is one of the few times I would use the deprecated RANUNI function, or rather the corresponding CALL routine. One of the reasons it is deprecated is that if you use it to generate random integers between 0 and 2^31-1, no value repeats before you have gone through all the possible values. That is not good in a random number generator, but it is ideal for your purpose, unless you have more than 2^31 (about 2 billion) subjects, and there aren't that many students in the world yet.

Here is an example:

data have;
  set sashelp.class;
  id=_N_;
run;

data anonymous(drop=name id) translate_table(keep=id NewID);
  retain NewID 33; /* any positive integer less than 2^31 will do here */
  set have;
  /* NewID is the seed; each call replaces it with the next pseudo-random integer */
  call ranuni(NewID,_N_);
run;

I used the _N_ variable as the "output" in the CALL RANUNI routine, as we are not interested in the "real" output at all. It is the seed (NewID) we want, which is a pseudo-random integer, guaranteed not to repeat itself for a long time.
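As a follow-up sketch, this is one way to check that no NewID was issued twice and to map a returned file back to the real IDs. The first two steps only rebuild the tables from the example so the code runs on its own; the returned table and its result column are made up for illustration.

```sas
/* Rebuild the example tables */
data have;
  set sashelp.class;
  id = _N_;
run;

data anonymous(drop=name id) translate_table(keep=id NewID);
  retain NewID 33;
  set have;
  call ranuni(NewID,_N_);
run;

/* Sanity check: the two counts should be equal if no NewID repeats */
proc sql;
  select count(*) as n_rows,
         count(distinct NewID) as n_unique
  from translate_table;
quit;

/* Stand-in for the file that comes back (hypothetical processing) */
data returned(keep=NewID result);
  set anonymous;
  result = height / weight;
run;

/* Re-identify by joining on NewID through the translate table */
proc sql;
  create table reidentified as
  select t.id, r.result
  from returned as r
       inner join translate_table as t
       on r.NewID = t.NewID;
quit;
```

Unlike the MD5 approach, the NewIDs cannot be regenerated from the IDs alone, so translate_table (or the starting seed together with the original row order) has to be kept.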
