@PGStats Exactly. Salting is a must, and there is never a reason not to.
Hey guys @PGStats and @ChrisNZ:
The scenario in which I see the topic of this thread being needed is:
Of course, adding a password to the ID before MD5-ing it costs nothing ... but pray tell me, wise guys, why I should do it under the above scenario? I own the file with the original real IDs. What will adding the password protect, and from whom?
Kind regards
Paul D.
Adding a password will make it difficult for your client to identify the real IDs when he/she has access to a superset of the IDs (i.e. your sampling frame). Take, for example, a sample of students from a university campus. Without a password, it wouldn't be difficult for your client to feed all university student IDs to MD5, compare the resulting hashes with the ones in your file, and so identify the sampled students.
Understood. Sure, this little (and basically free) precaution wouldn't hurt in such a case.
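For what it's worth, the salting precaution discussed above could look roughly like the sketch below. It is only an illustration: the macro variable salt, its value, and the variable name HashedID are made up for this example, and sashelp.class with id=_N_ merely stands in for a real file of IDs.

```sas
/* Hypothetical secret phrase: pick your own and never ship it with the data */
%let salt = S0me-Secret-Phrase;

data hashed(drop=name id);
  set sashelp.class;
  id = _N_;                 /* example ID, standing in for the real one */
  length HashedID $32;
  /* Prepend the secret to the ID, hash the result with MD5, and
     write the 16-byte digest as 32 hexadecimal characters */
  HashedID = put(md5(cats("&salt", id)), $hex32.);
run;
```

Without the secret phrase, someone holding the full list of IDs cannot reproduce the hashes simply by running every ID through MD5. The protection is only as good as the secrecy of the phrase, so it should stay with the owner of the original file.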
Using a random number generator to get the anonymous IDs is one possibility. This is one of the few times I would use the deprecated RANUNI function, or rather the corresponding CALL routine. One of the reasons it is deprecated is that if you use it to generate random integers between 0 and 2^31-1, no value repeats before you have gone through all the possible values. That is not good in a random number generator, but it is ideal for your purpose, unless you have more than about 2^31 (roughly 2 billion) subjects, and there aren't that many students in the world yet.
Here is an example:
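data have;
  set sashelp.class;
  id = _N_;   /* example ID for the demo */
run;

data anonymous(drop=name id) translate_table(keep=id NewID);
  retain NewID 33;   /* starting seed: any positive integer less than 2^31-1 will do here */
  set have;
  /* each call advances the seed; the updated seed becomes the anonymous ID */
  call ranuni(NewID, _N_);
run;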
I used the _N_ variable as the "output" in the CALL RANUNI routine, as we are not interested in the "real" output at all. It is the seed (NewID) we want: a pseudo-random integer that is guaranteed not to repeat itself for a very long time.
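If you want to confirm the no-repeat behaviour on your own data, a quick check along these lines may help. It is just a sketch and assumes the translate_table data set from the example above already exists; the column aliases n_rows and n_distinct_ids are arbitrary names chosen here.

```sas
/* Assumes translate_table (id, NewID) was created by the example above */
proc sql;
  select count(*)              as n_rows,
         count(distinct NewID) as n_distinct_ids
  from translate_table;
quit;
```

The two counts should be equal if every subject received a distinct NewID. Since translate_table is the only link between NewID and the real id, it should be kept private and not delivered with the anonymous data.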