ChrisNZ
Tourmaline | Level 20

@PGStats Exactly. Salting is a must, and there is never a reason not to.

hashman
Ammonite | Level 13

Hey guys @PGStats and @ChrisNZ:

The scenario in which I see the topic of this thread becoming relevant is:

  • I have a file with ID and DATA
  • I want to send the file to a client with the ID encrypted as AnonID, as I don't want the client to see the real IDs
  • After the file, processed in some way by the client, is sent back to me, I want to be able to pair AnonIDs I got back with the real IDs on my original file
  • I can do it by either (a) keeping the ID*AnonID xref or (b) not keeping it but instead regenerating AnonIDs from the IDs on the original file and matching them up with what I got back
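
For concreteness, option (b) might look like the sketch below. It uses plain MD5 with no password, exactly as in the scenario. The table names (master, sent, returned, matched), the sample IDs, and the payload and score columns are made up for illustration.

```sas
/* Hypothetical original file: real ID plus some data */
data master;
  input id $ payload;
  datalines;
A001 10
A002 20
A003 30
;
run;

/* File sent to the client: ID replaced by its MD5 digest as 32 hex characters */
data sent(drop=id);
  length AnonID $32;
  set master;
  AnonID = put(md5(strip(id)), $hex32.);
run;

/* Pretend the client processed two of the rows and sent them back */
data returned;
  set sent(obs=2);
  score = payload * 2;
run;

/* Option (b): no xref kept; regenerate AnonID from the real IDs and match */
proc sql;
  create table matched as
  select m.id, r.score
  from master as m
       inner join returned as r
       on put(md5(strip(m.id)), $hex32.) = r.AnonID
  order by m.id;
quit;
```

Option (a) would simply keep id and AnonID together in a lookup table at the time SENT is created and join on AnonID instead.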

Of course, adding a password to the ID before MD5-ing it costs nothing ... but pray tell me, wise guys, why I should do it under the above scenario? I own the file with the original real IDs. What will adding the password protect, and from whom?

 

Kind regards

Paul D.       

PGStats
Opal | Level 21

Adding a password will make it difficult for your client to identify the real IDs when he/she has access to a superset of the IDs (i.e. your sampling frame). Take, for example, a sample of students from a university campus. Without a password, it wouldn't be difficult for your client to feed all university student IDs to MD5 and identify the sampled students by matching the resulting hashes against the AnonIDs.
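
A small sketch of that attack, and of how a password blunts it, is given below. Everything in it is hypothetical: the macro variable password and its value, the five-ID sampling frame, and the table names. It assumes the password is simply prepended to the ID before hashing.

```sas
%let password = S3cret-Pepper;  /* hypothetical; known to the data owner only */

/* Hypothetical sampling frame that the client also has access to */
data frame;
  input id $;
  datalines;
A001
A002
A003
A004
A005
;
run;

/* The released sample, hashed once without and once with the password */
data released;
  length plain_md5 salted_md5 $32;
  set frame;
  where id in ('A002', 'A004');
  plain_md5  = put(md5(strip(id)), $hex32.);
  salted_md5 = put(md5(cats("&password", id)), $hex32.);
  drop id;
run;

/* The client's attack: hash every ID in the frame and look for matches */
proc sql;
  create table cracked_plain as
  select f.id
  from frame as f, released as r
  where put(md5(strip(f.id)), $hex32.) = r.plain_md5;

  create table cracked_salted as
  select f.id
  from frame as f, released as r
  where put(md5(strip(f.id)), $hex32.) = r.salted_md5;
quit;
```

CRACKED_PLAIN should recover both sampled IDs, while CRACKED_SALTED should come back empty, because the client cannot reproduce the hashes without knowing the password.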

PG
hashman
Ammonite | Level 13

@PGStats:

Understood. Sure, this little (and basically free) precaution wouldn't hurt in such a case.

s_lassen
Meteorite | Level 14

Using a random number generator to get the anonymous IDs is one possibility. This is one of the few times I would use the deprecated RANUNI function, or rather the corresponding CALL routine. One of the reasons it is deprecated is that the seed values it steps through, integers between 1 and 2^31-2, never repeat until all the possible values have been used. That is not good in a random number generator, but ideal for your purpose - unless you have more than 2^31 (about 2 billion) subjects, but there aren't that many students in the world yet.

 

Here is an example:

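data have;
  set sashelp.class;
  id=_N_; /* sequential "real" ID for the demo */
run;

data anonymous(drop=name id) translate_table(keep=id NewID);
  retain NewID 33; /* any positive integer less than 2^31-1 will do here */
  set have;
  call ranuni(NewID,_N_); /* updates the seed NewID; the random number goes to _N_ and is ignored */
run;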

I used the _N_ variable as the "output" in the CALL RANUNI routine, as we are not interested in the "real" output at all. It is the seed (NewID) we want, which is a pseudo-random integer, guaranteed not to repeat itself for a long time.
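
To check the result and to get back to the real IDs later, something along these lines could be used. It is only a sketch: it rebuilds the example tables so that it runs on its own, and the macro variables n_rows and n_distinct and the table reidentified are names invented here.

```sas
/* Rebuild the example tables so this sketch runs on its own */
data have;
  set sashelp.class;
  id = _N_;
run;

data anonymous(drop=name id) translate_table(keep=id NewID);
  retain NewID 33;
  set have;
  call ranuni(NewID, _N_);
run;

/* 1. Confirm that no NewID was issued twice */
proc sql noprint;
  select count(*), count(distinct NewID)
    into :n_rows trimmed, :n_distinct trimmed
  from translate_table;
quit;
%put NOTE: rows=&n_rows distinct NewIDs=&n_distinct;

/* 2. Pretend ANONYMOUS came back from the client; restore the real ID */
proc sql;
  create table reidentified as
  select t.id, a.*
  from anonymous as a
       inner join translate_table as t
       on a.NewID = t.NewID
  order by t.id;
quit;
```

The two counts written to the log should be equal. Note that with this approach TRANSLATE_TABLE has to be kept (or the same starting seed and row order reused), since NewID is not computed from the ID itself.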

