If you're not too fussy about what the masked values for the name string look like, then the code below should work as well.

This code uses md5() to create a 128-bit hash value, then applies the $hex32. format to express these 128 bits as a 32-character string. An MD5 hash is to a certain degree reversible, so to fully mask the data a 22-character substring with a random starting point is selected as the masked value. That should make it practically impossible to revert the masked string back to the original value.

    data af3;
      infile datalines truncover;
      input name $40.;
      datalines;
    Smith, John
    Doe, Jane, V.
    Jackson, Randy, G.
    Hanson, Therese
    Doe, Jane, V.
    ;
    run;

    /* empty lookup table (zero rows) holding name and _masked_name */
    data masked_name_lookup;
      /* pick up name with the same type and length as in af3 */
      if 0 then set af3(keep=name);
      /* define _masked_name with minimum length required to save memory when loading into hash table */
      length _masked_name $22;
      call missing(_masked_name);
      stop;
    run;

    data want(drop=_:);
      if _n_=1 then
        do;
          /* define the host variables for the hash table at compile time */
          if 0 then set af3 masked_name_lookup;
          dcl hash Hname(dataset:'masked_name_lookup');
          _rc=Hname.defineKey('name');
          _rc=Hname.defineData(all:'y');
          _rc=Hname.defineDone();
        end;
      set af3 end=last;
      /* name not seen before: create a new masked value and store it */
      if Hname.find() ne 0 then
        do;
          _masked_name=substrn(put(md5(name),$hex32.),ceil(ranuni(0)*10),22);
          _rc=Hname.add();
        end;
      name=_masked_name;
      /* save all name/mask pairs once the last row has been processed */
      if last then Hname.output(dataset:'masked_name_lookup_tmp');
    run;

If you're using SAS 9.4 then instead of md5() you could use sha256(), which creates a 256-bit hash value and so makes it even harder to revert the masked value back to its original. This is how the line of code would need to look:

    _masked_name=substrn(put(sha256(name),$hex64.),ceil(ranuni(0)*42),22);
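The data step above writes the collected name/mask pairs to masked_name_lookup_tmp but never copies them back into masked_name_lookup. If the goal is for a given name to get the same masked value in later runs (an assumption on my part, the post does not say so), a minimal sketch would be to swap the tables once the step has finished. It assumes both tables live in the WORK library:

```sas
/* Sketch only: replace the old lookup table with the one just written. */
/* Assumes both tables are in WORK and the data step above has run.     */
proc datasets library=work nolist;
  delete masked_name_lookup;
  change masked_name_lookup_tmp=masked_name_lookup;
quit;
```

On later runs you would then skip the step that creates the empty masked_name_lookup, otherwise the saved pairs are overwritten and new random substrings get generated.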