I've written my own anonymizer code. It works, but I'm sure SAS Viya must have an option or function for this purpose.
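/* Sample data: ten 6-character object codes */
data casuser.test;
  input codobjet $;
  infile cards;
  cards;
1372G2
1382NX
190901
1T33AX
210102
260404
2CAA94
2CBT72
2CJT94
2EH1K5
;
run;

/* Generate four sets of six random integer offsets and store each set in a macro variable a1-a4 */
proc iml;
  call randseed(123);
  xx = randfun({4,6}, "integer", -20, 20);
  print xx;
  /* vary and vary2 are not used further below */
  vary = repeat("a",4) || char(t(1:4));
  vary2 = catx("", vary[,1], vary[,2]);
  do i = 1 to 4;
    temp = j(1,1, ' ');
    /* Build a blank-separated list of the six offsets in row i */
    do j = 1 to 6;
      temp = catx(" ", temp, char(xx[i,j]));
    end;
    mac = choosec(i, "a1", "a2", "a3", "a4");
    call symputx(mac, temp);
  end;
quit;

%put &a3.;

/* Shift each character by its offset (chiffre), then shift back to verify (inv_chiffre) */
data casuser.test2;
  set casuser.test(obs=10);
  array roll (6) _temporary_ (&a3.);
  format chiffre inv_chiffre $12.;
  /* Encode: add the offset to the character code of each position */
  do i = 1 to length(codobjet);
    chiffre = compress(catx("", chiffre, byte(rank(substr(codobjet, i, 1)) + roll(i))));
  end;
  /* Decode: subtract the same offsets to recover the original value */
  do j = 1 to length(codobjet);
    inv_chiffre = compress(catx("", inv_chiffre, byte(rank(substr(chiffre, j, 1)) - roll(j))));
  end;
  keep codobjet chiffre inv_chiffre;
run;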
Here is how we anonymize real data keys:
ID_Anon = put(md5(cats('ID_ANON',ID)),$hex10.);
You can't decrypt this, though, so you need to keep a table of the real and anonymized keys. BTW, the anonymized key is repeatable and unique, so it can be used for table joins etc.
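A minimal sketch of such a lookup table, assuming the casuser.test table and codobjet column from the question. The names work.key_map and id_anon are hypothetical, and the full 16-byte MD5 digest is written with $hex32. here rather than a shorter format.

```sas
/* Sketch only: work.key_map and id_anon are hypothetical names */
data work.key_map;
  set casuser.test(keep=codobjet);
  length id_anon $32;
  /* 'ID_ANON' is the constant prefix used in the example above */
  id_anon = put(md5(cats('ID_ANON', codobjet)), $hex32.);
run;

/* Keep one row per real key so the table can serve as a lookup */
proc sort data=work.key_map nodupkey;
  by codobjet;
run;
```

Joining any anonymized table back to work.key_map on id_anon then recovers the real key, so this table needs to be stored with restricted access.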
Hello,
Are you trying to do anonymisation or pseudonymisation?
I think you want to do pseudonymisation (I haven't studied your code though) which often comes down to "string replacement".
You can use the SHA256 algorithm to replace your identifiers (or other information) with 256-bit hash values that are unreadable to humans. SAS has a SHA256 function!
If you want something human-readable (sometimes that is easier for testing and debugging), you can replace person / subject / object names with city names or with a combination of two words from a list of 100 names / flowers / rivers / colors / seas / mountains / first names etc.
Good luck,
Koen
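The SHA256 suggestion above could look roughly like this. It is a sketch that assumes the SHA256 function is available in your SAS release and reuses casuser.test and codobjet from the question; work.test_sha and id_sha are hypothetical names.

```sas
/* Sketch only: work.test_sha and id_sha are hypothetical names */
data work.test_sha;
  set casuser.test;
  length id_sha $64;
  /* SHA256 returns a 32-byte binary value; $hex64. renders it as 64 hex digits */
  id_sha = put(sha256(strip(codobjet)), $hex64.);
run;
```

As with any unsalted hash of a short key, someone who can enumerate the possible input values can rebuild the mapping, so a secret prefix or suffix on the input may be worth considering.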
Isn't truncating to 5 bytes ($hex10.) out of the 16 bytes of the MD5 digest going to create collisions?
@ChrisNZ - You are probably right. I was basing my example on a short key so should have considered the impact of longer ones.
@ChrisNZ The length of the key is of no relevance; what matters is the number of rows / distinct source strings. I once even saw a real collision with an untruncated md5() hash key, which is why I now always consider using SHA256 instead of MD5 as soon as row counts go into the millions.
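options ps=max;

/* Look for collisions among 10 million keys when the MD5 digest is truncated with $hex10. */
data collisions;
  length id other_id 8 id_anon $10;
  keep id other_id id_anon;
  dcl hash h1();
  h1.defineKey('ID_Anon');
  h1.defineData('other_id');
  h1.defineDone();
  do id=1 to 10**7;
    ID_Anon = put(md5(cats('ID_ANON',ID)),$hex10.);
    if h1.check()=0 then
      do;
        /* Hash value already seen: fetch the earlier id and write out the colliding pair */
        rc=h1.find();
        output;
        /* leave;*/
      end;
    else
      do;
        /* First occurrence: remember which id produced this hash value */
        other_id=id;
        rc=h1.add();
      end;
  end;
run;

proc print data=collisions;
run;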
Collisions if truncating to $hex10.
Agree with @ChrisNZ
It should be $hex32. for a 128-bit hash key. Any truncation increases the collision risk, which due to the birthday paradox is always much higher than one would intuitively assume.
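To put a rough number on that risk, here is a back-of-the-envelope sketch using the standard birthday approximation. It assumes the hash output is uniformly distributed, and the variable names are illustrative only.

```sas
/* Sketch only: approximate birthday estimate for a hash truncated to 40 bits */
data _null_;
  n    = 1e7;  /* number of distinct keys, as in the collision test above */
  bits = 40;   /* $hex10. keeps 10 hex digits = 40 bits */
  /* Expected number of colliding pairs: n*(n-1)/2 divided by the number of possible hash values */
  expected_pairs = n*(n-1)/2 / 2**bits;
  /* Approximate probability of at least one collision */
  p_at_least_one = 1 - exp(-expected_pairs);
  put expected_pairs= p_at_least_one=;
run;
```

For 10 million keys and 40 bits this gives an expected number of colliding pairs on the order of 45, whereas with all 128 bits the same formula gives a negligible value.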
@acordes I guess the Viya version will be very relevant. Below are two links I found that might give you some ideas.
https://www.youtube.com/watch?v=E6yVxbitC2k
If it's only about masking values in reports: https://blogs.sas.com/content/sgf/2018/03/02/is-it-sensitive-mask-it-with-data-suppression/