<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Pseudonymization of sensitive data in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/911058#M359257</link>
    <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/168930"&gt;@Anita_n&lt;/a&gt;&amp;nbsp;You could simply remove the extraneous information before creating the digest value, as in the sample code below.&lt;/P&gt;
&lt;P&gt;MD5 creates a 128-bit (16-byte) digest value that you can express as a 32-character hex string; SHA-256 creates a 256-bit (32-byte) digest value that needs a 64-character hex string.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data demo;
  input id $20.;

  length id_md5_hex $32. id_sha256_hex $64.;
  id_md5_hex   =hashing('md5',strip(scan(ID,1,'_')));
  id_sha256_hex=hashing('sha256',strip(scan(ID,1,'_')));

  length id_md5_hex_2 $34. id_sha256_hex_2 $68.;
  id_md5_hex_2   =catx('_',hashing('md5',strip(scan(ID,1,'_'))),scan(ID,2,'_'));
  id_sha256_hex_2=catx('_',hashing('sha256',strip(scan(ID,1,'_'))),scan(ID,2,'_'));

  datalines;
12345
12345_c
;

proc print data=demo;
  var id_md5_hex: id_sha256_hex:;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Patrick_0-1704851925326.png" style="width: 1404px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/92414i95322EE124C68BC0/image-dimensions/1404x75?v=v2" width="1404" height="75" role="button" title="Patrick_0-1704851925326.png" alt="Patrick_0-1704851925326.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please note that if someone figures out how you created the digest values, then with today's compute power it is no longer that hard to reverse the process. The sample code below illustrates this.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data digest_lookup(compress=yes);
  length id_md5_hex $32.; 
  do i=1 to 1000000;
    id=put(i,20. -l);
    id_md5_hex =hashing('md5',strip(scan(ID,1,'_')));
    output;
  end;
  drop i;
run;

data unmasked;
  set digest_lookup;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'demo');
      h1.defineKey('id_md5_hex');
      h1.defineData('id');
      h1.defineDone();
    end;
  if h1.find()=0 then output;
run;
proc print data=unmasked;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Patrick_1-1704851733817.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/92413i5CF213DD7303E7DB/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Patrick_1-1704851733817.png" alt="Patrick_1-1704851733817.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Because of this, I would at least also "scramble" the hex string. Even a simple algorithm like the one below makes it much harder to guess what you've done and to use a simple generated lookup table.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc format;
  invalue hex2dec
    '0'=0
    '1'=1
    '2'=2
    '3'=3
    '4'=4
    '5'=5
    '6'=6
    '7'=7
    '8'=8
    '9'=9
    'A'=10
    'B'=11
    'C'=12
    'D'=13
    'E'=14
    'F'=15
    ;
data demo;
  input id $20.;

  length id_md5_hex id_md5_hex_scrambled $32.;
  id_md5_hex   =hashing('md5',strip(scan(ID,1,'_')));

  l=input(substr(id_md5_hex,12,1),hex2dec.)+6;
  id_md5_hex_scrambled=cats(substr(id_md5_hex,l+1),substr(id_md5_hex,1,l));
  drop l;
  datalines;
12345
12345_c
;

proc print data=demo;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Patrick_0-1704853882857.png" style="width: 712px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/92415i37E7256DF6F9FFE5/image-dimensions/712x95?v=v2" width="712" height="95" role="button" title="Patrick_0-1704853882857.png" alt="Patrick_0-1704853882857.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 10 Jan 2024 02:38:25 GMT</pubDate>
    <dc:creator>Patrick</dc:creator>
    <dc:date>2024-01-10T02:38:25Z</dc:date>
    <item>
      <title>Pseudonymization of sensitive data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910482#M359055</link>
      <description>&lt;P&gt;I have some tables of data which I need to pseudonymise. The main table has the real names of patients in addition to their IDs, which are used to identify them. Since a patient can have one or more diseases, there are several records per patient ID.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;There are also several sub tables containing different examinations and findings for each patient, such as lab values. Here too there are many records per patient.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The aim is to create a pseudonym for each patient and also clearly identify which record in the sub table belongs to which patient.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Any ideas on how to approach this problem, preferably with an example?&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jan 2024 16:00:04 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910482#M359055</guid>
      <dc:creator>Anita_n</dc:creator>
      <dc:date>2024-01-04T16:00:04Z</dc:date>
    </item>
    <item>
      <title>Re: Pseudonymization of sensitive data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910483#M359056</link>
      <description>&lt;P&gt;Why use the "name" at all if you have a patient ID number? Use the ID for everything except billing or wherever it is appropriate to use real names.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If these tables have multiple users who should not see that information, then perhaps create views for them that do not contain the actual name variables at all. Or isolate the names elsewhere and remove them from the data.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jan 2024 16:04:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910483#M359056</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2024-01-04T16:04:53Z</dc:date>
    </item>
    <item>
      <title>Re: Pseudonymization of sensitive data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910494#M359063</link>
      <description>&lt;P&gt;You could &lt;A href="https://blogs.sas.com/content/sasdummy/2014/01/18/sha256-function-sas94/" target="_self"&gt;replace the name with a one-way hash of the name value&lt;/A&gt;. Example:&lt;/P&gt;
&lt;LI-CODE lang="sas"&gt;data hash_class;
 length hashname $32;
 format hashname $hex64.;
 set sashelp.class;
 hashname = sha256(name);
 drop name;
run;&lt;/LI-CODE&gt;
&lt;P&gt;The hash values will be unique (per name value) but you won't be able to derive the original name from the hash. Make it even more "secure" by adding a salt value (a secret value that makes it difficult for someone to predict the hash value from an original name).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ChrisHemedinger_0-1704385883400.png" style="width: 999px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/92273iBE178D4F53C95178/image-size/large?v=v2&amp;amp;px=999" role="button" title="ChrisHemedinger_0-1704385883400.png" alt="ChrisHemedinger_0-1704385883400.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;Note that a patient ID is also PII within your system (I'd guess), so anonymizing the name may not be enough for the purpose. Anyone with access to a patient database could figure out which patient is which. If the goal is to create a report that summarizes characteristics of a patient population&amp;nbsp;&lt;STRONG&gt;without&lt;/STRONG&gt; needing to go back and identify specific patients, consider building a hash of the name and ID (which would yield a unique identifier for the patient that could not be traced back directly to a person) or maybe just hash the ID and drop the name.&lt;/P&gt;
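&lt;P&gt;A minimal sketch of the salted name-plus-ID hash described above. The patients table, its columns, and the salt value are hypothetical and chosen only for illustration; in practice the salt would be kept secret and stored outside the shared code.&lt;/P&gt;
&lt;LI-CODE lang="sas"&gt;/* Hypothetical input: one row per patient */
data patients;
  input id $ name $;
  datalines;
1001 Alice
1002 Bob
;

data patients_pseudo;
  set patients;
  length salt $32 pseudo_id $64;
  /* Hypothetical secret value; anyone who knows it can rebuild the hashes */
  salt = 'replace-with-your-secret';
  /* The delimiter keeps ('12','3') and ('1','23') from hashing the same string */
  pseudo_id = put(sha256(catx('|', salt, id, name)), $hex64.);
  drop id name salt;
run;&lt;/LI-CODE&gt;
&lt;P&gt;The same salt and the same concatenation order have to be used for every table, otherwise the pseudonyms will not match across tables.&lt;/P&gt;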
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jan 2024 16:36:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910494#M359063</guid>
      <dc:creator>ChrisHemedinger</dc:creator>
      <dc:date>2024-01-04T16:36:15Z</dc:date>
    </item>
    <item>
      <title>Re: Pseudonymization of sensitive data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910499#M359068</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/4"&gt;@ChrisHemedinger&lt;/a&gt;&amp;nbsp; &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13884"&gt;@ballardw&lt;/a&gt;&amp;nbsp;Thanks for that. I am thinking of creating the pseudonym from the name and date of birth and/or ID, but I would like to be able to decrypt this value again. So how do I add this salt value?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I will not have multiple users in this case.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jan 2024 17:13:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910499#M359068</guid>
      <dc:creator>Anita_n</dc:creator>
      <dc:date>2024-01-04T17:13:21Z</dc:date>
    </item>
    <item>
      <title>Re: Pseudonymization of sensitive data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910522#M359074</link>
      <description>&lt;P&gt;You can't decrypt hashed values - that's the whole point of hash techniques, that they are virtually impossible to reverse. You need to keep the original values as well as the hashed versions to be able to join them back in later if required.&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jan 2024 20:06:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910522#M359074</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2024-01-04T20:06:09Z</dc:date>
    </item>
    <item>
      <title>Re: Pseudonymization of sensitive data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910548#M359078</link>
      <description>&lt;P&gt;There was a similar question in the community recently, which I couldn't locate. I believe the accepted solution was to encode the patient ID using the $hex64. format and decode it back the same way. This might be helpful in your case as well. Below is a reproduction of that code:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;Data have;
input id $;
cards;
100011
100012
100013
;

data want;
	set have; 
	length encoded_id decoded_id $ 200;
	encoded_id=put(id,$hex64.);
	decoded_id=input(encoded_id,$hex64.);
proc print; run; &lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Capture.PNG" style="width: 358px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/92285i4F4FCABF1E9F7BAF/image-size/large?v=v2&amp;amp;px=999" role="button" title="Capture.PNG" alt="Capture.PNG" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jan 2024 22:50:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910548#M359078</guid>
      <dc:creator>A_Kh</dc:creator>
      <dc:date>2024-01-04T22:50:49Z</dc:date>
    </item>
    <item>
      <title>Re: Pseudonymization of sensitive data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910551#M359079</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/168930"&gt;@Anita_n&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Have a look at this paper &lt;A href="https://support.sas.com/resources/papers/proceedings16/2500-2016-poster.pdf" target="_blank"&gt;https://support.sas.com/resources/papers/proceedings16/2500-2016-poster.pdf&lt;/A&gt;&amp;nbsp;by my friend&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/166908"&gt;@AndySmith&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;His presentation addressed a similar issue back in 2016.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hope this helps,&lt;/P&gt;
&lt;P&gt;Ahmed&lt;/P&gt;</description>
      <pubDate>Thu, 04 Jan 2024 23:45:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910551#M359079</guid>
      <dc:creator>AhmedAl_Attar</dc:creator>
      <dc:date>2024-01-04T23:45:52Z</dc:date>
    </item>
    <item>
      <title>Re: Pseudonymization of sensitive data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910568#M359084</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/168930"&gt;@Anita_n&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It's a common scenario that one needs to mask PII columns for a Dev and TEST environment while maintaining a way for nominated people like testers to "de-mask" values so they can refer back to prod source systems for investigation.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Another common requirement is that the masked values fit into the existing table structures, especially column type and length, which makes SHA and similar approaches unsuitable.&lt;/P&gt;
&lt;P&gt;And the same cleartext value should always map into the same masked value.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What worked for me in the past:&lt;BR /&gt;1.&amp;nbsp;Maintain a permanent lookup table per PII column in a protected location with two columns: cleartext_value, masked_value&lt;BR /&gt;2. The masked value is just a sequence number. Whenever there is a new cleartext value just create a new value pair with max sequence number plus one for the masked value.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The sample code below masks a character variable to illustrate the approach:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options dlcreatedir;
libname secure "%sysfunc(pathname(work))\secured_folder";

%if not %sysfunc(exist(secure.name)) %then
  %do;
    data secure.name(index=(name/unique __name_masked/unique));
      stop;
      set sashelp.class(keep=name);
      __name_masked=name;
    run;
  %end;

data work.class_clear;
  set sashelp.class;
  output;
  if _n_=3 then output;
run;

data work.class_masked(drop=__:);
  if _n_=1 then
    do;
      if 0 then set sashelp.class secure.name;
      dcl hash h1(dataset:'secure.name');
      h1.defineKey('name');
      h1.defineData('name', '__name_masked');
      h1.defineDone();
      
    end;
  call missing(of _all_);

  set work.class_clear end=__last;
  if h1.find() ne 0 then 
    do;
      __name_masked=put(h1.num_items +1,f16. -l);
      __rc=h1.add();
    end;
  name=__name_masked;
  if __last then h1.output(dataset:'work.__name');
run;

proc datasets lib=work nolist nowarn;
  append base=secure.name data=work.__name nowarn;
  run;
  delete __name;
  run;
quit;

proc print data=work.class_masked;
run;

proc print data=secure.name;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
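&lt;P&gt;A nominated user with access to the protected library could then "de-mask" values by joining back to the lookup table. This is only a minimal sketch, assuming the code above has already run; the output table name and the name_clear column are made up for this example.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  /* m.name holds the masked value; s.name holds the clear text */
  create table work.class_demasked as
    select s.name as name_clear,
           m.*
    from work.class_masked as m
    left join secure.name as s
      on m.name = s.__name_masked;
quit;&lt;/CODE&gt;&lt;/PRE&gt;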
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jan 2024 03:15:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910568#M359084</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2024-01-05T03:15:39Z</dc:date>
    </item>
    <item>
      <title>Re: Pseudonymization of sensitive data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910596#M359096</link>
      <description>&lt;P&gt;Thank you all for the suggestions and examples. I will try them to find out which works for me.&lt;/P&gt;
&lt;P&gt;I will then give feedback.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Jan 2024 12:34:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910596#M359096</guid>
      <dc:creator>Anita_n</dc:creator>
      <dc:date>2024-01-05T12:34:47Z</dc:date>
    </item>
    <item>
      <title>Re: Pseudonymization of sensitive data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910731#M359135</link>
      <description>&lt;P&gt;Just did something like that. The solution I used went more or less like this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  create table anonymization_id (id num,anon_id num);
  create unique index id on anonymization_id(id);
quit;

data patient1_anon(drop=id) anonymization_id;
  set patient1;
  modify anonymization_id key=id/unique nobs=n_id;
  if _iorc_ then do; /* assuming that the ID was not found */
    n_id+1;
    anon_id=n_id;
    output  anonymization_id;
    _error_=0;
    end;
  output patient1_anon;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;To anonymize the next patient table, just use a similar datastep (but keeping the same anonymization_id table), and you will have one table with all the anonymizations used, and anonymized versions of the patient data tables (just remember to write/copy them to a permanent library, not WORK).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you need to anonymize the text variable NAME as well, you could just insert something like&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;name=cats('Dummy',anon_id);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;before the second output statement; the patients' names will then be anonymized as well.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;By using the same anonymization table for all the tables containing Patients' IDs, you will get the same translation from real to anonymized ID in all the tables.&lt;/P&gt;</description>
      <pubDate>Sat, 06 Jan 2024 12:12:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910731#M359135</guid>
      <dc:creator>s_lassen</dc:creator>
      <dc:date>2024-01-06T12:12:18Z</dc:date>
    </item>
    <item>
      <title>Re: Pseudonymization of sensitive data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910735#M359136</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/168930"&gt;@Anita_n&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would suggest the following. It works on the same principle as the other suggestions that use a hash value.&amp;nbsp;The idea is to create pseudonym tables from the "real" input tables WITHOUT keeping a table of IDs and pseudonyms.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The two sets of values don't exist together outside the running program, so there are no translation tables to maintain and protect from unauthorized access.&amp;nbsp;At the same time, real identities can easily be restored&amp;nbsp;from the original input tables by applying the same ID pseudonymization in the join, but&amp;nbsp;this is impossible without access to those tables.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;* Test data;
data maintable;
  ID = 12345; Name = 'Billy Nash'; Desease = 11; output;
  ID = 12345; Name = 'Billy Nash'; Desease = 17; output;
  ID = 23456; Name = 'jeff Smith'; Desease = 22; output;
  ID = 34567; Name = 'John Doe'; Desease = 33; output;
run;

* Create Pseudonym table;
* ID is converted to MD5 hash value,
* Name is converted to a default name + patient counter, this gives some extra coding 
    thanks to several records per ID;
proc sort data=maintable;
  by ID Desease;
run;

data anontable (drop=ID Name pnr);
  length PseudoID $36 PseudoName $40;
  set maintable;
  by ID;
  retain pnr 0;
  if first.ID then pnr = pnr + 1;
  PseudoID = put(md5(put(ID,12.)),$hex32.);
  PseudoName = catx(' ', 'Patient', put(pnr,8.), 'Pseudoname');
run;

&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-left" image-alt="anon1.gif" style="width: 521px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/92342i48C54A93607B7BDE/image-size/large?v=v2&amp;amp;px=999" role="button" title="anon1.gif" alt="anon1.gif" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
* Restore identities;
proc sql;
  create table restore as
    select distinct
      maintable.ID,
      maintable.Name,
      anontable.Desease
    from anontable
    left join maintable
    on put(md5(put(maintable.ID,12.)),$hex32.) = anontable.PseudoID
  ;
quit;

&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-left" image-alt="anon2.gif" style="width: 351px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/92343iF59343D9F80CF068/image-size/large?v=v2&amp;amp;px=999" role="button" title="anon2.gif" alt="anon2.gif" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 06 Jan 2024 14:37:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910735#M359136</guid>
      <dc:creator>ErikLund_Jensen</dc:creator>
      <dc:date>2024-01-06T14:37:02Z</dc:date>
    </item>
    <item>
      <title>Re: Pseudonymization of sensitive data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910746#M359137</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12887"&gt;@ErikLund_Jensen&lt;/a&gt; - This is a very cool approach! Just need to ensure the hashing technique remains the same to retrieve the original values. Also the anonymised names make it clear what has been done. &lt;/P&gt;</description>
      <pubDate>Sat, 06 Jan 2024 20:39:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910746#M359137</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2024-01-06T20:39:46Z</dc:date>
    </item>
    <item>
      <title>Re: Pseudonymization of sensitive data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910760#M359138</link>
      <description>&lt;P&gt;See this post from the SAS Nordic Users Group (posted 12-01-2023):&lt;BR /&gt;Juletip #1 - Even Santa hides his secrets – How to mask your data in SAS Studio?&lt;BR /&gt;&lt;A href="https://communities.sas.com/t5/SAS-Community-Nordic/Juletip-1-Even-Santa-hides-his-secrets-How-to-mask-your-data-in/td-p/905545" target="_blank"&gt;https://communities.sas.com/t5/SAS-Community-Nordic/Juletip-1-Even-Santa-hides-his-secrets-How-to-mask-your-data-in/td-p/905545&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;BR, Koen&lt;/P&gt;</description>
      <pubDate>Sun, 07 Jan 2024 12:54:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910760#M359138</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2024-01-07T12:54:38Z</dc:date>
    </item>
    <item>
      <title>Re: Pseudonymization of sensitive data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910999#M359237</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/60547"&gt;@sbxkoenk&lt;/a&gt;&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13976"&gt;@SASKiwi&lt;/a&gt;&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12887"&gt;@ErikLund_Jensen&lt;/a&gt;&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/76464"&gt;@s_lassen&lt;/a&gt;&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12447"&gt;@Patrick&lt;/a&gt;&amp;nbsp;I have tried all the samples you posted. They are all very good ways to solve this problem, depending on one's needs. Thank you very much.&lt;/P&gt;
&lt;P&gt;All the same, I have a question. Assume I have ID=12345 in the main table, and the ID in the sub table is 12345_c because the patient had chemotherapy and this ID belongs to the chemotherapy table. Another ID is 12345_r because of radiotherapy treatment (the IDs have these extensions because the data comes from different centers).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My problem is that if I use the hash function sha256 or md5 with the format $hex64., the suffixed IDs get different pseudonyms than the main ID. The idea behind all this is to assign both 12345_c and 12345_r to the main ID 12345.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is there a solution to this using the hash function, or will I need to define my own values for the IDs?&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jan 2024 14:14:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/910999#M359237</guid>
      <dc:creator>Anita_n</dc:creator>
      <dc:date>2024-01-09T14:14:26Z</dc:date>
    </item>
    <item>
      <title>Re: Pseudonymization of sensitive data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/911039#M359249</link>
      <description>&lt;P&gt;How about just hashing the main ID then adding the _c or _r unhashed to get the secondary IDs?&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jan 2024 19:04:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/911039#M359249</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2024-01-09T19:04:14Z</dc:date>
    </item>
    <item>
      <title>Re: Pseudonymization of sensitive data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/911044#M359252</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/168930"&gt;@Anita_n&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This is easily solved. If the IDs are like your example, with an underscore as delimiter followed by a suffix, the following code will work. The ID variable in the subtable must be character to hold the underscore and letter, and my example assumes that the IDs in the main table are character as well. The idea is to hash only the part before the underscore in the subtables.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note the trimming (with cats) of the argument in all MD5 function calls; this avoids including an unknown number of blanks when generating the hash value. Note also that the pseudonym name is not generated in the sub table. This is because the&amp;nbsp;pseudonym name in the main table contains a counter that is incremented for each new ID, and the same count cannot be reproduced in the sub tables unless all IDs are present.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;* Test data;
data maintable;
  ID = '12345'; Name = 'Billy Nash'; Desease = 11; output;
  ID = '12345'; Name = 'Billy Nash'; Desease = 17; output;
  ID = '23456'; Name = 'jeff Smith'; Desease = 22; output;
  ID = '34567'; Name = 'John Doe'; Desease = 33; output;
run;

data subtable;
  ID = '12345_r'; Subtable_var = 11; output;
  ID = '12345_r'; Subtable_var = 17; output;
  ID = '23456_r'; Subtable_var = 22; output;
run;

* Create Pseudonym Maintable - Change ID to PseudoID;
* ID is converted to MD5 hash value,
* Name is converted to a default name + patient counter, this gives some extra coding 
    thanks to several records per ID;
proc sort data=maintable;
  by ID Desease;
run;

data anontable (drop=ID Name pnr);
  length PseudoID $36 PseudoName $40;
  set maintable;
  by ID;
  retain pnr 0;
  if first.ID then pnr = pnr + 1;
  PseudoID = put(md5(cats(ID)),$hex32.);
  PseudoName = catx(' ', 'Patient', put(pnr,8.), 'Pseudoname');
run;


* Create Pseudonym Subtable - Change ID to PseudoID;
data anonsubtable (drop=ID MainID);
  length MainID $20 PseudoID $36;
  set subtable;
  /* No BY statement is needed here, so the subtable does not have to be sorted */
  MainID = scan(ID,1,'_');
  PseudoID = put(md5(cats(MainID)),$hex32.);
run;

* Restore identities on anonymized maintable;
proc sql;
  create table restore_main as
    select distinct
      maintable.ID,
      maintable.Name,
      anontable.Desease
    from anontable
    left join maintable
    on put(md5(cats(maintable.ID)),$hex32.) = anontable.PseudoID
  ;
quit;

* Restore identities on anonymized subtable;
proc sql;
  create table restore_sub as
    select distinct
      maintable.ID,
      maintable.Name,
      anonsubtable.Subtable_var
    from anonsubtable
    left join maintable
    on put(md5(cats(maintable.ID)),$hex32.) = anonsubtable.PseudoID
  ;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
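&lt;P&gt;As a quick check of the idea, here is a small sketch (the check table is only for illustration) showing that the main ID and both suffixed IDs from the question should end up with the same PseudoID, because only the part before the underscore is hashed:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data check;
  length ID $20 PseudoID $32;
  do ID = '12345', '12345_c', '12345_r';
    /* Hash only the part before the underscore, with blanks trimmed by cats */
    PseudoID = put(md5(cats(scan(ID,1,'_'))),$hex32.);
    output;
  end;
run;

proc print data=check;
run;&lt;/CODE&gt;&lt;/PRE&gt;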
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jan 2024 20:19:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/911044#M359252</guid>
      <dc:creator>ErikLund_Jensen</dc:creator>
      <dc:date>2024-01-09T20:19:06Z</dc:date>
    </item>
    <item>
      <title>Re: Pseudonymization of sensitive data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/911058#M359257</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/168930"&gt;@Anita_n&lt;/a&gt;&amp;nbsp;You could simply remove the extraneous information before creating the digest value, as in the sample code below.&lt;/P&gt;
&lt;P&gt;MD5 creates a 128-bit (16-byte) digest value that you can express as a 32-character hex string; SHA-256 creates a 256-bit (32-byte) digest value that needs a 64-character hex string.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data demo;
  input id $20.;

  length id_md5_hex $32. id_sha256_hex $64.;
  id_md5_hex   =hashing('md5',strip(scan(ID,1,'_')));
  id_sha256_hex=hashing('sha256',strip(scan(ID,1,'_')));

  length id_md5_hex_2 $34. id_sha256_hex_2 $68.;
  id_md5_hex_2   =catx('_',hashing('md5',strip(scan(ID,1,'_'))),scan(ID,2,'_'));
  id_sha256_hex_2=catx('_',hashing('sha256',strip(scan(ID,1,'_'))),scan(ID,2,'_'));

  datalines;
12345
12345_c
;

proc print data=demo;
  var id_md5_hex: id_sha256_hex:;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Patrick_0-1704851925326.png" style="width: 1404px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/92414i95322EE124C68BC0/image-dimensions/1404x75?v=v2" width="1404" height="75" role="button" title="Patrick_0-1704851925326.png" alt="Patrick_0-1704851925326.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please note that if someone figures out how you created the digest values, then with today's compute power it is no longer hard to reverse the process by hashing every candidate ID and matching the results. The sample code below illustrates this.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data digest_lookup(compress=yes);
  length id_md5_hex $32.; 
  do i=1 to 1000000;
    id=put(i,20. -l);
    id_md5_hex =hashing('md5',strip(scan(ID,1,'_')));
    output;
  end;
  drop i;
run;

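data unmasked;
  set digest_lookup;
  /* Load the masked digests from demo into a hash object once */
  if _n_=1 then
    do;
      dcl hash h1(dataset:'demo');
      h1.defineKey('id_md5_hex');
      h1.defineData('id');
      h1.defineDone();
    end;
  /* Keep each generated candidate whose digest matches a masked value */
  if h1.find()=0 then output;
run;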
proc print data=unmasked;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Patrick_1-1704851733817.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/92413i5CF213DD7303E7DB/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Patrick_1-1704851733817.png" alt="Patrick_1-1704851733817.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Because of the above, I would at least also "scramble" the hex string. Even a simple algorithm like the one below makes it much harder to guess what you've done and to reverse it with a simple generated lookup table.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc format;
  invalue hex2dec
    '0'=0
    '1'=1
    '2'=2
    '3'=3
    '4'=4
    '5'=5
    '6'=6
    '7'=7
    '8'=8
    '9'=9
    'A'=10
    'B'=11
    'C'=12
    'D'=13
    'E'=14
    'F'=15
    ;
data demo;
  input id $20.;

  length id_md5_hex id_md5_hex_scrambled $32.;
  id_md5_hex   =hashing('md5',strip(scan(ID,1,'_')));

  l=input(substr(id_md5_hex,12,1),hex2dec.)+6;
  id_md5_hex_scrambled=cats(substr(id_md5_hex,l+1),substr(id_md5_hex,1,l));
  drop l;
  datalines;
12345
12345_c
;

proc print data=demo;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Patrick_0-1704853882857.png" style="width: 712px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/92415i37E7256DF6F9FFE5/image-dimensions/712x95?v=v2" width="712" height="95" role="button" title="Patrick_0-1704853882857.png" alt="Patrick_0-1704853882857.png" /&gt;&lt;/span&gt;&lt;/P&gt;
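&lt;P&gt;As a possible alternative to scrambling, a keyed digest (HMAC) mixes a secret into the hash, so a lookup table generated without that secret will not match. Below is a minimal sketch, assuming a SAS release that provides the HASHING_HMAC function (SAS 9.4M6 or later). The data set name demo_keyed, the variable names and the key value are made up for illustration, and a real key should be stored outside the program and the data.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data demo_keyed;
  input id $20.;

  /* Hypothetical secret key, for illustration only */
  retain secret_key 'replace-with-a-long-random-secret';

  length id_hmac_hex $64.;
  /* Keyed SHA-256 digest of the ID part before the underscore */
  id_hmac_hex=hashing_hmac('sha256',strip(secret_key),strip(scan(ID,1,'_')));

  drop secret_key;
  datalines;
12345
12345_c
;

proc print data=demo_keyed;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The same key would have to be used for every table so that the pseudo IDs still match across tables.&lt;/P&gt;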
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jan 2024 02:38:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/911058#M359257</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2024-01-10T02:38:25Z</dc:date>
    </item>
    <item>
      <title>Re: Pseudonymization of sensitive data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/911088#M359267</link>
      <description>&lt;P&gt;Assuming that the ID variable in the main table is numeric, and the ones in the subsidiary tables are character, you could try something like this for the subsidiary tables:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data subsidiary1_anon(drop=id local_id) anonymization_id;
  set subsidiary1(rename=(id=local_id));
  id=input(scan(local_id,1,'_'),best32.);
  id_type=scan(local_id,2,'_'); /* I suppose we want to keep this */
  modify anonymization_id key=id/unique nobs=n_id;
  if _iorc_ then do; /* assuming that the ID was not found */
    n_id+1;
    anon_id=n_id;
    output  anonymization_id;
    _error_=0;
    end;
  output subsidiary1_anon; /* must match the data set named in the DATA statement */
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 10 Jan 2024 09:52:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/911088#M359267</guid>
      <dc:creator>s_lassen</dc:creator>
      <dc:date>2024-01-10T09:52:53Z</dc:date>
    </item>
    <item>
      <title>Re: Pseudonymization of sensitive data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/911103#M359275</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/76464"&gt;@s_lassen&lt;/a&gt;&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12447"&gt;@Patrick&lt;/a&gt;&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12887"&gt;@ErikLund_Jensen&lt;/a&gt;&amp;nbsp; Thanks for those examples. Combining your ideas, I think I am almost getting what I want.&lt;/P&gt;</description>
      <pubDate>Wed, 10 Jan 2024 11:53:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/911103#M359275</guid>
      <dc:creator>Anita_n</dc:creator>
      <dc:date>2024-01-10T11:53:32Z</dc:date>
    </item>
    <item>
      <title>Re: Pseudonymization of sensitive data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/911243#M359328</link>
      <description>&lt;P&gt;I guess the problem of &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/168930"&gt;@Anita_n&lt;/a&gt;&amp;nbsp;is solved by now, but here's an interesting blog post on the topic that was published yesterday:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Using the BXOR Function to Encode and Decode Text&lt;/STRONG&gt; &lt;BR /&gt;By Ron Cody on SAS Users January 10, 2024&lt;BR /&gt;&lt;A href="https://blogs.sas.com/content/sgf/2024/01/10/using-the-bxor-function-to-encode-and-decode-text/" target="_blank"&gt;https://blogs.sas.com/content/sgf/2024/01/10/using-the-bxor-function-to-encode-and-decode-text/&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Koen&lt;/P&gt;</description>
      <pubDate>Thu, 11 Jan 2024 09:35:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Pseudonymization-of-sensitive-data/m-p/911243#M359328</guid>
      <dc:creator>sbxkoenk</dc:creator>
      <dc:date>2024-01-11T09:35:32Z</dc:date>
    </item>
  </channel>
</rss>

