<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Dealing with (de-identified/ re-identifiable) data from multiple sites in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Dealing-with-de-identified-re-identifiable-data-from-multiple/m-p/800595#M314989</link>
    <description>&lt;P&gt;Hi everyone,&lt;/P&gt;
&lt;P&gt;I have a general question and a SAS-specific question.&lt;/P&gt;
&lt;P&gt;I am using SAS to create and maintain a large database that different sites will contribute information to.&lt;/P&gt;
&lt;P&gt;It is crucial in this process that only de-identified patient information is stored in the database, and that de-identification happens at the site before the data is sent to the registry.&lt;/P&gt;
&lt;P&gt;There are two problems here, one easy, one much harder:&lt;/P&gt;
&lt;P&gt;1. The easy problem: is there a built-in procedure or a macro that anyone knows of that can take a number of identifiers (patient's name, date of birth, Medicare ID, etc.) and generate a unique statistical ID that can be used in the database?&lt;/P&gt;
&lt;P&gt;2. The difficult problem: the database should be able to identify repeat admissions and data from the same patient at different sites, so the (de-identified) ID of a patient should be the same across sites. However, because de-identification should ideally happen at the site level before the data leaves for the registry, it is unclear how to assign the same ID to a patient across multiple sites without the sites sharing the identifiers with each other. Has anyone dealt with a similar problem, and how did you approach it?&lt;/P&gt;
&lt;P&gt;Thanks heaps.&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 07 Mar 2022 09:58:13 GMT</pubDate>
    <dc:creator>ammarhm</dc:creator>
    <dc:date>2022-03-07T09:58:13Z</dc:date>
    <item>
      <title>Dealing with (de-identified/ re-identifiable) data from multiple sites</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Dealing-with-de-identified-re-identifiable-data-from-multiple/m-p/800595#M314989</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;
&lt;P&gt;I have a general question and a SAS-specific question.&lt;/P&gt;
&lt;P&gt;I am using SAS to create and maintain a large database that different sites will contribute information to.&lt;/P&gt;
&lt;P&gt;It is crucial in this process that only de-identified patient information is stored in the database, and that de-identification happens at the site before the data is sent to the registry.&lt;/P&gt;
&lt;P&gt;There are two problems here, one easy, one much harder:&lt;/P&gt;
&lt;P&gt;1. The easy problem: is there a built-in procedure or a macro that anyone knows of that can take a number of identifiers (patient's name, date of birth, Medicare ID, etc.) and generate a unique statistical ID that can be used in the database?&lt;/P&gt;
&lt;P&gt;2. The difficult problem: the database should be able to identify repeat admissions and data from the same patient at different sites, so the (de-identified) ID of a patient should be the same across sites. However, because de-identification should ideally happen at the site level before the data leaves for the registry, it is unclear how to assign the same ID to a patient across multiple sites without the sites sharing the identifiers with each other. Has anyone dealt with a similar problem, and how did you approach it?&lt;/P&gt;
&lt;P&gt;Thanks heaps.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 07 Mar 2022 09:58:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Dealing-with-de-identified-re-identifiable-data-from-multiple/m-p/800595#M314989</guid>
      <dc:creator>ammarhm</dc:creator>
      <dc:date>2022-03-07T09:58:13Z</dc:date>
    </item>
    <item>
      <title>Re: Dealing with (de-identified/ re-identifiable) data from multiple sites</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Dealing-with-de-identified-re-identifiable-data-from-multiple/m-p/800601#M314993</link>
      <description>&lt;P&gt;You could use MD5 or, possibly even better, a SHA algorithm to create a digest value. You will need to ensure that all of your data providers use the same encoding for their source data, because the digest only matches when the input is byte-for-byte identical.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.2/lefunctionsref/n05ptq6zr5amxkn18mjkyvbkjjos.htm" target="_blank" rel="noopener"&gt;https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.2/lefunctionsref/n05ptq6zr5amxkn18mjkyvbkjjos.htm&lt;/A&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You could have your data providers run a demo program like the one below and then check whether they all send you back the same sha256 values.&lt;/P&gt;
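&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Build a SHA-256 digest from the composite key (name and age) */
data demo;
  set sashelp.class;
  sha256 = hashing('sha256',catx('|',name,age));
run;
/* Print the result without the identifying columns name and age */
proc print data=demo(drop=name age);
run;&lt;/CODE&gt;&lt;/PRE&gt;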
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Your data providers would then send you data like:&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Patrick_0-1646648663616.png" style="width: 491px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/69208iF9EF13EDFD2876B7/image-dimensions/491x86?v=v2" width="491" height="86" role="button" title="Patrick_0-1646648663616.png" alt="Patrick_0-1646648663616.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
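&lt;P&gt;Because the digest only matches when the input string is identical, each site would probably need to normalise the identifiers in the same way before hashing. Below is a minimal sketch of that idea. The table site_extract, the columns name, dob and medicare_id, and the normalisation rules themselves are assumptions for illustration, not something agreed in this thread. Note that this does not address differences in SAS session encoding, which would still need to be aligned between sites.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical site extract: the same patient keyed in two different ways */
data site_extract;
  length name $40 medicare_id $12;
  format dob date9.;
  infile datalines dsd truncover;
  input name :$40. dob :date9. medicare_id :$12.;
datalines;
Alfred  Smith ,14MAR1950,1234-56789
ALFRED SMITH,14MAR1950,123456789
;
run;

data hashed;
  set site_extract;
  length name_norm $40 id_norm $12 stat_id $64;
  /* Upper-case the name and collapse repeated blanks */
  name_norm = upcase(compbl(strip(name)));
  /* Keep only the digits of the ID */
  id_norm = compress(medicare_id, , 'kd');
  /* Write the date in one fixed text layout (YYYYMMDD) before hashing */
  stat_id = hashing('sha256', catx('|', name_norm, put(dob, yymmddn8.), id_norm));
  keep stat_id;
run;

proc print data=hashed;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Both input rows reduce to the same normalised string, so they end up with the same stat_id. Without the normalisation step they would not.&lt;/P&gt;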
&lt;P&gt;Now for the decoding&lt;/P&gt;
&lt;P&gt;On your end if you get the same digest value multiple times you know that you've got duplicates.&lt;/P&gt;
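&lt;P&gt;On the registry side, spotting repeated digests could look roughly like the sketch below. The table registry, the column names site and stat_id, and the shortened digest values are made up for illustration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical registry table: one row per submitted record */
data registry;
  length site $8 stat_id $64;
  input site $ stat_id $;
datalines;
SITE_A AAAA1111
SITE_B AAAA1111
SITE_B BBBB2222
SITE_C CCCC3333
;
run;

/* Digests that occur more than once, and at how many sites */
proc sql;
  create table repeat_patients as
  select stat_id,
         count(*) as n_records,
         count(distinct site) as n_sites
  from registry
  group by stat_id
  having count(*) ge 2;
quit;

proc print data=repeat_patients;
run;&lt;/CODE&gt;&lt;/PRE&gt;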
&lt;P&gt;What your data providers could do is maintain a table with the digest values and the plain text values. You then just send the digest value to them and they can look-up the plain text values (the composite key) to identify the record.&lt;/P&gt;
&lt;P&gt;...or they can also simply run their process again and select the records that get the digest value you provide.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Recompute the digests and keep only the record matching the requested value */
data demo;
  set sashelp.class;
  sha256 = hashing('sha256',catx('|',name,age));
  if sha256='4D9D84608FD17ED1B9E4F5AF1777D5471A0290B2F94060D4AC917C4FDC6FF45E';
run;
proc print data=demo;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Patrick_1-1646648788431.png" style="width: 738px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/69209i37B1910E663F247B/image-dimensions/738x59?v=v2" width="738" height="59" role="button" title="Patrick_1-1646648788431.png" alt="Patrick_1-1646648788431.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
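&lt;P&gt;The look-up table option mentioned above could be sketched like this. The table names site_lookup and registry_request are hypothetical, and the digest is simply the one used in the filter example above. The look-up table holds plain-text identifiers, so it would stay at the site.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Site side: store each digest next to its plain-text composite key */
data site_lookup;
  set sashelp.class(keep=name age);
  length sha256 $64;
  sha256 = hashing('sha256', catx('|', name, age));
run;

/* Digest values the registry asks the site to identify (hypothetical) */
data registry_request;
  length sha256 $64;
  sha256 = '4D9D84608FD17ED1B9E4F5AF1777D5471A0290B2F94060D4AC917C4FDC6FF45E';
run;

/* Look up the plain-text key for each requested digest */
proc sql;
  select l.name, l.age, l.sha256
  from site_lookup as l
       inner join registry_request as r
       on l.sha256 = r.sha256;
quit;&lt;/CODE&gt;&lt;/PRE&gt;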
&lt;P&gt;Which SHA variant to choose (SHA-1, SHA-256, SHA-512, etc.) will depend on your storage, performance and security requirements.&lt;/P&gt;</description>
      <pubDate>Mon, 07 Mar 2022 22:44:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Dealing-with-de-identified-re-identifiable-data-from-multiple/m-p/800601#M314993</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2022-03-07T22:44:05Z</dc:date>
    </item>
    <item>
      <title>Re: Dealing with (de-identified/ re-identifiable) data from multiple sites</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Dealing-with-de-identified-re-identifiable-data-from-multiple/m-p/800733#M315050</link>
      <description>&lt;P&gt;Considering how often I find supposedly identical people with different birth dates in not terribly large data sets from a single medical provider site, I pray for your success.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;By not very large I mean fewer than 1,000 records in a year. I have seen the same "person" with two birth dates for services received on the same date at the same location and the only difference being where a physical test specimen was taken from the patient.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you have other identifiers, as you say, you might consider reducing the number of items used for the encryption / decryption.&lt;/P&gt;
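&lt;P&gt;To illustrate the point, here is a small sketch with made-up data: the same patient recorded with two different birth dates. The table visits and the columns name, dob and medicare_id are assumptions, and the hashing call is the one suggested elsewhere in this thread.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical data: one patient, two conflicting birth dates */
data visits;
  length name $20 medicare_id $12;
  format dob date9.;
  input name $ dob :date9. medicare_id $;
datalines;
Alfred 14MAR1950 123456789
Alfred 15MAR1950 123456789
;
run;

data compare;
  set visits;
  length id_all id_reduced $64;
  /* Digest over name, birth date and ID */
  id_all = hashing('sha256', catx('|', upcase(name), put(dob, yymmddn8.), medicare_id));
  /* Digest over the ID alone */
  id_reduced = hashing('sha256', strip(medicare_id));
run;

proc print data=compare;
  var dob id_all id_reduced;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The two rows get different id_all values because the birth dates differ, but the same id_reduced value. Fewer items means fewer chances for a data entry error to break the match, at the cost of relying entirely on the remaining identifier being correct.&lt;/P&gt;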
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Don't even get me started on names. &lt;span class="lia-unicode-emoji" title=":crying_face:"&gt;😢&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 07 Mar 2022 21:17:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Dealing-with-de-identified-re-identifiable-data-from-multiple/m-p/800733#M315050</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2022-03-07T21:17:58Z</dc:date>
    </item>
  </channel>
</rss>

