BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ammarhm
Lapis Lazuli | Level 10

HI everyone, 

I have a general and a SAS specific question. 

I am using SAS to create and maintain a large database that different sites will contribute information to.

Crucial in this process that only de-identified information on patients will be stored in the database, and de-identification of data happens at the site before sending the data to the registry. 

There are two problems here, one easy, one much harder:

1. The easy problem: is there a built in procedure or a macro that anyone knows of, that can take a number of identifiers (patient's name, date of birth, Medicare ID, etc) and then generates a unique statistical ID that can be used in the database?

2. The difficult problem: The database should be able to identify repeat admissions/ data from the same patient at different sites. So the (de-identified) ID of a patient should be shared between the sites. However, as the process of de-identification should ideally happen at the site level before the data leaves to the registry, this poses a problem of how to be able to assign the same ID to the patient across multiple sites without the site sharing the identifiers between them. Has anyone dealt with a similar problem and how have you approached it?

Thanks heaps. 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

You could use MD5 or eventually even better SHA to create a digest value. You will need to ensure that your data providers all use the same encoding for their source data.

https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.2/lefunctionsref/n05ptq6zr5amxkn18mjkyvbkjjo... 

You could have your data providers run a demo program like below and then check if they all send you back the same sha256 values.

data demo;
  set sashelp.class;
  sha256 = hashing('sha256',catx('|',name,age));
run;
proc print data=demo(drop=name age);
run;

 

Your data providers would then send you data like:

Patrick_0-1646648663616.png

 

Now for the decoding

On your end if you get the same digest value multiple times you know that you've got duplicates.

What your data providers could do is maintain a table with the digest values and the plain text values. You then just send the digest value to them and they can look-up the plain text values (the composite key) to identify the record.

...or they can also simply run their process again and select the records that get the digest value you provide.

data demo;
  set sashelp.class;
  sha256 = hashing('sha256',catx('|',name,age));
  if sha256='4D9D84608FD17ED1B9E4F5AF1777D5471A0290B2F94060D4AC917C4FDC6FF45E';
run;
proc print data=demo;
run;

Patrick_1-1646648788431.png

 

Which SHA to choose - like SHA1, 256, 512 etc. - will depend on your storage, performance and security requirements.

View solution in original post

2 REPLIES 2
Patrick
Opal | Level 21

You could use MD5 or eventually even better SHA to create a digest value. You will need to ensure that your data providers all use the same encoding for their source data.

https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.2/lefunctionsref/n05ptq6zr5amxkn18mjkyvbkjjo... 

You could have your data providers run a demo program like below and then check if they all send you back the same sha256 values.

data demo;
  set sashelp.class;
  sha256 = hashing('sha256',catx('|',name,age));
run;
proc print data=demo(drop=name age);
run;

 

Your data providers would then send you data like:

Patrick_0-1646648663616.png

 

Now for the decoding

On your end if you get the same digest value multiple times you know that you've got duplicates.

What your data providers could do is maintain a table with the digest values and the plain text values. You then just send the digest value to them and they can look-up the plain text values (the composite key) to identify the record.

...or they can also simply run their process again and select the records that get the digest value you provide.

data demo;
  set sashelp.class;
  sha256 = hashing('sha256',catx('|',name,age));
  if sha256='4D9D84608FD17ED1B9E4F5AF1777D5471A0290B2F94060D4AC917C4FDC6FF45E';
run;
proc print data=demo;
run;

Patrick_1-1646648788431.png

 

Which SHA to choose - like SHA1, 256, 512 etc. - will depend on your storage, performance and security requirements.

ballardw
Super User

Considering the number of times that I find in not terribly large data sets supposedly identical people with different birth dates from a single medical provider site, I pray for your success.

 

By not very large I mean fewer than 1,000 records in a year. I have seen the same "person" with two birth dates for services received on the same date at the same location and the only difference being where a physical test specimen was taken from the patient. 

If you have other identfiers as you say, you might consider reducing the items used for encryption / decryption involved.

 

Don't even get me started on names. 😢

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 524 views
  • 0 likes
  • 3 in conversation