BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Patrick
Opal | Level 21

@Ronein Just encoding via $hex64 doesn't cut it. Anyone who knows how you encoded the values can easily decode them. 

 

But reading your original requirements again: So if "the other company" adds data (columns) to your table then they also need to know the customer. How else would they be capable to add the data to the right customer. If this is true then it's likely more about secure transmission of data and not about masking PII columns. Is that correct?

Ronein
Meteorite | Level 14
The real story is that by law it is forbidden for me to keep real customer id so i need to keep other fake value and my target is to be able to come back to real customer id ( from fake customer id)
Tom
Super User Tom
Super User

Note if the law is you cannot transmit those IDs then sending with such a simple encoding will probably not protect you from prosecution under the law.

Patrick
Opal | Level 21

I've developed some time ago a masking framework for masking PII columns in one environment and then transfer the masked tables to another environment (from PROD to DEV). 

Important in this scenario was that all columns in the masked tables still had the same attributes (like data type and length). For this reason any approach using MD5, SHA etc. was not suitable. 

 

In my case just using a sequence number for the masked data was no issue. And it's much easier to manage than using a random number because the moment you're generating the number randomly you need to also build a check that you haven't generated the same number already for another cleartext value. ...and because of the birthday paradigm the probability to do so is higher than one would intuitively assume.

 

If just using a sequence number the only thing you need to pay some attention to is to not use the algorithm on data that's sorted by the cleartext variable you want to mask (like a customer number that's often also "sequential").

 

To implement something fully dynamically that's also works for character variables with short length and alphanumerical strings takes a bit more but for your use case things can be kept rather simple. 

 

It was for my use case important that the same cleartext value always resulted in the same masked values even when running the masking algorithm for multiple tables and tables from multiple months and at different times. 

The only way this can work is to store and maintain a permanent masking lookup table with key-value pairs of cleartext and masked values.

 

Here what could work for you and meets all the above formulated requirements.

libname perm "%sysfunc(pathname(work))";

/* sample data */
data work.have;
  do cust_id=4,1,99999,10,1,5,4;
    obs_num+1;
    output;
  end;
run;

/* create permanent masking lookup table if not exists */
/* - key-value pairs of cleartext-masked values        */
%if %sysfunc(exist(perm.cust_mask)) ne 1 %then
  %do;
    data perm.cust_masklookup(index=;
      stop;
      set work.have(keep=cust_id);
      cust_id_masked=cust_id;
    run;
  %end;

/* create table with masked data */
data work.masked;
  if _n_=1 then 
    do;
      if 0 then set perm.cust_masklookup(rename=(cust_id_masked=__cust_id_masked));
      /* permanent masking lookup table with data masked in previous runs */
      dcl hash __h1(dataset:'perm.cust_masklookup(rename=(cust_id_masked=__cust_id_masked))');
      __h1.defineKey('cust_id');
      __h1.defineData('__cust_id_masked');
      __h1.defineDone();
      /* mask lookup table to collect with masked values created in this run */
      dcl hash __h1_new(dataset:'perm.cust_masklookup(obs=0 rename=(cust_id_masked=__cust_id_masked))');
      __h1_new.defineKey('cust_id');
      __h1_new.defineData('cust_id','__cust_id_masked');
      __h1_new.defineDone();

    end;
  call missing(of _all_);

  set work.have end=last;

  if __h1.find() ne 0 and __h1_new.find() ne 0 then
    do;
      __cust_id_masked=sum(__h1.num_items,__h1_new.num_items,76589);
      __rc=__h1_new.add(key:cust_id, data:cust_id, data:__cust_id_masked);
    end;
  cust_id=__cust_id_masked;

  /* write newly created key-value pairs of cleartext/masked values to work table */
  if last then __h1_new.output(dataset:'work.cust_masklookup_new(rename=(__cust_id_masked=cust_id_masked))');

  drop __:;

run;

/* append newly created masked values to permanent lookup table */
proc append base=perm.cust_masklookup data=work.cust_masklookup_new;
run;quit;

/* unmask values */
data work.un_masked;
  if _n_=1 then
    do;
      if 0 then set perm.cust_masklookup(rename=(cust_id_masked=__cust_id_masked));
      dcl hash h1(dataset:'perm.cust_masklookup(rename=(cust_id_masked=__cust_id_masked))');
      h1.defineKey('__cust_id_masked');
      h1.defineData('cust_id');
      h1.defineDone();
    end;
  call missing(of _all_);
  set work.masked;
  if h1.find(key:cust_id) ne 0 then 
    do;
      put 'Clear text key not found in masking lookup table';
      put cust_id=;
      abort;
    end;
  drop __:;
run;

/* print the various tables */
title 'PERM: cust_masklookup';
proc print data=perm.cust_masklookup_new;
run;
title 'have';
proc print data=work.have;
run;
title 'masked';
proc print data=work.masked;
run;
title 'un_masked';
proc print data=work.un_masked;
run;
title;

Please spend some time to understand above code before you start to ask questions about it.

 

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Ronein 

 

The SAS encryption algoritms  will always generate the SAME KEY for a given value. This can be used in the following way:

1. Generate data to the external part with an encrypted key only.

2. Join your original and the returned data on the encrypted key, which you generate once more from the original ID in the join condition.

This way, you don't have and don't need a data set with both the original key and the encrypted key. I think that should satisfy all requirements.

* What you have;
data have;
  ID = 123456;
  Name = 'Ronein';
run;

* You make encrypted data to external part;
data xport; set have;
  drop ID;
  ID_encrypt = put(md5(cats(put(ID,12.))),$hex32.);
run;

* External part returns data with added information;
data return;
  set xport;
  Occupation = 'SAS Programmer';
run;

* You merge added infornmation onto orig data;
proc sql;
  create table want as
    select
      a.ID,
      a.Name,
      b.Occupation
    from have as a full outer join return as b
    on put(md5(cats(put(a.ID,12.))),$hex32.) = b.ID_encrypt;
quit;

 

 

 

Patrick
Opal | Level 21

@ErikLund_Jensen wrote:

Hi @Ronein 

 

The SAS encryption algoritms  will always generate the SAME KEY for a given value. This can be used in the following way:

1. Generate data to the external part with an encrypted key only.

2. Join your original and the returned data on the encrypted key, which you generate once more from the original ID in the join condition.

This way, you don't have and don't need a data set with both the original key and the encrypted key. I think that should satisfy all requirements.

 


MD5 and SHA is ENCODING and not ENCRYPTION 

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Patrick 

You are right, of course, and I bow in shame for being so imprecise in my wording.

But it doesn't make any difference here. 

 

The external partner receives a  non-reversible masked ID only,

there is no table containing both the original ID and the masked ID, 

but he returned data can nevertheless be joined with the original data using the masked ID as key, because it can be recreated in the join-condition.

 

yabwon
Onyx | Level 15

[EDIT:] (after sending this I saw that @Tom already wrote that "no encrypting" answer)

 

The question was already answered but the following also satisfies your 1,2, and 3 requirements:

Data have;
input cust_ID someValue $;
cards;
33948398 A
54897733 B
98376333 C
54897733 D
;
run;

proc sort data=have(keep=cust_ID) out=lookup nodupkey;
  by cust_ID;
run;

data lookup;
  set lookup;
  masked_cust_ID + 1;
run;

proc print data=lookup;
run;

Since it is you who provides input data you have full control on the "masking" and no encryption function is needed.

 

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 22 replies
  • 1434 views
  • 11 likes
  • 8 in conversation