Smitha9
Fluorite | Level 6

Hi,

I have a dataset

ID                           Date

253645              01/23/2004

234654              02/05/2001

243657              03/06/1999

243657              05/06/2003

326789              09/03/2009

983211              04/03/2007

983211              08/10/2002

 

I want to de-identify the ID and the Date and save the result. I also want to be able to run code that converts the de-identified variables back to the original ID and Date if needed.

Could you let me know if this can be done in SAS?

Thank you in advance.

 

2 REPLIES 2
Kurt_Bremser
Super User

Use hash objects:

/* the dataset you have */
data have;
input ID :$6. Date :mmddyy10.;
format date yymmdd10.;
datalines;
253645              01/23/2004
234654              02/05/2001
243657              03/06/1999
243657              05/06/2003
326789              09/03/2009
983211              04/03/2007
983211              08/10/2002
;

/* initialize a dataset which keeps the anonymized IDs */
data anonymize;
length id id_anon $6;
stop;
run;

/* anonymize, and keep a record */
data want;
set have end=done;
if _n_ = 1
then do;
  length id_anon $6;
  declare hash a (dataset:"anonymize");
  a.definekey("id");
  a.definedata("id","id_anon");
  a.definedone();
  declare hash b (dataset:"anonymize");
  b.definekey("id_anon");
  b.definedone();
end;
if a.find() = 0
then id = id_anon;
else do;
  id_anon = put(rand("integer",100000,999999),6.);
  do while (b.check() = 0);
    id_anon = put(rand("integer",100000,999999),6.);
  end;
  rc = a.add();
  rc = b.add();
  id = id_anon;
end;
if done then rc = a.output(dataset:"anonymize");
drop id_anon rc;
run;

/* repeat with expanded data */
data have_new;
input ID :$6. Date :mmddyy10.;
format date yymmdd10.;
datalines;
253645              01/23/2004
234654              02/05/2001
243657              03/06/1999
243657              05/06/2003
326789              09/03/2009
983211              04/03/2007
983211              08/10/2002
123456              07/27/2022
;

data want_new;
set have_new end=done;
if _n_ = 1
then do;
  length id_anon $6;
  declare hash a (dataset:"anonymize");
  a.definekey("id");
  a.definedata("id","id_anon");
  a.definedone();
  declare hash b (dataset:"anonymize");
  b.definekey("id_anon");
  b.definedone();
end;
if a.find() = 0
then id = id_anon;
else do;
  id_anon = put(rand("integer",100000,999999),6.);
  do while (b.check() = 0);
    id_anon = put(rand("integer",100000,999999),6.);
  end;
  rc = a.add();
  rc = b.add();
  id = id_anon;
end;
if done then rc = a.output(dataset:"anonymize");
drop id_anon rc;
run;

Look at the resulting datasets.
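Since the question also asks how to get back to the original values: the anonymize dataset keeps every id / id_anon pair, so it can serve as the reverse lookup. A minimal sketch, assuming the want and anonymize datasets created above (the output name restored is made up for illustration):

```sas
/* Reverse lookup: in WANT the variable id holds the anonymized value, */
/* in ANONYMIZE id is the original and id_anon the anonymized value.   */
proc sql;
  create table restored as
  select m.id,          /* original ID from the lookup table */
         w.date
  from want as w
  left join anonymize as m
    on w.id = m.id_anon;
quit;
```

This only works as long as anonymize is kept, so it needs to be stored safely and separately from the de-identified data.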

From there it should be easy to expand the logic to the date, depending on whether id and date form a pair or need to be anonymized separately.
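One possible way to handle the date, sketched under the assumption that id and date form a pair: shift all dates of one ID by the same random number of days and keep the offsets in a lookup dataset so the shift can be undone. Date shifting is a technique swapped in here for illustration, not something stated in this thread; the dataset names ids, date_shift and want_dates, the seed, and the +/-180 day range are all arbitrary choices.

```sas
/* One row per (already anonymized) ID */
proc sort data=want(keep=id) out=ids nodupkey;
  by id;
run;

/* Hypothetical lookup: one random day offset per ID */
data date_shift;
  set ids;
  if _n_ = 1 then call streaminit(2022);  /* arbitrary seed, for repeatability */
  shift = rand("integer", -180, 180);     /* offset in days; range is an assumption */
run;

/* Apply the offset; date - shift restores the original date later */
proc sql;
  create table want_dates as
  select w.id,
         w.date + s.shift as date format=yymmdd10.
  from want as w
  inner join date_shift as s
    on w.id = s.id;
quit;
```

Using one offset per ID keeps the intervals between dates of the same ID intact. If the dates must be anonymized independently of the ID, the offset would have to be stored per row instead.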

LinusH
Tourmaline | Level 20

A slightly easier approach (from a programming perspective) is to use a hash function, such as MD5.

The result is not reversible, so you need to store the original value together with the "deidentified" value in a table for later lookup.

Not sure about the date, though; dates usually carry business meaning. Do you want to create another valid date?
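A minimal sketch of that idea, assuming the have dataset from the reply above (the names md5_lookup, want_md5 and id_hash are made up for illustration). The MD5 function returns a 16-byte binary string, which the $HEX32. format turns into 32 printable hex characters:

```sas
/* Lookup table: original ID next to its MD5 digest */
data md5_lookup;
  set have(keep=id);
  length id_hash $32;
  id_hash = put(md5(strip(id)), $hex32.);  /* digest as 32 hex characters */
run;

/* Keep one row per original ID */
proc sort data=md5_lookup nodupkey;
  by id;
run;

/* De-identified copy: the digest replaces the ID */
proc sql;
  create table want_md5 as
  select m.id_hash as id,
         h.date
  from have as h
  inner join md5_lookup as m
    on h.id = m.id;
quit;
```

Joining want_md5 back to md5_lookup on the digest recovers the original ID, in the same way as any other lookup table.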

