DATA Step, Macro, Functions and more

How to create an identifier using first name, last name, and date of birth

Accepted Solution Solved
Reply
Contributor
Posts: 51
Accepted Solution

How to create an identifier using first name, last name, and date of birth

I would like to create a variable from first name, last name, and date of birth that looks like this:

 

First letter of first name + 3rd letter of first name + first letter of last name + 3rd letter of last name + DOB (mmddyy) + indicator of race (1-4) + Letter U

 

For example, for a person with this data:

First_name    Last_name   DOB                 Race

Sarah            Smith            09/01/1999      2

 

I want to create a variable that looks like this:

SRSI0901992U

 

Any help would be greatly appreciated. Thank you in advance.


Accepted Solutions
Solution
‎09-14-2017 11:23 AM
Super User
Super User
Posts: 9,427

Re: How to create an identifier using first name, last name, and date of birth

I would not recommend using data like that as an ID.  Data errors and other problems (not to mention possible duplicates) will render the ID useless.  It is technically possible:

data want;
  set have;
  length id $100;
  id=upcase(cats(substr(first_name,1,1),substr(first_name,3,1),put(dob,ddmmyy8.),put(race,1.),"U"));
run;

But I really wouldn't recommend it.

View solution in original post


All Replies
Solution
‎09-14-2017 11:23 AM
Super User
Super User
Posts: 9,427

Re: How to create an identifier using first name, last name, and date of birth

I would not recommend using data like that as an ID.  Data errors and other problems (not to mention possible duplicates) will render the ID useless.  It is technically possible:

data want;
  set have;
  length id $100;
  id=upcase(cats(substr(first_name,1,1),substr(first_name,3,1),put(dob,ddmmyy8.),put(race,1.),"U"));
run;

But I really wouldn't recommend it.

Contributor
Posts: 51

Re: How to create an identifier using first name, last name, and date of birth

I agree that this identifier is sub-optimal, but unfortunately I need to do a datamatch with a database that only has this identifier available.

 

One quick follow-up question: How do I remove the slashes from the identifier? 

 

I am currently using this code:

 

data match.mydatabase3;

  set match.mydatabase3;

  length URN $100;

 URN=upcase(cats(substr(first_name,1,1),substr(first_name,3,1),substr(last_name,1,1),substr(last_name,3,1),put(dob,MMDDYY8.),put(Gender_forTLCmatch,1.),"U"));

run;

 

Which produces this:  SRSI09/01/992U

 

Instead of this (which is what I want):  SRSI0901992U

 

Thanks very much.

Super User
Super User
Posts: 9,427

Re: How to create an identifier using first name, last name, and date of birth

Just add a compress:

 URN=compress(upcase(cats(substr(first_name,1,1),substr(first_name,3,1),substr(last_name,1,1),substr(last_name,3,1),put(dob,MMDDYY8.),put(Gender_forTLCmatch,1.),"U")),"/");
Community Manager
Posts: 3,380

Re: How to create an identifier using first name, last name, and date of birth

[ Edited ]

You didn't say what your real objective is.  Maybe this format is prescribed by someone else, and that's just the format you need. 

[Edit: now I see you're stuck with the format you asked for...but I'll leave this response for others who have a similar situation.]

 

BUT, if the real goal is to anonymize these fields such that the personal info is obscured, yet the original value can be checked/verified against this encoded value, then you probably want to use a one-way hash.

 

SAS supports an industry-standard function (SHA256) that encrypts the value you supply.  Assuming you have the original fields stored off somewhere else in a more secure table, you can always re-connect that personal data with the encrypted data because the calculated hash will be the same.  But a person with just access to the hash value will not be able to determine the original value.

 

Something like:

  

hash = sha256(cats(first_name,last_name,put(dob,ddmmyy8.)));

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 108 views
  • 0 likes
  • 3 in conversation