I would like to create a variable from first name, last name, and date of birth that looks like this:
First letter of first name + 3rd letter of first name + first letter of last name + 3rd letter of last name + DOB (mmddyy) + indicator of race (1-4) + Letter U
For example, for a person with this data:
First_name Last_name DOB Race
Sarah Smith 09/01/1999 2
I want to create a variable that looks like this:
SRSI0901992U
Any help would be greatly appreciated. Thank you in advance.
I would not recommend using data like that as an ID. Data errors and other problems (not to mention possible duplicates) will render the ID useless. It is technically possible:
data want; set have; length id $100; id=upcase(cats(substr(first_name,1,1),substr(first_name,3,1),put(dob,ddmmyy8.),put(race,1.),"U")); run;
But I really wouldn't recommend it.
I would not recommend using data like that as an ID. Data errors and other problems (not to mention possible duplicates) will render the ID useless. It is technically possible:
data want; set have; length id $100; id=upcase(cats(substr(first_name,1,1),substr(first_name,3,1),put(dob,ddmmyy8.),put(race,1.),"U")); run;
But I really wouldn't recommend it.
I agree that this identifier is sub-optimal, but unfortunately I need to do a datamatch with a database that only has this identifier available.
One quick follow-up question: How do I remove the slashes from the identifier?
I am currently using this code:
data match.mydatabase3;
set match.mydatabase3;
length URN $100;
URN=upcase(cats(substr(first_name,1,1),substr(first_name,3,1),substr(last_name,1,1),substr(last_name,3,1),put(dob,MMDDYY8.),put(Gender_forTLCmatch,1.),"U"));
run;
Which produces this: SRSI09/01/992U
Instead of this (which is what I want): SRSI0901992U
Thanks very much.
Just add a compress:
URN=compress(upcase(cats(substr(first_name,1,1),substr(first_name,3,1),substr(last_name,1,1),substr(last_name,3,1),put(dob,MMDDYY8.),put(Gender_forTLCmatch,1.),"U")),"/");
You didn't say what your real objective is. Maybe this format is prescribed by someone else, and that's just the format you need.
[Edit: now I see you're stuck with the format you asked for...but I'll leave this response for others who have a similar situation.]
BUT, if the real goal is to anonymize these fields such that the personal info is obscured, yet the original value can be checked/verified against this encoded value, then you probably want to use a one-way hash.
SAS supports an industry-standard function (SHA256) that encrypts the value you supply. Assuming you have the original fields stored off somewhere else in a more secure table, you can always re-connect that personal data with the encrypted data because the calculated hash will be the same. But a person with just access to the hash value will not be able to determine the original value.
Something like:
hash = sha256(cats(first_name,last_name,put(dob,ddmmyy8.)));
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.