I would like to create a variable from first name, last name, and date of birth that looks like this:
First letter of first name + 3rd letter of first name + first letter of last name + 3rd letter of last name + DOB (mmddyy) + indicator of race (1-4) + Letter U
For example, for a person with this data:
First_name Last_name DOB Race
Sarah Smith 09/01/1999 2
I want to create a variable that looks like this:
SRSI0901992U
Any help would be greatly appreciated. Thank you in advance.
I would not recommend using data like that as an ID. Data errors and other problems (not to mention possible duplicates) will render the ID useless. It is technically possible:
data want; set have; length id $100; id=upcase(cats(substr(first_name,1,1),substr(first_name,3,1),put(dob,ddmmyy8.),put(race,1.),"U")); run;
But I really wouldn't recommend it.
I would not recommend using data like that as an ID. Data errors and other problems (not to mention possible duplicates) will render the ID useless. It is technically possible:
data want; set have; length id $100; id=upcase(cats(substr(first_name,1,1),substr(first_name,3,1),put(dob,ddmmyy8.),put(race,1.),"U")); run;
But I really wouldn't recommend it.
I agree that this identifier is sub-optimal, but unfortunately I need to do a datamatch with a database that only has this identifier available.
One quick follow-up question: How do I remove the slashes from the identifier?
I am currently using this code:
data match.mydatabase3;
set match.mydatabase3;
length URN $100;
URN=upcase(cats(substr(first_name,1,1),substr(first_name,3,1),substr(last_name,1,1),substr(last_name,3,1),put(dob,MMDDYY8.),put(Gender_forTLCmatch,1.),"U"));
run;
Which produces this: SRSI09/01/992U
Instead of this (which is what I want): SRSI0901992U
Thanks very much.
Just add a compress:
URN=compress(upcase(cats(substr(first_name,1,1),substr(first_name,3,1),substr(last_name,1,1),substr(last_name,3,1),put(dob,MMDDYY8.),put(Gender_forTLCmatch,1.),"U")),"/");
You didn't say what your real objective is. Maybe this format is prescribed by someone else, and that's just the format you need.
[Edit: now I see you're stuck with the format you asked for...but I'll leave this response for others who have a similar situation.]
BUT, if the real goal is to anonymize these fields such that the personal info is obscured, yet the original value can be checked/verified against this encoded value, then you probably want to use a one-way hash.
SAS supports an industry-standard function (SHA256) that encrypts the value you supply. Assuming you have the original fields stored off somewhere else in a more secure table, you can always re-connect that personal data with the encrypted data because the calculated hash will be the same. But a person with just access to the hash value will not be able to determine the original value.
Something like:
hash = sha256(cats(first_name,last_name,put(dob,ddmmyy8.)));
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.