Hello,
I am trying to create a unique identifier using parts of first and last name, DOB and Gender. I want the first 2 letters of the first_name & the first 2 letters of the last_name & last 2 letters of the last_name & DOB with no slashes & Gender as either M or F.
For example:
First_name Last_name DOB Gender ID_number
John Smith 01/02/1995 M JOSMTH01021995M
Mary Jackson 03/15/1975 F MAJAON03151975F
Any help would be greatly appreciated!!
data want;
set have;
id = cats(substr(first_name,1,2),substr(last_name,1,2),
reverse(substr(reverse(trim(last_name)),1,2)),compress(dob,'/'),gender);
run;
Assuming DOB is an actual numeric SAS date value and not a character string
data want;
set have;
id = cats(substr(first_name,1,2),substr(last_name,1,2),
reverse(substr(reverse(trim(last_name)),1,2)),put(dob,mmddyyn8.),gender);
run;
If the last name has less than 4 letters, such as the name "Ty Law", then your ID would begin with TyLaaw (the letter A being repeated by this algorithm)
Thank you! my date is a character string. What is the format to use so that there are no / in the id?
data want;
set have;
id = cats(substr(first_name,1,2),substr(last_name,1,2),
reverse(substr(reverse(trim(last_name)),1,2)),compress(dob,'/'),gender);
run;
I would NOT call this a "unique id". In one data set I work with of about 5000 individuals I had two people with the same first and last name, birth date and gender. I knew they were not the same person from other data such as residence.
Since Joel Smutny would get the same letters as John Smith, your algorithm is very likely to have issues with the "uniqueness" if the number of people involved is not even very large.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.