I'm new to SAS and I need help with the scenario below. The goal is to produce the "Desired result" shown below. The code I have below produces the "Result" shown below.I'll prefer the use of data step as supposed to proc sql but I'm open to any solutions. Thanks
data testB;
set testA;
by tax_id firstname dateofbirth gender zipcode lastname;
retain unique_ID 0;
if (first.tax_id or first.firstname or first.dateofbirth or first.gender or first.lastname) then ID = ID + 1;
keep tax_id firstname dateofbirth gender zipcode lastname ID;
run;
Result
tax_id firstname dateofbirth gender zipcode lastname ID
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Does 2
Johns 01/01/1973 Male 99999 Doe 3
1111 John Male 99999 Doe 4
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
2222 Bill 03/01/2012 Male 22222 Smith 5
Desired Result
tax_id firstname dateofbirth gender zipcode lastname ID
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Does 1
John 01/01/1973 Male 99999 Doe 1
1111 John Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
2222 Bill 03/01/2012 Male 22222 Smith 5
: Did you show your desired result correctly? I would have thought that you would want:
Desired Result
tax_id firstname dateofbirth gender zipcode lastname ID
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
2222 Bill 03/01/2012 Male 22222 Smith 5
Thanks for the reply. The ideal desired result is shown below (it's what Art has suggested) but I would like to preserve the originality of the existing/original record. The ID is the newly created variable that is supposed to uniquely identify a person
Desired Result
tax_id firstname dateofbirth gender zipcode lastname ID
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
2222 Bill 03/01/2012 Male 22222 Smith 5
I would like to create an code that will, before assigning a new ID, check the TAX_ID, firstname,dateofbirth,gender, zipcode,lastname based on the following conditions
if TAX_ID (of incoming record)=Tax_ID (of existing record)
and (firstname (of incoming record)=firstname (of existing record) OR partial match firstname(of incoming record)=firstname (of existing record))
and dateofbirth (of incoming record)= dateofbirth (of existing record)
and gender (of incoming record)= gender (of existing record)
and lastname (of incoming record)=lastname (of existing record) OR partial match lastname(of incoming record)=lastname (of existing record)) THEN assign the existing ID; ELSE assign a NEW ID
Note to Art: I'm open to a solution that assumes that the desired output is :
Desired Result
tax_id firstname dateofbirth gender zipcode lastname ID
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
2222 Bill 03/01/2012 Male 22222 Smith 5
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.