I'm new to SAS and I need help with the scenario below. The goal is to produce the "Desired result" shown below. The code I have below produces the "Result" shown below.I'll prefer the use of data step as supposed to proc sql but I'm open to any solutions. Thanks
data testB;
set testA;
by tax_id firstname dateofbirth gender zipcode lastname;
retain unique_ID 0;
if (first.tax_id or first.firstname or first.dateofbirth or first.gender or first.lastname) then ID = ID + 1;
keep tax_id firstname dateofbirth gender zipcode lastname ID;
run;
Result
tax_id firstname dateofbirth gender zipcode lastname ID
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Does 2
Johns 01/01/1973 Male 99999 Doe 3
1111 John Male 99999 Doe 4
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
2222 Bill 03/01/2012 Male 22222 Smith 5
Desired Result
tax_id firstname dateofbirth gender zipcode lastname ID
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Does 1
John 01/01/1973 Male 99999 Doe 1
1111 John Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
2222 Bill 03/01/2012 Male 22222 Smith 5
: Did you show your desired result correctly? I would have thought that you would want:
Desired Result
tax_id firstname dateofbirth gender zipcode lastname ID
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
2222 Bill 03/01/2012 Male 22222 Smith 5
Thanks for the reply. The ideal desired result is shown below (it's what Art has suggested) but I would like to preserve the originality of the existing/original record. The ID is the newly created variable that is supposed to uniquely identify a person
Desired Result
tax_id firstname dateofbirth gender zipcode lastname ID
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
2222 Bill 03/01/2012 Male 22222 Smith 5
I would like to create an code that will, before assigning a new ID, check the TAX_ID, firstname,dateofbirth,gender, zipcode,lastname based on the following conditions
if TAX_ID (of incoming record)=Tax_ID (of existing record)
and (firstname (of incoming record)=firstname (of existing record) OR partial match firstname(of incoming record)=firstname (of existing record))
and dateofbirth (of incoming record)= dateofbirth (of existing record)
and gender (of incoming record)= gender (of existing record)
and lastname (of incoming record)=lastname (of existing record) OR partial match lastname(of incoming record)=lastname (of existing record)) THEN assign the existing ID; ELSE assign a NEW ID
Note to Art: I'm open to a solution that assumes that the desired output is :
Desired Result
tax_id firstname dateofbirth gender zipcode lastname ID
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
1111 John 01/01/1973 Male 99999 Doe 1
2222 Bill 03/01/2012 Male 22222 Smith 5
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.