Fuzzy/Score set in data step

Reply
Occasional Contributor
Posts: 11

Fuzzy/Score set in data step

I'm new to SAS and I need help with the scenario below. The goal is to produce the "Desired result" shown below. The code I have below produces the "Result" shown below.I'll prefer the use of data step as supposed to proc sql but I'm open to any solutions.  Thanks

data testB;

set testA;

by tax_id firstname dateofbirth gender zipcode lastname;

retain unique_ID 0;

if (first.tax_id or first.firstname or first.dateofbirth or first.gender or first.lastname) then ID = ID + 1;

keep tax_id firstname dateofbirth gender zipcode lastname ID;

run;

Result

tax_id firstname dateofbirth gender zipcode lastname ID

1111  John      01/01/1973  Male  99999    Doe      1

1111  John      01/01/1973  Male  99999    Does    2     

          Johns    01/01/1973  Male  99999    Doe      3

1111  John                        Male  99999      Doe      4

1111  John      01/01/1973  Male  99999    Doe       1

1111  John      01/01/1973  Male  99999     Doe      1

2222  Bill      03/01/2012    Male  22222     Smith    5

Desired Result

tax_id firstname dateofbirth gender zipcode lastname ID

1111  John      01/01/1973  Male  99999    Doe      1

1111  John      01/01/1973  Male  99999    Does    1

          John     01/01/1973  Male  99999    Doe      1

1111  John                        Male  99999    Doe       1

1111  John      01/01/1973  Male  99999    Doe      1

1111  John      01/01/1973  Male  99999    Doe      1

2222  Bill       03/01/2012   Male  22222   Smith    5

PROC Star
Posts: 7,363

Re: Fuzzy/Score set in data step

: Did you show your desired result correctly?  I would have thought that you would want:

Desired Result

tax_id firstname dateofbirth gender zipcode lastname ID

1111  John      01/01/1973  Male  99999    Doe      1

1111  John      01/01/1973  Male  99999    Doe      1

1111  John      01/01/1973  Male  99999    Doe      1

1111  John      01/01/1973  Male  99999    Doe      1

1111  John      01/01/1973  Male  99999    Doe      1

1111  John      01/01/1973  Male  99999    Doe      1

2222  Bill        03/01/2012  Male  22222    Smith    5

Occasional Contributor
Posts: 11

Re: Fuzzy/Score set in data step

Thanks for the reply. The ideal desired result is shown below (it's what Art has suggested) but I would like to preserve the originality of the existing/original record. The ID is the newly created variable that is supposed to uniquely identify a person

Desired Result

tax_id firstname dateofbirth gender zipcode lastname ID

1111  John      01/01/1973  Male  99999    Doe      1

1111  John      01/01/1973  Male  99999    Doe      1

1111  John      01/01/1973  Male  99999    Doe      1

1111  John      01/01/1973  Male  99999    Doe      1

1111  John      01/01/1973  Male  99999    Doe      1

1111  John      01/01/1973  Male  99999    Doe      1

2222  Bill        03/01/2012  Male  22222    Smith    5

I would like to create an code that will, before assigning a new ID, check the TAX_ID, firstname,dateofbirth,gender, zipcode,lastname based on the following conditions

if TAX_ID (of incoming record)=Tax_ID (of existing record)

and (firstname (of incoming record)=firstname (of existing record)  OR partial match firstname(of incoming record)=firstname (of existing record))

and  dateofbirth (of incoming record)= dateofbirth (of existing record)

and gender (of incoming record)= gender (of existing record)

and lastname (of incoming record)=lastname (of existing record)  OR partial match lastname(of incoming record)=lastname (of existing record)) THEN assign the existing ID; ELSE assign a NEW ID

Note to Art: I'm open to a solution that assumes that the desired output is :

Desired Result

tax_id firstname dateofbirth gender zipcode lastname ID

1111  John      01/01/1973  Male  99999    Doe      1

1111  John      01/01/1973  Male  99999    Doe      1

1111  John      01/01/1973  Male  99999    Doe      1

1111  John      01/01/1973  Male  99999    Doe      1

1111  John      01/01/1973  Male  99999    Doe      1

1111  John      01/01/1973  Male  99999    Doe      1

2222  Bill        03/01/2012  Male  22222    Smith    5

Ask a Question
Discussion stats
  • 2 replies
  • 205 views
  • 0 likes
  • 2 in conversation