The code below: Suppose there's this combination here giving us duplicate users using the DOB, Email & Visa matching SELECT "C9" AS COMB,
A. NEW _CIF_NO,
A. CUST_NAME,
B. NEW CIF_ NO AS NEW CIF NO 1,
B. CUST_ NAME AS CUST_FULL NAME__1,
COMPGED (A. CUST_NAME, B. CUST_NAME) AS SCORE
FROM ALL_ENT_ CUST_IND BASE A
JOIN ALI_ENT_ CUST_ INDV_BASE B
ON COMPGED (A. CUST_NAME, B. CUST_NAME) <= 1000
AND A. CUST DOB = B. CUST DOB
AND A. MOBILE NUM = B. MOBILE NUM
AND A. DOC VISA = B. DOC VISA
AND A. CUST_ NAME IS NOT NULL gives us the result below, CIF Name Cif_new Name2 Score
EIB~72—775 EASO PHILIP K A EASO ENB~48--88 EASO PHILIP 330
EIB~72—445 JUDE GRAP ENB~48--98 JUDE GRAP 0
EIB~71—775 EASO PHILIP EASO K A ENB~43--88 PHILIP EASO 500 Score 0 is exact name match along with the doc matching, the higher the score the less the name match. After a bunch of combinations they all get merged, and a final dataset is created with all the user names, documents like this CIF Name Cif_new Doc DOB Email
EIB~72—775 EASO PHILIP K A EASO ENB~48--88 A32742 12-JUL-20 AHSDA@GMAIL.COM
EIB~72—445 JUDE GRAP ENB~48---98 B42823 13-AUG-20 GFA@GMAIL.COM
EIB~72—775 EASO PHILIP K A EASO ENB~436-88 C91742 17-FEB-21 UADFN@GMAIL.COM This dataset here contains score 0, score 330 & score 500 etc all together. What Im trying to do is find a way to create a subset-dataset which would have only those users that even if the names don't match exactly but their documents match. Basically no false positive (like names that match but no matching documents) but just true positive. Let me know if more clarification needed!
... View more