BookmarkSubscribeRSS Feed
manishiiita
Quartz | Level 8

I have free text Narrative fields. I have to scan multiple keywords in this Free text field and find out strings (Partially or Exact Matched).

 

For Example:

 

Narrative Field: 

SUPPLY OF MANPOWER AS PER PROFORMA INVOICE DATED 14.01.2019
1. SIGNED COMMERCIAL INVOICE(S) IN 1 ORIGINAL AND 2 COPIES
SHOWING DATE OF SUPPLY OF MANPOWER NOT LATER THAN 15.05.2019 AND
DULY COUNTERSIGNED BY APPLICANTS AUTHORIZED SIGNATORY AND TO BE
AUTHENTICATED BY IRAN NATIONAL BANK, ERAN TRADE FINANCE DEPARTMENT
PRIOR PRESENTATION OF DOCS FOR NEGOTIATION.
IN THE ABSENCE OF DATE OF SUPPLY OF MANPOWER, THE DATE SHOWN ON
COMMERCIAL INVOICE WILL BE CONSIDERED AS THE SUPPLY DATE for DAWOOD HASSAN.

USD 120/- OR EQUIVALENT IN THE L/C CURRENCY AND RELATED
CHARGES SHOULD BE DEDUCTED FROM THE PAYMENT FOR EACH PRESENTATION by DAWOOD HASAN
OF DISCREPANT DOCUMENTS UNDER THIS CREDIT, NOT WITHSTANDING ANY
INSTRUCTION TO THE CONTRARY, THIS CHARGE SHALL BE FOR THE ACCOUNT
OF BENEFICIARY

2. BENEFICIARYS A/C NO.: 202-577688-001-0010-000  BIC: PIBPBG2L
APPLICANT ACCOUNT. ALL OTHER
CHARGES INCLUDING REIMBURSEMENT AND
SWIFT PAYMENTS RELATED CHARGES ARE
FOR BENEFICIARY ACCOUNT in SYRIA
WITHOUT DESPATCH FULL SET OF PRESENTED / NEGOTIATED DOCUMENTS IN ONE LOT
BY COURIER TO: QATAR NATIONAL BANK, MAIN OFFICE, GRAND HAMAD
STREET, TRADE FINANCE DEPARTMENT, IMPORTS SECTION, P.O. BOX 1000,
DOHA, QATAR.
++UPON RECEIPT OF CREDIT COMPLYING DOCUMENTS OSMA BIN LADEN PAYMENT SHALL BE
EFFECTED BY US AS PER PRESENTING BANKS INSTRUCTION.

 

I have to scan above Narrative field and find out list of keywords given below

 

PIBPBG2L

OSAMA BIN LADEN

DAWOOD HASSAN

SYRIA

IRAN

 

The scanning should give Partially or Exact matched strings in above Narrative field. Here, the output will be:

 

Matched strings for PIBPBG2L: PIBPBG2L (Exact Match)

Matched strings for OSAMA BIN LADEN: OSMA BIN LADEN (Partial Match)

Matched strings for DAWOOD HASSAN: DAWOOD HASSAN (Exact Match) & DAWOOD HASAN (Partial Match)

Matched strings for SYRIA: SYRIA (Exact Match)

Matched strings for IRAN: IRAN (Exact Match) & ERAN (Partial Match)

 

Please note, I have to scan more ~1 Million such keywords in Real Time. So, please suggest for approach which could be faster. I am not using Like Wise match as this will be very slow for 1 Million records, also it will show exact matches only. To give you further details, I have to scan complete World Check Names in above Narrative field with Exact or Partial Matching.

3 REPLIES 3
JackHamilton
Lapis Lazuli | Level 10

I would probably use the prxmatch function, but the problem doesn't seem well defined to me.  What do you mean by "partial match"?  If "Hasan" matches "Hassan", why wouldn't "as" also match?  Or "sanitary"?

 

One of the spelling difference functions might be useful, but would probably be very expensive.

 

Maybe one of the Dataflux products would do this if you have it licensed.

 

 

 

TomKari
Onyx | Level 15

I agree completely with @JackHamilton, you should look at either DataFlux or the SAS Text Analysis products.

 

However, I had to do a simple version of this once, and it was easy to adapt my code for your data. Here's a test bench version of what you need that you can play around with.

 

Tom

data RawText;
	length TextStr $32767;
	input;
	TextStr = _infile_;
	LineNum = _n_;
	cards4;
SUPPLY OF MANPOWER AS PER PROFORMA INVOICE DATED 14.01.2019
1. SIGNED COMMERCIAL INVOICE(S) IN 1 ORIGINAL AND 2 COPIES
SHOWING DATE OF SUPPLY OF MANPOWER NOT LATER THAN 15.05.2019 AND
DULY COUNTERSIGNED BY APPLICANTS AUTHORIZED SIGNATORY AND TO BE
AUTHENTICATED BY IRAN NATIONAL BANK, ERAN TRADE FINANCE DEPARTMENT
PRIOR PRESENTATION OF DOCS FOR NEGOTIATION.
IN THE ABSENCE OF DATE OF SUPPLY OF MANPOWER, THE DATE SHOWN ON
COMMERCIAL INVOICE WILL BE CONSIDERED AS THE SUPPLY DATE for DAWOOD HASSAN.

USD 120/- OR EQUIVALENT IN THE L/C CURRENCY AND RELATED
CHARGES SHOULD BE DEDUCTED FROM THE PAYMENT FOR EACH PRESENTATION by DAWOOD HASAN
OF DISCREPANT DOCUMENTS UNDER THIS CREDIT, NOT WITHSTANDING ANY
INSTRUCTION TO THE CONTRARY, THIS CHARGE SHALL BE FOR THE ACCOUNT
OF BENEFICIARY

2. BENEFICIARYS A/C NO.: 202-577688-001-0010-000  BIC: PIBPBG2L
APPLICANT ACCOUNT. ALL OTHER
CHARGES INCLUDING REIMBURSEMENT AND
SWIFT PAYMENTS RELATED CHARGES ARE
FOR BENEFICIARY ACCOUNT in SYRIA
WITHOUT DESPATCH FULL SET OF PRESENTED / NEGOTIATED DOCUMENTS IN ONE LOT
BY COURIER TO: QATAR NATIONAL BANK, MAIN OFFICE, GRAND HAMAD
STREET, TRADE FINANCE DEPARTMENT, IMPORTS SECTION, P.O. BOX 1000,
DOHA, QATAR.
++UPON RECEIPT OF CREDIT COMPLYING DOCUMENTS OSMA BIN LADEN PAYMENT SHALL BE
EFFECTED BY US AS PER PRESENTING BANKS INSTRUCTION.
;;;;
run;

data CompText;
	length CompStr $50;
	input;
	CompStr = _infile_;
	CompStr = upcase(CompStr);
	cards;
PIBPBG2L
OSAMA
BIN
LADEN
DAWOOD
HASSAN
SYRIA
IRAN
run;

data RawTextProcess;
	set RawText;
	TextStr = translate(TextStr, "                                ", "`~!@#$%^&*()-=_+[]\{}|;':"",./<>?");
	TextStr = upcase(left(compbl(TextStr)));
run;

data DeString;
	length RawWord $25;
	set RawTextProcess;
	drop TextStr;

	do WordNum = 1 to countw(TextStr);
		RawWord = scan(TextStr, WordNum);
		output;
	end;
run;

proc sql noprint;
	create table Compare as
		select c.CompStr, d.RawWord, d.LineNum, d.WordNum, compged(c.Compstr, d.RawWord) as CompGedResult, complev(c.Compstr, d.RawWord) as CompLevResult, spedis(c.Compstr, d.RawWord) as SpeDisResult
			from CompText c cross join DeString d
				order by CompLevResult;
quit;
SASKiwi
PROC Star

To me, this looks like you are dealing with AML (Anti-Money Laundering) requirements that are specified in law by many governments. SAS has specific products for this, as do other vendors. If so, then providing more context to what you are doing would be helpful.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1142 views
  • 0 likes
  • 4 in conversation