I agree completely with @JackHamilton: you should look at either DataFlux or the SAS Text Analysis products.
However, I once had to do a simple version of this, and it was easy to adapt my code to your data. Here's a test-bench version you can play around with.
Tom
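/* Read each raw line into one long character variable */
data RawText;
   length TextStr $32767;
   input;
   TextStr = _infile_;   /* the whole input line */
   LineNum = _n_;        /* line number, used later to locate hits */
cards4;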
SUPPLY OF MANPOWER AS PER PROFORMA INVOICE DATED 14.01.2019
1. SIGNED COMMERCIAL INVOICE(S) IN 1 ORIGINAL AND 2 COPIES
SHOWING DATE OF SUPPLY OF MANPOWER NOT LATER THAN 15.05.2019 AND
DULY COUNTERSIGNED BY APPLICANTS AUTHORIZED SIGNATORY AND TO BE
AUTHENTICATED BY IRAN NATIONAL BANK, ERAN TRADE FINANCE DEPARTMENT
PRIOR PRESENTATION OF DOCS FOR NEGOTIATION.
IN THE ABSENCE OF DATE OF SUPPLY OF MANPOWER, THE DATE SHOWN ON
COMMERCIAL INVOICE WILL BE CONSIDERED AS THE SUPPLY DATE for DAWOOD HASSAN.
USD 120/- OR EQUIVALENT IN THE L/C CURRENCY AND RELATED
CHARGES SHOULD BE DEDUCTED FROM THE PAYMENT FOR EACH PRESENTATION by DAWOOD HASAN
OF DISCREPANT DOCUMENTS UNDER THIS CREDIT, NOT WITHSTANDING ANY
INSTRUCTION TO THE CONTRARY, THIS CHARGE SHALL BE FOR THE ACCOUNT
OF BENEFICIARY
2. BENEFICIARYS A/C NO.: 202-577688-001-0010-000 BIC: PIBPBG2L
APPLICANT ACCOUNT. ALL OTHER
CHARGES INCLUDING REIMBURSEMENT AND
SWIFT PAYMENTS RELATED CHARGES ARE
FOR BENEFICIARY ACCOUNT in SYRIA
WITHOUT DESPATCH FULL SET OF PRESENTED / NEGOTIATED DOCUMENTS IN ONE LOT
BY COURIER TO: QATAR NATIONAL BANK, MAIN OFFICE, GRAND HAMAD
STREET, TRADE FINANCE DEPARTMENT, IMPORTS SECTION, P.O. BOX 1000,
DOHA, QATAR.
++UPON RECEIPT OF CREDIT COMPLYING DOCUMENTS OSMA BIN LADEN PAYMENT SHALL BE
EFFECTED BY US AS PER PRESENTING BANKS INSTRUCTION.
;;;;
run;
/* Watch list of terms to search for, upcased to match the cleaned text */
data CompText;
   length CompStr $50;
   input;
   CompStr = _infile_;
   CompStr = upcase(CompStr);
cards;
PIBPBG2L
OSAMA
BIN
LADEN
DAWOOD
HASSAN
SYRIA
IRAN
;
run;
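/* Replace punctuation with blanks, collapse repeated blanks, and upcase.
   TRANSLATE maps the extra characters in the third argument to blanks,
   and single quotes keep & and % away from the macro processor. */
data RawTextProcess;
   set RawText;
   TextStr = translate(TextStr, ' ', '`~!@#$%^&*()-=_+[]\{}|;'':",./<>?');
   TextStr = upcase(left(compbl(TextStr)));
run;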
/* Split each cleaned line into one row per word, keeping its position */
data DeString;
   length RawWord $25;
   set RawTextProcess;
   drop TextStr;
   do WordNum = 1 to countw(TextStr, ' ');
      RawWord = scan(TextStr, WordNum, ' ');
      output;
   end;
run;
/* Score every watch-list term against every word with three fuzzy-match
   functions; for all three, 0 is an exact match and larger is further away */
proc sql noprint;
   create table Compare as
   select c.CompStr, d.RawWord, d.LineNum, d.WordNum,
          compged(c.CompStr, d.RawWord) as CompGedResult,
          complev(c.CompStr, d.RawWord) as CompLevResult,
          spedis(c.CompStr, d.RawWord)  as SpeDisResult
   from CompText c cross join DeString d
   order by CompLevResult;
quit;
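The Compare table pairs every watch-list term with every word, so it grows quickly. As a minimal follow-up sketch, assuming one edit (a COMPLEV distance of 1) is an acceptable tolerance, which you would need to tune for your own data, you can keep only the exact and near matches. The table name NearMatches is just illustrative:

```sas
/* Keep exact and near matches only. The cutoff of 1 edit is an
   assumption to tune; short terms such as BIN will also match IN */
proc sql;
   create table NearMatches as
   select CompStr, RawWord, LineNum, WordNum,
          CompLevResult, CompGedResult, SpeDisResult
   from Compare
   where CompLevResult <= 1
   order by LineNum, WordNum, CompStr;
quit;

proc print data=NearMatches noobs;
run;
```

With the test text above, this should include the deliberate misspellings such as OSMA for OSAMA and HASAN for HASSAN alongside the exact hits. A fixed edit count is strict for long names and loose for short ones, so filtering on SpeDisResult instead may suit you better, since SPEDIS scales its cost by the length of the first argument.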