05-07-2017 06:30 AM
I wonder if anyone can help me? I am looking to perform n-gram analysis using SAS EM.
I have a few datasets, a couple of which contain important text, and I have applied Text Mining nodes to them, in particular Text Parsing and Text Filter.
I would like to know whether n-gram analysis is part of any particular node, or whether there is another way I should do it.
05-22-2017 08:19 AM
You can implement n-grams with PROC FCMP. It has been a while since I used EM, and I can't remember whether n-gram analysis is included in it. The sample below implements a simple n-gram algorithm that scores how similar two strings are, based on the character n-grams they share.
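proc fcmp outlib=work.dq.func;
   /* Similarity of two strings, based on the substrings of length len they share */
   function ngram(string1 $, string2 $, len);
      /* Keep only alphanumeric characters (and underscores), then upper-case */
      s1 = upcase(compress(string1,,'kan'));
      s2 = upcase(compress(string2,,'kan'));
      score = 0;
      /* Count the n-grams of s1 that occur in s2. The last n-gram starts at
         length - len + 1, which equals length - 1 for len = 2. */
      do index = 1 to (length(s1) - len + 1);
         if find(s2, substr(s1, index, len)) then score + 1;
      end;
      /* Count the n-grams of s2 that occur in s1 */
      do index = 1 to (length(s2) - len + 1);
         if find(s1, substr(s2, index, len)) then score + 1;
      end;
      /* Average the two counts and scale by the n-gram count of the longer string */
      score = score / 2;
      score_pct = score / (max(length(s1), length(s2)) - len + 1);
      return(score_pct);
   endsub;
run;

options cmplib=work.dq;

data tests;
   length s1 s2 $50;
   infile datalines dsd dlm='|';
   input s1 $ s2 $;
   ngram = ngram(s1, s2, 2);
   cards;
Acme Inc.| Acme Integrated Technologies
Acme | Acme Inc.
Acme | Acme
Smith,John| John Smith
;
run;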
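The function above compares two strings character by character. If what you need is counts of word n-grams across your text, a plain DATA step may be enough. The following is only a minimal sketch for bigrams, not an EM node: the dataset docs, the variable text, and the two sample rows are made up for illustration, and it assumes words are separated by blanks.

/* Hypothetical input: one row per document, free text in TEXT */
data docs;
   length text $200;
   infile datalines truncover;
   input text $200.;
   datalines;
the quick brown fox
the quick red fox
;
run;

/* Write one row per pair of adjacent words (bigram) */
data bigrams(keep=bigram);
   set docs;
   length bigram $100;
   n_words = countw(text, ' ');
   do i = 1 to n_words - 1;
      bigram = catx(' ', lowcase(scan(text, i, ' ')),
                         lowcase(scan(text, i + 1, ' ')));
      output;
   end;
run;

/* Count how often each bigram occurs, most frequent first */
proc freq data=bigrams order=freq;
   tables bigram / nocum nopercent;
run;

For trigrams you would stop the loop at n_words - 2 and add a third scan() call to the catx() list.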