Hi,
I wonder if anyone can help me. I am looking to perform n-gram analysis using SAS EM.
I have a few datasets, a couple of which contain important text, and I have applied the Text Mining nodes to them, particularly Text Parsing and Text Filter.
I would like to know whether n-gram analysis is part of a particular node, or whether there is another way I should do it.
Kind regards
Hi,
You can apply n-grams via the FCMP procedure. It's been a while since I've used EM, and I can't remember whether it is included there or not. The sample below implements a simple character n-gram comparison that scores how similar two strings are, from 0 to 1.
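proc fcmp outlib=work.dq.func;
   /* Bidirectional character n-gram similarity: 0 = nothing shared, 1 = all shared */
   function ngram(string1 $, string2 $, len);
      /* Assumed cap of 200 characters per cleaned string; raise it if needed */
      length s1 s2 $200;
      /* Keep only letters, digits, and underscores, then upper-case */
      s1 = upcase(compress(string1,,'kan'));
      s2 = upcase(compress(string2,,'kan'));
      /* Number of n-grams of size len in each string */
      n1 = length(s1) - len + 1;
      n2 = length(s2) - len + 1;
      /* A string shorter than len has no n-grams to compare */
      if n1 < 1 or n2 < 1 then return(0);
      score = 0;
      /* Count n-grams of s1 found in s2, then n-grams of s2 found in s1 */
      do index = 1 to n1;
         if find(s2, substr(s1, index, len)) then score = score + 1;
      end;
      do index = 1 to n2;
         if find(s1, substr(s2, index, len)) then score = score + 1;
      end;
      /* Average the two counts and scale by the larger n-gram count */
      score_pct = (score / 2) / max(n1, n2);
      return(score_pct);
   endsub;
run;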
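/* Make the compiled function available to the DATA step */
options cmplib=work.dq;
data tests;
   length s1 s2 $50;
   infile datalines dsd dlm='|';
   input s1 $ s2 $;
   /* Compare each pair of strings using 2-character n-grams (bigrams) */
   ngram = ngram(s1,s2,2);
   datalines;
Acme Inc.| Acme Integrated Technologies
Acme | Acme Inc.
Acme | Acme
Smith,John| John Smith
;
run;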
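If the goal is word-level n-grams from a text column rather than a similarity score between two strings, a plain DATA step can probably do it as well. The following is only a minimal sketch and not an Enterprise Miner feature: the dataset name docs, the variable text, and the sample sentences are made up for illustration. It splits each row into words, writes one row per pair of adjacent words (bigrams), and counts them with PROC FREQ.

```sas
/* Hypothetical input: one document per row in a variable named text */
data docs;
   length text $200;
   infile datalines truncover;
   input text $200.;
   datalines;
the quick brown fox
the quick red fox
;
run;

/* Emit one row per pair of adjacent words (a word bigram) */
data bigrams(keep=bigram);
   set docs;
   length bigram $100;
   n_words = countw(text, ' ');
   /* A row with fewer than two words produces no bigrams */
   do i = 1 to n_words - 1;
      bigram = catx(' ', lowcase(scan(text, i, ' ')),
                         lowcase(scan(text, i + 1, ' ')));
      output;
   end;
run;

/* Count how often each bigram occurs, most frequent first */
proc freq data=bigrams order=freq;
   tables bigram / nocum nopercent;
run;
```

For trigrams, the same loop would run to n_words - 2 and concatenate three SCAN calls instead of two. Splitting on blanks alone leaves punctuation attached to words, so real text would likely need cleaning first, for example with COMPRESS.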