
How to perform n-gram analysis using Enterprise Miner?

Hi,

I wonder if anyone can help me. I am looking to perform n-gram analysis using SAS Enterprise Miner (EM).

I have a few datasets, a couple of which contain important text, and I have applied the Text Mining nodes, particularly Text Parsing and Text Filter.

I would like to know whether n-gram analysis is part of a particular node, or whether I should do it some other way.

Kind regards

Re: How to perform n-gram analysis using Enterprise Miner?

Hi,

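You can compute n-gram scores with PROC FCMP. It has been a while since I used EM, so I can't remember whether EM includes this or not. The sample below implements a simple n-gram algorithm: it measures how similar two strings are by the share of character n-grams (here, bigrams) they have in common.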

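proc fcmp outlib=work.dq.func;
     /* Share of character n-grams (of size len) that two strings have in common, 0 to 1 */
     function ngram(string1 $, string2 $, len);
          length s1 s2 $ 200;
          /* keep only letters, digits and underscores, then upper-case */
          s1 = upcase(compress(string1, , 'kan'));
          s2 = upcase(compress(string2, , 'kan'));
          /* number of overlapping n-grams in each string (was hard-coded for len=2) */
          n1 = lengthn(s1) - len + 1;
          n2 = lengthn(s2) - len + 1;
          /* both strings shorter than len: nothing to compare, avoid dividing by zero */
          if max(n1, n2) <= 0 then return(0);
          score = 0;
          /* count n-grams of s1 that also occur in s2 */
          do i = 1 to n1;
               if find(s2, substr(s1, i, len)) then score = score + 1;
          end;
          /* count n-grams of s2 that also occur in s1 */
          do i = 1 to n2;
               if find(s1, substr(s2, i, len)) then score = score + 1;
          end;
          /* average the two counts and scale by the larger n-gram count */
          score = score / 2;
          score_pct = score / max(n1, n2);
          return(score_pct);
     endsub;
run;

/* tell SAS where to find the compiled ngram() function */
options cmplib=work.dq;
data tests;
     length s1 s2 $50;
     infile datalines dsd dlm='|';
     input s1 $ s2 $;
     ngram_score = ngram(s1, s2, 2);   /* bigram similarity */
     datalines;
Acme Inc.| Acme Integrated Technologies
Acme | Acme Inc.
Acme | Acme
Smith,John| John Smith
;
run;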
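
To check the arithmetic outside SAS, here is a rough Python port of the same idea as the FCMP sample. It counts the character n-grams of each string that occur anywhere in the other, averages the two counts, and divides by the larger number of n-grams. This is only a sketch. The name `ngram_similarity` is hypothetical, and the character filter approximates SAS `COMPRESS(..., 'kan')` by keeping ASCII letters, digits and underscores, so edge cases may not match SAS exactly.

```python
import string

# Characters kept by this sketch; an approximation of SAS COMPRESS(s, , 'kan').
KEEP = set(string.ascii_letters + string.digits + "_")


def ngram_similarity(string1: str, string2: str, n: int = 2) -> float:
    """Share of character n-grams two strings have in common (0.0 to 1.0)."""
    # Normalise: drop spaces/punctuation and upper-case, as the FCMP version does.
    s1 = "".join(ch for ch in string1 if ch in KEEP).upper()
    s2 = "".join(ch for ch in string2 if ch in KEEP).upper()

    # All overlapping n-grams of each normalised string.
    grams1 = [s1[i:i + n] for i in range(len(s1) - n + 1)]
    grams2 = [s2[i:i + n] for i in range(len(s2) - n + 1)]

    denom = max(len(grams1), len(grams2))
    if denom == 0:
        return 0.0  # both strings shorter than n

    # Count n-grams of each string found anywhere in the other, then average.
    hits = sum(g in s2 for g in grams1) + sum(g in s1 for g in grams2)
    return (hits / 2) / denom


if __name__ == "__main__":
    pairs = [("Acme Inc.", "Acme Integrated Technologies"),
             ("Acme", "Acme Inc."),
             ("Acme", "Acme"),
             ("Smith,John", "John Smith")]
    for a, b in pairs:
        print(f"{a!r:>14} vs {b!r:<32} {ngram_similarity(a, b):.3f}")
```

Because letters are compressed together before the bigrams are taken, "Smith,John" and "John Smith" still score highly even though the word order differs. Only the bigram that spans the join ("HJ" and "NS") is lost.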

Re: How to perform n-gram analysis using Enterprise Miner?

Hi Foobarbaz, thanks for your reply. Could you please tell me how I can run this code in EM? And do I need to connect my Data Partition or File Import nodes to it?