geniusgenie
Obsidian | Level 7

Hi,

I wonder if anyone can help me. I would like to perform n-gram analysis using SAS EM.

I have a few datasets, a couple of which contain important text, and I have applied the Text Mining nodes to them, in particular Text Parsing and Text Filter.

I would like to know whether n-gram analysis is part of any particular node, or whether I should do it some other way.

 

Kind regards

 

2 REPLIES
foobarbaz
Obsidian | Level 7

Hi,

You can compute n-grams with the FCMP procedure. It has been a while since I used EM, and I can't remember whether that is included in it or not. The sample below implements a simple n-gram algorithm: it scores how many character n-grams two strings share.

 

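proc fcmp outlib=work.dq.func;
     /* Character n-gram overlap score between two strings, from 0 to 1 */
     function ngram(string1 $, string2 $, len);
     length s1 s2 $200;
     /* keep only letters, digits and underscores, then uppercase */
     s1 = upcase(compress(string1,,'kan'));
     s2 = upcase(compress(string2,,'kan'));
     /* number of n-grams of size len in each cleaned string */
     n1 = length(s1) - len + 1;
     n2 = length(s2) - len + 1;
     /* a string shorter than len has no n-grams to compare */
     if n1 < 1 or n2 < 1 then return(0);
     score = 0;
     /* count the n-grams of s1 found in s2, then those of s2 found in s1 */
     do index = 1 to n1;
           if find(s2,substr(s1,index,len)) then score = score + 1;
     end;
     do index = 1 to n2;
           if find(s1,substr(s2,index,len)) then score = score + 1;
     end;
     /* average the two counts and scale by the larger n-gram count */
     score = score/2;
     score_pct = score / max(n1,n2);
     return(score_pct);
     endsub;
run;

/* make the compiled function visible to the DATA step */
options cmplib=work.dq;
data tests;
     length s1 s2 $50;
     infile datalines dsd dlm='|';
     input s1 $ s2 $;
     ngram = ngram(s1,s2,2);
datalines;
Acme Inc.| Acme Integrated Technologies
Acme | Acme Inc.
Acme | Acme
Smith,John| John Smith
;
run;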
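
To make the score concrete, here is a rough trace of the same logic for the pair "Acme" / "Acme Inc." with a length of 2. It is only a sketch, written as a plain DATA step rather than a call to the function so that the intermediate counts show up in the log. The names hits1, hits2 and i are illustrative and are not part of the function above.

```sas
/* Trace of the scoring logic for one pair of strings, bigrams only */
data _null_;
     length s1 s2 $50;
     len = 2;
     /* same cleaning as the function: keep alphanumerics, uppercase */
     s1 = upcase(compress('Acme',,'kan'));
     s2 = upcase(compress('Acme Inc.',,'kan'));
     hits1 = 0;   /* bigrams of s1 that occur in s2 (illustrative name) */
     hits2 = 0;   /* bigrams of s2 that occur in s1 (illustrative name) */
     do i = 1 to length(s1) - len + 1;
           if find(s2,substr(s1,i,len)) then hits1 = hits1 + 1;
     end;
     do i = 1 to length(s2) - len + 1;
           if find(s1,substr(s2,i,len)) then hits2 = hits2 + 1;
     end;
     score = (hits1 + hits2) / 2;
     score_pct = score / max(length(s1) - len + 1, length(s2) - len + 1);
     /* write the intermediate values to the log */
     put hits1= hits2= score= score_pct=;
run;
```

The log line should read hits1=3 hits2=3 score=3 score_pct=0.5. All three bigrams of ACME (AC, CM, ME) occur in ACMEINC, three of the six bigrams of ACMEINC occur in ACME, and the average of 3 is divided by the larger bigram count, 6.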
Regards,
Cameron | Selerity
geniusgenie
Obsidian | Level 7
Hi Foobarbaz, thanks for your reply. Could you please tell me how I can run this code in EM? Do I need to connect my Data Partition or File Import nodes to it?
