geniusgenie
Obsidian | Level 7

Hi,

I wonder if anyone could help me. I am looking to perform n-gram analysis using SAS Enterprise Miner (EM).

I have a few datasets, a couple of which contain important text, and I have applied the Text Mining nodes to them, in particular Text Parsing and Text Filter.

I would like to know whether n-gram analysis is part of a particular node, or whether there is another way I should do it.

Kind regards

foobarbaz
Obsidian | Level 7

Hi,

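You can compute n-gram similarity with PROC FCMP, which lets you define your own functions. It's been a while since I've used EM, so I can't remember whether EM includes this or not. The sample below implements a simple character n-gram algorithm: it counts how many n-grams of each string also appear in the other string and returns that count as a proportion between 0 and 1.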

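proc fcmp outlib=work.dq.func;
     /* Character n-gram similarity of two strings, from 0 to 1.   */
     /* len is the n-gram size, e.g. 2 for bigrams.                */
     function ngram(string1 $, string2 $, len);
          length s1 s2 $ 200;
          /* Keep only letters, digits and underscores; ignore case */
          s1 = upcase(compress(string1, , 'kan'));
          s2 = upcase(compress(string2, , 'kan'));
          n1 = lengthn(s1) - len + 1;   /* number of n-grams in s1 */
          n2 = lengthn(s2) - len + 1;   /* number of n-grams in s2 */
          if max(n1, n2) < 1 then return(0);   /* both shorter than len */
          score = 0;
          /* Count n-grams of s1 that occur anywhere in s2 */
          do i = 1 to n1;
               if find(s2, substr(s1, i, len)) then score = score + 1;
          end;
          /* Count n-grams of s2 that occur anywhere in s1 */
          do i = 1 to n2;
               if find(s1, substr(s2, i, len)) then score = score + 1;
          end;
          /* Average the two counts and scale by the longer string */
          score = score / 2;
          score_pct = score / max(n1, n2);
          return(score_pct);
     endsub;
run;

options cmplib=work.dq;   /* where SAS looks for the compiled function */
data tests;
     length s1 s2 $50;
     infile datalines dsd dlm='|';
     input s1 $ s2 $;
     score = ngram(s1, s2, 2);   /* bigram similarity */
datalines;
Acme Inc.| Acme Integrated Technologies
Acme | Acme Inc.
Acme | Acme
Smith,John| John Smith
;
run;

proc print data=tests;
run;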
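For reference, here is the value the function returns, written out by hand from the code (my own summary, not SAS documentation). Here n is the `len` argument, s1 and s2 are the cleaned upper-case strings, g_i(s) is the n-character substring of s starting at position i, and m_k = |s_k| - n + 1 is the number of n-grams in s_k (for bigrams, n = 2 as in the example call, this is |s_k| - 1). Repeated n-grams are counted once per position:

$$
\text{score\_pct}(s_1, s_2) = \frac{\tfrac{1}{2}\left(\sum_{i=1}^{m_1} \mathbf{1}\left[g_i(s_1) \text{ occurs in } s_2\right] + \sum_{j=1}^{m_2} \mathbf{1}\left[g_j(s_2) \text{ occurs in } s_1\right]\right)}{\max(m_1, m_2)}
$$

Worked by hand for the "Smith,John" / "John Smith" row: the cleaned strings are SMITHJOHN and JOHNSMITH, each has 8 bigrams, and 7 of each string's bigrams occur in the other (only HJ and NS do not), so

$$
\text{score\_pct} = \frac{\tfrac{1}{2}(7 + 7)}{\max(8, 8)} = \frac{7}{8} = 0.875
$$

Similarly, "Acme" vs "Acme Inc." gives ACME (3 bigrams, all found in ACMEINC) and ACMEINC (6 bigrams, 3 found in ACME), so the score is (3 + 3)/2 / 6 = 0.5. Because the score only checks whether each bigram occurs somewhere in the other string, it is insensitive to word order, which is why the swapped name still scores highly.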
Regards,
Cameron | Selerity
geniusgenie
Obsidian | Level 7
Hi foobarbaz, thanks for your reply. Could you please tell me how I can run this code in EM? And do I need to connect my Data Partition or File Import nodes to it?
