
Text Search and Word Count using SAS Code

N/A
Posts: 0


Hi,

I've been a SAS user for the last few years, but I'm new to text mining. I should say at the outset that I don't have access to SAS Text Miner or SAS Enterprise Miner; I only have SAS Enterprise Guide (EG).

My challenge is searching for and counting words in a "comment" field from a market research survey. My goal is to count the occurrences of each word other than prepositions, articles, conjunctions, and similar function words. This gives me an idea of what people are trying to convey in their open comments. I need to do this with SAS code, not with any of the SAS text mining products.

It would help if someone could point me to the best way to achieve this. What steps should I take? Even simple pointers would help; I can build the code from there.

Thanks in advance for your help.

Prakash
Occasional Contributor
Posts: 6

Re: Text Search and Word Count using SAS Code

I'm a longtime Base SAS user and have done some work evaluating text data with the INDEX functions and macro processing. Since EG is your only tool, I would use the "Code Node" with Base SAS functions, macros, and DATA step programming. Of course, you'll need to find (or build) a database containing the content you are looking for, or use one of several other methods. But I would start with the basics and build from your research. The SAS Online Docs for Base SAS would be very helpful in this regard.
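A minimal sketch of the INDEX-function idea, assuming a hypothetical data set WORK.SURVEY with an ID and a character variable COMMENT (the data, variable names, and the macro variable TERM are illustrative, not from the thread). INDEXW returns the position of the first whole-word match, or 0 if there is none; punctuation is stripped first so that a word followed by a comma or period still matches.

```sas
/* Hypothetical sample data: one open comment per respondent. */
data work.survey;
  infile datalines dsd;
  length id 8 comment $200;
  input id comment $;
datalines;
1,"The price was too high for the service"
2,"Great service and friendly staff"
3,"Staff were slow but the price was fair"
;
run;

/* Hypothetical search term held in a macro variable (upper case). */
%let term = PRICE;

data work.flagged;
  set work.survey;
  /* Upcase and remove punctuation ('p' modifier), then look for the  */
  /* term as a whole blank-delimited word. 1 = found, 0 = not found.  */
  mentions_term = (indexw(compress(upcase(comment), , 'p'), "&term") > 0);
run;

proc print data=work.flagged;
run;
```

Wrapping the DATA step in a macro that loops over a list of terms is one way to extend this to many search words at once.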
SAS Employee
Posts: 30

Re: Text Search and Word Count using SAS Code

Prakash,

This would be much easier with Text Miner, because it can tell when a term is being used as a preposition, article, conjunction, and so on, rather than relying purely on string matching. I'm sure your overall analysis would benefit from other Text Miner features as well.

On the Base SAS side, there are many string functions. One approach is simply to write out every space-delimited term in each document as its own observation and then run PROC FREQ on the result. This web page has some examples of writing out individual terms with the SCAN function:
http://support.sas.com/documentation/cdl/en/lrdict/61724/HTML/default/a000214639.htm
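A minimal sketch of the SCAN plus PROC FREQ approach, assuming a hypothetical WORK.SURVEY data set with a COMMENT variable and a small hypothetical stop list (WORK.STOPWORDS) standing in for the prepositions, articles, and conjunctions to exclude. Note that this is plain string matching, so it cannot tell how a word is used in context.

```sas
/* Hypothetical sample data: one open comment per respondent. */
data work.survey;
  infile datalines dsd;
  length id 8 comment $200;
  input id comment $;
datalines;
1,"The price was too high for the service"
2,"Great service and friendly staff"
3,"Staff were slow but the price was fair"
;
run;

/* Write out one observation per space-delimited term. */
data work.terms;
  set work.survey;
  length word $40;
  do i = 1 to countw(comment, ' ');
    /* SCAN pulls the i-th term; COMPRESS with 'p' strips punctuation. */
    word = upcase(compress(scan(comment, i, ' '), , 'p'));
    if word ne '' then output;
  end;
  keep id word;
run;

/* Hypothetical stop list: extend with the function words to ignore. */
data work.stopwords;
  length word $40;
  input word $;
datalines;
THE
A
AN
AND
BUT
OR
WAS
WERE
FOR
TOO
;
run;

/* Drop stop words before counting. */
proc sql;
  create table work.content_terms as
  select t.word
  from work.terms as t
  where t.word not in (select word from work.stopwords);
quit;

/* Count each remaining word, most frequent first. */
proc freq data=work.content_terms order=freq;
  tables word / out=work.word_counts noprint;
run;

proc print data=work.word_counts;
run;
```

Keeping the stop list in a data set rather than hard-coding it lets it grow as you review the output.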

SAS also has a relatively new hash object that lets you accumulate counts inside the DATA step if you would rather avoid the PROC FREQ call.
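A minimal sketch of the hash-object alternative, again assuming a hypothetical WORK.SURVEY data set and a hypothetical hard-coded stop list. The hash is keyed on the word; each time a word is seen, its stored count is looked up with FIND, incremented, and written back with REPLACE. The counts are written to a data set only once, after the last comment is read.

```sas
/* Hypothetical sample data: one open comment per respondent. */
data work.survey;
  infile datalines dsd;
  length id 8 comment $200;
  input id comment $;
datalines;
1,"The price was too high for the service"
2,"Great service and friendly staff"
3,"Staff were slow but the price was fair"
;
run;

data _null_;
  length word $40 count 8;

  /* Create the hash once; ordered:'a' writes it out sorted by word. */
  if _n_ = 1 then do;
    declare hash h(ordered: 'a');
    h.defineKey('word');
    h.defineData('word', 'count');
    h.defineDone();
    call missing(word, count);
  end;

  set work.survey end=last;

  do i = 1 to countw(comment, ' ');
    word = upcase(compress(scan(comment, i, ' '), , 'p'));
    /* Hypothetical stop list of function words to skip. */
    if word ne '' and
       word not in ('THE','A','AN','AND','BUT','OR','WAS','WERE','FOR','TOO') then do;
      /* FIND returns 0 and loads COUNT when the word is already stored. */
      if h.find() = 0 then count = count + 1;
      else count = 1;
      h.replace();
    end;
  end;

  /* After the last comment, dump the word/count pairs to a data set. */
  if last then h.output(dataset: 'work.word_counts_hash');
run;

proc print data=work.word_counts_hash;
run;
```

This keeps everything in one pass over the data; the trade-off is that the whole vocabulary must fit in memory, which is rarely a problem for survey comments.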

Russ