Text mining and content categorization

Tips for efficiently mining excel files that have text data in it

Posts: 22



I'm working on an Excel file that has multiple text fields of varying lengths (some fields have a few sentences and some have a few paragraphs). Each record contains information/observations pertaining to a specific industry, and each field uniquely identifies a certain predefined characteristic of that industry. I'm looking for ways to explore this dataset and identify the features specific to each type of record (each record can be categorized into one of 3 groups, and the group is also available as one of the text fields in the dataset). I tried different ways to mine this text data and ran into several questions in the process.


Using File Import I brought the dataset into SAS, and after parsing I noticed that only the one field with the longest width is chosen as the attribute under observation and the rest are ignored (I couldn't find them in the Text Filter node). But I want to include terms from the other fields as well (merging those fields is not an option, as each field has its own unique characteristic, as mentioned above). Text Topic and clustering give more generic information, which I don't think adds much value to the knowledge discovery process. What is a way to mine this text data effectively so that as little information as possible is lost?


I can't share the data due to security issues, but if you have further questions about the dataset being used, feel free to update the thread and I will be glad to add information. I'm a newbie in this field, so any kind of help is very much appreciated. Thank you.

Posts: 57

Re: Tips for efficiently mining excel files that have text data in it

Posted in reply to Bhuvaneswari

Please review this link. It should definitely get you through.




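FILENAME SASCONF DDE 'Excel|[Book1]Sheet1!R1C1:R9C4'; /* DDE EXAMPLE - Excel; Book1 must be open in Excel */

DATA SasConf;
  /* Read from the DDE range; NOTAB with a tab delimiter keeps blanks inside a cell from splitting its value */
  INFILE SASCONF NOTAB DLM='09'x DSD MISSOVER;
  INPUT ConfName $ ConfYear ConfCity $ ConfST $ ;
RUN;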

