Text mining and content categorization

How to load data with larger character length for text mining

New Contributor
Posts: 4

Hi, 

I wonder if anyone can help me.

I have a text file with only two columns and only 2,000 rows, but the second column contains a huge number of characters mixed with symbols and everything, with some values larger than 1 million characters. Because of the nature of the domain I cannot remove punctuation etc., and when I use the File Import node in EM, it does not load all records and gives an error that the observation size should be less than 32767.

Could anyone suggest a way to load the whole dataset into SAS so that I can apply text mining to it?

Regards
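For context, 32,767 is also the maximum length of a SAS character variable, and that limit is measured in bytes rather than characters. As a quick check of how many of the 2,000 records are affected, here is a hypothetical Python sketch (not from this thread). It assumes a UTF-8 file with one record per line: an ID, a tab, then the text. The function and constant names are illustrative.

```python
# Maximum length, in bytes, of a SAS character variable.
SAS_MAX_CHAR_BYTES = 32767


def oversized_records(input_path, delimiter="\t", limit=SAS_MAX_CHAR_BYTES):
    """Return (line_number, byte_length) for every record whose text exceeds `limit` bytes.

    Assumed (hypothetical) layout: one record per line, an ID, the delimiter, then the text.
    Only the first delimiter separates the columns, so the text may contain any symbols.
    """
    results = []
    # newline="\n" makes Python split lines only on "\n", so stray "\r" characters
    # inside the text do not start a new record.
    with open(input_path, encoding="utf-8", newline="\n") as f:
        for line_num, line in enumerate(f, start=1):
            _, _, text = line.rstrip("\r\n").partition(delimiter)
            # Measure bytes, not characters: multibyte UTF-8 text hits the limit sooner.
            size = len(text.encode("utf-8"))
            if size > limit:
                results.append((line_num, size))
    return results
```

Calling `oversized_records("input.txt")` (a placeholder path) lists the offending line numbers with their byte lengths; an empty list would mean the limit is not the cause.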

Accepted Solutions
Solution
05-21-2017 09:39 AM
SAS Employee
Posts: 2

Re: How to load data with larger character length for text mining

The SAS Text Miner Text Import node is not designed to read one file having many documents. It is designed to read many separate document files. My program converts the one file described above with the large (greater than 32,767) character field into many files that can then be processed using the Text Import node. The SAS Enterprise Miner File Import node will not work because of the buffer limit, which is described in the original post. You are correct that the Text Import node has no (documented) limit on text size. If you are using another product, like SAS Contextual Analysis, you can also use my solution, because SAS Contextual Analysis will accommodate an input data source that has a variable containing the path to a document that may have more than 32,767 characters.

I have tested my solution with the Text Import node, and it appears to work.
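The attached SAS program is not reproduced in this thread. As a rough illustration of the same idea, and not that program, the following Python sketch splits a two-column file into one text file per record and also writes a manifest of IDs and file paths, the kind of path-variable data source described above for SAS Contextual Analysis. It assumes a tab-delimited UTF-8 file with one record per line (ID first, then the text); the function names, the `doc_00001.txt` naming scheme, and the `doc_id`/`path` column names are all hypothetical.

```python
import csv
from pathlib import Path


def split_documents(input_path, output_dir, delimiter="\t", skip_header=False):
    """Write each record's text column to its own .txt file.

    Assumes (hypothetically) one record per line: an ID, the delimiter, then the text.
    Only the first delimiter separates the columns, so the text may itself contain
    the delimiter, punctuation, or any other symbols. Returns (doc_id, path) pairs.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    manifest = []
    # newline="\n": split records only on "\n", never on stray "\r" inside the text.
    with open(input_path, encoding="utf-8", newline="\n") as f:
        for line_num, line in enumerate(f, start=1):
            if skip_header and line_num == 1:
                continue
            line = line.rstrip("\r\n")
            if not line:
                continue  # ignore blank lines
            doc_id, _, text = line.partition(delimiter)
            # Name files by line number so arbitrary IDs never produce invalid file names.
            doc_path = out / f"doc_{line_num:05d}.txt"
            doc_path.write_text(text, encoding="utf-8")
            manifest.append((doc_id, str(doc_path)))
    return manifest


def write_manifest(manifest, manifest_path):
    """Write a CSV with one row per document: its ID and the path to its file."""
    with open(manifest_path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["doc_id", "path"])
        writer.writerows(manifest)
```

The output directory is what the Text Import node would read, and the manifest keeps each file linked to its original ID. Splitting only on the first tab leaves the text byte-for-byte intact, which matters here because punctuation cannot be removed.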


All Replies
SAS Employee
Posts: 2

Re: How to load data with larger character length for text mining

I have attached a SAS program which may be useful.

Regards,

Terry Woodfield

terry.woodfield@sas.com

SAS Employee
Posts: 3

Re: How to load data with larger character length for text mining

You are trying to load the data into SAS and are hitting the 32,767 limit. Because of the large size, you don't have to load the data into SAS at all: use the Text Import node and set Import File Directory to the path where the files are stored, usually as separate text files.

Contributor
Posts: 36

Re: How to load data with larger character length for text mining

Thanks, Terry, and sorry for the late reply. I will give it a try.
Contributor
Posts: 36

Re: How to load data with larger character length for text mining

Thanks a lot, guys. I will follow your suggestions. Let's see if it works.