🔒 This topic is solved and locked.
ali1067
Calcite | Level 5

Hi, 

I wonder if anyone can help me?

I have a text file with only two columns and only about 2,000 rows, but the second column contains a huge number of characters mixed with symbols, in some cases more than 1 million characters. Because of the nature of the domain I cannot remove punctuation etc., and when I use the File Import node in EM it does not load all records and gives an error that the observation size must be less than 32,767.

 

Could anyone suggest a way to load the whole dataset into SAS so that I can apply text mining to it?

 

Regards

 


5 REPLIES
TWoodfield
SAS Employee

I have attached a SAS program which may be useful.

 

Regards,

 

Terry Woodfield

terry.woodfield@sas.com

 

ssoti2001
SAS Employee

You are trying to load the data into SAS and you are hitting the 32,767 limitation. You don't have to load the data into SAS (given the large size); instead, use the Text Import node and set Import File Directory to the path where the files are stored, usually as separate text files.

TWoodfield
SAS Employee

The SAS Text Miner Text Import node is not designed to read one file containing many documents. It is designed to read many separate document files. My program converts the one file described above, with the large (greater than 32,767) character field, into many files that can then be processed using the Text Import node. The SAS Enterprise Miner File Import node will not work because of the buffer limit, which is described in the original post. You are correct that the Text Import node has no (documented) limit on text size. If you are using another product, like SAS Contextual Analysis, you can also use my solution, because SAS Contextual Analysis will accommodate an input data source that has a variable containing the path to a document that may have more than 32,767 characters.

 

I have tested my solution with the Text Import node, and it appears to work.
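
For readers who cannot open the attachment, the sketch below illustrates the general splitting approach described above; it is not the attached program. It assumes SAS 9.4 (where LRECL can be set well beyond 32,767), a tab-delimited input file at C:\data\bigfile.txt with an ID in the first column and the long text in the second, and an existing output folder C:\docs; those paths, the delimiter, and the length limits are placeholders to adjust for your own data.

data _null_;
   length id $40 piece $32767 outfile $300;
   /* LENGTH= and COLUMN= expose the record length and current column pointer */
   infile "C:\data\bigfile.txt" dlm='09'x truncover
          lrecl=2000000 length=reclen column=col;
   input id $ @;                               /* column 1: document id; hold the line */
   outfile = cats("C:\docs\", id, ".txt");     /* one output file per input record */
   file doc filevar=outfile lrecl=2000000;
   /* copy the rest of the record in pieces of at most 32,767 characters,
      since a single SAS character variable cannot hold more than that */
   do while (col <= reclen);
      n = min(32767, reclen - col + 1);
      input piece $varying32767. n @;
      put piece $varying32767. n @;
   end;
   put;                                        /* finish the output record */
run;

Each input record then becomes its own .txt file in the output folder, and the Text Import node's Import File Directory property (or the SAS Contextual Analysis data source) can be pointed at that folder, as suggested above.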

geniusgenie
Obsidian | Level 7
Thanks Terry, and sorry for the late reply; I will give it a try.
geniusgenie
Obsidian | Level 7
Thanks a lot, guys; I will follow your suggestions. Let's see if it works.

