Hi,
I wonder if anyone can help me?
I have a text file with only two columns and about 2,000 rows, but the second column contains very long values mixed with symbols, some exceeding 1 million characters. Because of the nature of the domain I cannot remove punctuation, and when I use the File Import node in Enterprise Miner it does not load all the records; it gives an error saying the observation size must be less than 32,767.
Could anyone suggest a way to load the whole dataset into SAS so that I can apply text mining to it?
Regards
The SAS Text Miner Text Import node is not designed to read one file containing many documents; it is designed to read many separate document files. My program converts the single file described above, with its large (greater than 32,767 characters) text field, into many files that can then be processed using the Text Import node. The SAS Enterprise Miner File Import node will not work because of the buffer limit described in the original post. You are correct that the Text Import node has no (documented) limit on text size. If you are using another product, such as SAS Contextual Analysis, you can also use my solution, because SAS Contextual Analysis will accommodate an input data source that has a variable containing the path to a document with more than 32,767 characters.
I have tested my solution with the Text Import node, and it appears to work.
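The attached SAS program is not reproduced here, but the conversion step it performs can be sketched outside of SAS as well. The sketch below assumes a tab-delimited two-column input (an ID, then the long text); the file names and delimiter are illustrative assumptions, not part of the attached program:

```python
# Sketch: split a two-column delimited file (id <TAB> text) into one
# .txt file per row, so the Text Import node can read the directory.
# Input layout, delimiter, and naming scheme are assumptions.
import os

def split_to_documents(infile, outdir, delimiter="\t"):
    os.makedirs(outdir, exist_ok=True)
    with open(infile, encoding="utf-8") as f:
        for i, line in enumerate(f, start=1):
            # First field is the document ID; everything after the
            # first delimiter is the (possibly very long) text.
            doc_id, _, text = line.rstrip("\n").partition(delimiter)
            path = os.path.join(outdir, f"doc_{doc_id or i}.txt")
            with open(path, "w", encoding="utf-8") as out:
                out.write(text)
    return outdir
```

Pointing the Text Import node's Import File Directory at the output folder then sidesteps the 32,767-byte observation limit entirely, because each document is read from its own file.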
I have attached a SAS program which may be useful.
Regards,
Terry Woodfield
terry.woodfield@sas.com
You are trying to load the data into SAS and are hitting the 32,767-character limitation. You don't have to load the data into SAS at all (given the large size); instead, use the Text Import node and set Import File Directory to the path where the files are stored, usually as separate text files.