fgfgsgthtgrgrht
Calcite | Level 5

I'm running text mining analyses in SAS Enterprise Miner 15.1.

Is it possible to drop the text variable before the Text Topic node? The node always spends a very long time copying the text variable from the dataset.

The dataset is around 30 GB, and the text variable is by far the largest variable in it, because I'm parsing text data.

If there is a way to drop this variable before the Text Topic node runs, it would significantly reduce the required storage space and processing time.

I think copying the text variable in the Text Topic node is redundant, because the terms have already been parsed by the Text Parsing and Text Filter nodes. Is that right?

My standard procedure for a text mining analysis is to connect these nodes in order:

SAS dataset (with the text variable and an ID) -> Text Parsing node (with a start list and other settings) -> Text Filter node (with a synonym list and dictionary) -> Text Topic node (creating around 200 single-term topics and no multi-term topics)

Thank you so much!!

3 REPLIES
SASJedi
SAS Super FREQ
Moved to Analytics -> Data Mining forum for better visibility.
Check out my Jedi SAS Tricks for SAS Users
fgfgsgthtgrgrht
Calcite | Level 5

A related question: how can I increase the memory and CPU available to the text mining analysis?

Would adding a SAS Code node with the following syntax do it?

options cpucount=8;
options memsize=1073741824;
options sortsize=1073741824;

 

Thank you
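
For reference, a minimal sketch of how these options could be checked and set. CPUCOUNT, THREADS and SORTSIZE can be changed with an OPTIONS statement in a running session, but MEMSIZE is valid only at SAS startup (command line, configuration file, or the SASV9_OPTIONS environment variable), so setting it in a SAS Code node should not take effect. The values below simply mirror the ones in the question, and whether the text mining nodes actually make use of these settings is not confirmed in this thread.

```sas
/* Show the values currently in effect for this session. */
proc options option=(cpucount threads memsize sortsize);
run;

/* These three can be changed inside a running session. */
options threads cpucount=8 sortsize=1G;

/* MEMSIZE is startup-only, so it has to be supplied when SAS is invoked, */
/* for example as   -memsize 1G   on the command line or in the SAS       */
/* configuration file used by the Enterprise Miner server session.        */
```

Running PROC OPTIONS before and after the OPTIONS statement shows in the log which settings were actually applied.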

gcjfernandez
SAS Employee
You could try the Drop node from the Modify tab in SAS Enterprise Miner to drop this text variable before connecting the Text Topic node.
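
If a SAS Code node is preferred over the Drop node, the same idea could be sketched as below. This is only an illustration under assumptions: `text_var` is a hypothetical name for the raw text column, and `&EM_IMPORT_DATA` and `&EM_EXPORT_TRAIN` are the SAS Code node's macro variables for the incoming and outgoing training tables. Whether the Text Topic node still runs correctly once the text variable is gone is not confirmed in this thread, so it is worth testing on a small sample first.

```sas
/* Assumption: text_var is the (hypothetical) name of the raw text column. */
/* Copy the incoming training table to the node's export table without it. */
data &EM_EXPORT_TRAIN;
    set &EM_IMPORT_DATA(drop=text_var);
run;
```

Putting DROP= on the SET statement means the large text column is never read into the DATA step, which avoids the copy the question is trying to eliminate.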
