05-29-2015 10:33 AM
I think the default Minimum Number of Documents is 4 when I create a Text Filter node. This is clearly too low when I am dealing with millions of documents, and I am experimenting to find what this number should ideally be.
Now I am sort of hooked on the Text Rule Builder node as well. What I mainly hope to get out of Text Miner are words and/or phrases that I can dichotomize and enter as predictors in an eventual Decision Tree. I fully understand that Text Miner may be most useful for creating Factors or Clusters, but I am hoping to use it as a stepping stone toward my ultimate Decision Tree.
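To make the "dichotomize terms into predictors" idea concrete, here is a minimal sketch in Python. It is not SAS Text Miner's implementation. It assumes whitespace tokenization, lowercase matching, and no punctuation handling. The function `binary_term_predictors`, its `min_docs` parameter, and the sample documents are all hypothetical. Each candidate term or phrase becomes a 0/1 column, and a term is kept only if it appears in at least `min_docs` documents. This loosely mirrors what a minimum-document-count filter does.

```python
def binary_term_predictors(docs, terms, min_docs=4):
    """Hypothetical helper: build 0/1 indicator columns for terms/phrases.

    Assumptions (not SAS behavior): whitespace tokenization, case-insensitive
    matching, no punctuation stripping. A term is kept only if it occurs in
    at least `min_docs` documents.
    """
    # Normalize whitespace and pad with spaces so " term " matches whole
    # words or whole multi-word phrases only.
    padded = [" " + " ".join(d.lower().split()) + " " for d in docs]
    # Per-term indicator vector: 1 if the document contains the term.
    hits = {t: [1 if f" {t.lower()} " in p else 0 for p in padded] for t in terms}
    # Keep terms whose document count meets the minimum.
    kept = [t for t in terms if sum(hits[t]) >= min_docs]
    # One row per document, one 0/1 column per kept term.
    rows = [[hits[t][i] for t in kept] for i in range(len(docs))]
    return kept, rows


# Hypothetical example data.
docs = [
    "late payment fee",
    "payment received",
    "late payment again",
    "no fee charged",
    "late payment notice",
    "account closed",
]
terms = ["late payment", "fee", "account"]

kept, rows = binary_term_predictors(docs, terms, min_docs=2)
print(kept)
for row in rows:
    print(row)
```

With `min_docs=2`, "account" (found in only one document) is dropped. "late payment" and "fee" survive as binary columns that a decision tree could use directly. Raising `min_docs` shrinks the candidate predictor set, which is the trade-off being explored with the Text Filter setting.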
As part of my output from the Text Rule Builder I get:
I do not understand how the Total denominator in the last column of my output can be less than my minimum number of documents. I do see a direct association between the number of rules produced and the minimum number of documents, but I am looking for a little more precision and understanding. Any help for me and the community here is highly valued and appreciated.