08-07-2015 10:39 AM
I've recently been using Enterprise Miner to build some models on my data. I hope to combine this data with other data later, when I create the final model(s).
However, I've been trying to run the default Cluster node on the data set, and it has been running for more than a day without any sign of finishing soon. The data set has ~22.7 million rows. Is this run time expected for a data set of this size?
Ideally this wouldn't even be all of the data, but the process is already prohibitively long for my use, never mind on the complete data set.
On that note, I took a subset of the full data (~117k rows) to experiment with and built some decision trees.
I tried to view the tree that the Model Comparison node selected as the best, but the tree seems to be too large for EM to handle well: viewing the results in any form takes 10+ minutes to update.
Is it possible to export the results as text, pdf or picture to view in another program?
08-07-2015 12:01 PM
I'd consider contacting Tech Support. In my opinion, SAS should be able to handle data of that size. I haven't tried EM on anything that large yet, but I will soon, so hopefully I won't run into the same issues.
08-07-2015 12:12 PM
SAS High-Performance Data Mining is the optimized way to analyze 22.7 million observations. Enterprise Miner 12.3, 13.1, 13.2, and 14.1 have specific nodes that run HP procedures (hpprocs). For example, the HPCluster node uses PROC HPCLUS, which can also take advantage of a grid or distributed environment.
Touch base with Tech Support to confirm that your system is well suited to handle your large data sets.
In the meantime, you can get a static summary of your tree by connecting a Reporter node to your flow. By default, it generates a PDF with the relevant results from your diagram flow.
I hope this helps!
08-07-2015 03:03 PM
When you say HPCluster node, are you referring to a node separate from the Cluster node? Or is it the Cluster node with specific settings for HPDM?
How do I use or access the HPDM nodes? I haven't seen any reference to them in EM or elsewhere. Are they an add-on?
The Reporter node encounters an error (sys error 20002) whenever I try to generate the report. Any idea what causes this?
Thanks for your help!
@Reeza I'll let you know what I find out regarding the data size.
@Jaap and @MiguelMaldonado
I apologize; I missed that the HP nodes require version 12.3 or higher. Unfortunately, I'm running 12.1, so I guess I can't use them.
Are the standard nodes still suited to handling data sets of this size?
08-12-2015 08:29 AM
Could you give me any idea of the expected processing time for data sets of 22.7 million rows with the non-HP nodes?
Also, the Reporter node now creates PDFs (I'm not sure why; it just started working), but the PDF only displays a small segment of the tree. Do you have any ideas on how to fix this?
12-09-2015 09:57 AM
I'd be happy to run something specific and report my timings, for example if you tell me to run nodes X on data set Y. Node run times will vary according to the data.
It won't be an apples-to-apples comparison, though: the machine where I have SAS installed is pretty powerful.
If your original problem is that you cannot visualize your results, you should seriously consider contacting Tech Support. Use this form: http://support.sas.com/ctx/supportform/createForm. They will get you up and running in no time!
Let me know if I can help!
08-07-2015 03:08 PM
What's New in SAS(R) 9.4 describes a licensing change for Enterprise Miner 12.3:
"All of the high-performance data mining nodes are now available (at no additional licensing fee) for threaded parallel processing on your existing SAS Enterprise Miner desktop or server. High-performance k-means clustering and decision tree nodes have been added to SAS High-Performance Data Mining."
08-08-2015 02:33 AM
That note about the license change is a sneaky one. Since Enterprise Miner loses value when it isn't kept up to date, my suggestion is to talk to your SAS sales representative or account manager. You are entitled to a free update to SAS 9.4 (STAT 14.1) with everything included. Those upgrades are often problematic because of the way SAS has implemented the SAS system, which is not aligned with common SDLCM policies (mount points, RPM packages, isolation of data and code) and causes a lot of ICT headaches. In my opinion, you could ask for a free HP license for 12.1 so you can continue your work. The upgrade (to 9.4) should still be planned, but getting a properly aligned installation can take a lot of time and budget.
08-10-2015 10:16 AM
Unfortunately, between being a summer student and the IT red tape where I work, I doubt I'll be able to get the upgrade sorted out in time.
Do you know what data set size I should try to stay within?
08-10-2015 02:55 PM
Sorry, that kind of sizing information is difficult to give. There is a trade-off between sizing/capacity and processing time. Maybe MiguelMaldonado can help?
Most of it is hidden behind the node curtains.