chenzhang
Fluorite | Level 6

I am walking through the Introduction to Enterprise Miner tutorial with the Census2000 dataset.  I am using the 32-bit edition and Enterprise Miner is very slow.  For example, exploring the data takes about 4-5 minutes.  I am running it on a laptop with 32 GB of memory and a Core i7 processor, so the hardware should be more than sufficient.  I am using SAS 9.4, Windows 8.1, and the most recent update of JRE 1.8.  The JRE uses at most about 4% of CPU time, so Enterprise Miner does not appear to be taking advantage of parallel processing.  Any idea how to speed things up?  This dataset has just 33k tuples, so at this speed I doubt it can handle any real "big data".

Accepted Solution
DougWielenga
SAS Employee

The data set you are describing is not particularly big and your machine is not particularly incapable.  I would suggest you look at a few things:

 

1 - Where is the data being stored?   If it is on another machine connected via a network or on a USB-attached external drive, you could be experiencing issues due to slow I/O.  Data mining is a memory-intensive activity, and time lost to I/O can greatly slow down your ability to browse/explore the data.

 

2 - How full is your hard drive?   If you have limited disk space, you could be running into resource constraints that limit the amount of virtual memory available.  You can also run into issues if you have sufficient RAM but much of it is being held by other applications.

 

3 - What is the recommended version of Java?   SAS Enterprise Miner is tested with specific versions of Java.  New Java updates are often not backwards compatible, so upgrading Java can actually hurt your performance.  This is challenging because Java frequently prompts you to update.

 

4 - How old is your project?   Projects that have been in use for some time can start to perform more slowly.  Building a new flow in a new project/diagram might improve performance.

 

5 - How are your variables defined/formatted?   SAS Enterprise Miner normalizes variables to have no more than 32 characters in the name and no more than 32 characters in the field.  It uses the internal normalized version of the variable for analysis.  You can run into problems separating levels if your variable levels do not differ in the first 32 characters.  You will also find that many exported text data sets have associated formats and/or lengths that are far greater than the field requires (e.g. a Yes/No variable with a length of 200 characters).  Space must be allowed for the full formatted length, so unnecessarily long fields can slow processing greatly (even if the field contents are relatively short).  Run the CONTENTS procedure against your data to determine whether there are potential issues here.  For any field with a formatted length greater than 32, assess whether the format is necessary (assuming you are not doing text mining).
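
As a rough illustration of the length check in point 5, here is a minimal Python sketch. It assumes the variable metadata reported by the CONTENTS procedure (name, type, declared length) has been copied into a Python list by hand, and that you also know the longest value actually stored in each character variable. The `Variable` class, the `flag_wide_variables` function, the sample variables, and the `limit` parameter are all hypothetical names made up for this example; none of them are part of SAS or Enterprise Miner.

```python
from dataclasses import dataclass


@dataclass
class Variable:
    name: str
    var_type: str         # "char" or "num", as listed by PROC CONTENTS
    declared_length: int  # declared storage length
    longest_value: int    # longest value actually observed (assumed known)


def flag_wide_variables(variables, limit=32):
    """Return (name, declared_length, wasted_per_row) for character
    variables whose declared length exceeds `limit`."""
    flagged = []
    for v in variables:
        if v.var_type == "char" and v.declared_length > limit:
            wasted = v.declared_length - v.longest_value
            flagged.append((v.name, v.declared_length, wasted))
    return flagged


# Hypothetical metadata; real values would come from PROC CONTENTS output.
sample = [
    Variable("owner_flag", "char", 200, 3),  # a Yes/No field declared at 200
    Variable("region", "char", 12, 9),
    Variable("income", "num", 8, 8),
]

rows = 33_000  # approximate row count mentioned in the question
for name, declared, wasted in flag_wide_variables(sample):
    print(f"{name}: declared length {declared}, "
          f"about {wasted * rows:,} unused characters over {rows:,} rows")
```

Any variable this flags is a candidate for a shorter length or a dropped format before the data goes into Enterprise Miner.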

 

 

Let me know if any of these potential issues persist in your data/environment.

 

Cordially,

Doug


5 REPLIES
M_EEddlestone
SAS Employee

I'd suggest contacting SAS Technical Support so that they can help you examine the specifics of your environment and help you debug. Technical Support is included in your license and they'd be glad to help!

 

https://support.sas.com/en/technical-support.html

Reeza
Super User

Is EM installed locally?

chenzhang
Fluorite | Level 6

Yes, EM is installed locally.

chenzhang
Fluorite | Level 6

Doug:

 

I freed up about 20 GB on my SSD, which is the C: system drive, since it was quite full.  EM is now about 50% quicker.  I did not expect it to use virtual memory since I thought I had plenty of physical memory, but unsupervised learning could do that depending on the implementation.  I know Apache Spark does most of its work in memory, but maybe EM is different.  Now that I have your guidelines, I can look into a few more things to see if they improve performance further.
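
For anyone who wants to check free disk space before launching EM, a minimal Python sketch follows. It uses the standard-library `shutil.disk_usage`. The function name `free_space_report` and the 15% threshold are illustrative assumptions of mine, not a SAS recommendation.

```python
import shutil


def free_space_report(path, min_free_fraction=0.15):
    """Return (free_gib, free_fraction, is_low) for the drive holding `path`.

    `min_free_fraction` is an arbitrary illustrative threshold.
    """
    usage = shutil.disk_usage(path)
    free_fraction = usage.free / usage.total
    free_gib = usage.free / 2**30
    return free_gib, free_fraction, free_fraction < min_free_fraction


if __name__ == "__main__":
    # "." checks the current drive; on a setup like mine one might pass "C:\\".
    free_gib, fraction, is_low = free_space_report(".")
    status = "LOW" if is_low else "ok"
    print(f"{free_gib:.1f} GiB free ({fraction:.0%}) -> {status}")
```

Running it before and after a cleanup shows how much headroom the system drive has gained.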

 

Thanks,

Chen
