Mike90
Quartz | Level 8

I have 68,000 rows of data, and I am trying to generate descriptive statistics for the full data set from within EM, rather than using different software.

 

I changed EM_EXPLOREOBS_MAX from 20,000 to 80,000, but this had no effect on the Graph Explore node.

 

I read the Help available from within the program, and the only variable I can find that may be relevant is EM_EXPLOREOBS_MAX. The Project Macro Variables window lists no other relevant variables.


3 REPLIES
Mike90
Quartz | Level 8

I added "%let EM_EXPLOREOBS_MAX=50000;" in a SAS Code node between the Data Source node and the Graph Explore node, and now the node is using all the rows.
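
For anyone reproducing this, the contents of that SAS Code node might look roughly like the sketch below. The `%let` statement is the one quoted above; the `%put` line is an added assumption, included only so the value shows up in the node's log and can be checked.

```sas
/* Contents of a SAS Code node placed between the Data Source node
   and the Graph Explore node. */

/* Override the exploration limit for downstream nodes (value taken from the post above). */
%let EM_EXPLOREOBS_MAX=50000;

/* Assumption: echo the value to the log to confirm the override took effect. */
%put NOTE: EM_EXPLOREOBS_MAX is now &EM_EXPLOREOBS_MAX;
```

Whether the limit needs to be at least as large as the row count of the data set depends on how Graph Explore samples, which this thread does not settle.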

 

I don't know why the setting in the Project Macro Variables window is being ignored.  I don't see anything in the Data Source node that is setting that value back to the default of 20,000, and no other nodes are involved.

 

 

 

Mike90
Quartz | Level 8

Now the problem is that Graph Explore is only creating graphs for the target variables. I have the Data Source set to raw. Why is this happening? I need those same graphs for the input variables.

 

 

DougWielenga
SAS Employee

Mike90, 

 

The thing to remember is that SAS Enterprise Miner is designed for data mining on (potentially massive) data sets, so things that might be done for small data sets do not scale. There are often hundreds if not thousands of variables in these data sets, so looking at the correlation across all inputs is just not feasible. There is also the Stat Explore node for looking at individual variables. Since the data might be coming from distributed systems, it is not always possible to bring all the data locally.

Data mining is about finding patterns in large, complex data sets, so graphing individual points and trying to assess relationships one at a time is not necessarily going to assist that goal. In several places, SAS Enterprise Miner brings a small portion of the data locally (configured differently depending on where the data is to be viewed or used) to allow some discovery, but this is not the primary approach to analyzing large, complex data sets. The project options you mentioned control how many observations are brought back for display when you view the Imported or Exported data sets.

 

SAS Enterprise Miner limits how much data is brought back, though, because of the impact on performance. You specify the maximum number of observations, but the number actually returned is governed by the size of the sample on disk. If you have many extremely wide fields, you will see fewer observations than if you have only numeric data, for instance. In the end, the compute tier will analyze all the data; the portion brought back for viewing is just for information and is not intended to be the primary means of discovery. I might be able to suggest some strategies if you could let me know what you were hoping to learn from the sample.
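
As an illustration only (not something proposed in this thread), descriptive statistics on the full data set can be computed on the compute tier from a SAS Code node rather than from the sampled view. The sketch below assumes the node is connected downstream of the Data Source node so that the `&EM_IMPORT_DATA` macro variable resolves to the imported data set; the choice of statistics is arbitrary.

```sas
/* Hypothetical SAS Code node contents.
   Assumes &EM_IMPORT_DATA points at the data set imported from the
   preceding node (here, the Data Source node). */
proc means data=&EM_IMPORT_DATA n nmiss mean std min max;
   /* No VAR statement: all numeric variables are summarized. */
run;
```

The output appears in the node's results and covers every row, independent of the EM_EXPLOREOBS_MAX setting.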

 

Hope this helps!

Doug

