Re: Getting Graph Explore to use more than 20,000 observations (EM_EXP...

Mike90 · Posted 10-19-2017 03:34 PM

I have 68,000 rows of data, and I am trying to generate descriptive statistics for the full data set from within EM, rather than using different software.

I changed EM_EXPLOREOBS_MAX from 20,000 to 80,000, but this had no effect on the Graph Explore node.

I read the Help available from within the program, and the only variable I can find that may be relevant is EM_EXPLOREOBS_MAX. There are no other relevant variables in the Project Macro Variables window.

Mike90 · Posted 10-19-2017 03:53 PM

I added " %let EM_EXPLOREOBS_MAX=50000; " in a SAS Code node between the Data Source node and the Graph Explore node, and now the node is using all the rows.
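For anyone reproducing this workaround, the contents of the SAS Code node would look roughly like the sketch below. The value 50000 is the one from the post; the comments are added for illustration, and the value is written without a thousands separator because the macro variable is plain text.

```sas
/* SAS Code node placed between the Data Source node and the Graph Explore node. */
/* Overrides the default cap of 20,000 observations for the nodes that follow.   */
%let EM_EXPLOREOBS_MAX=50000;
```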

I don't know why the setting in the Project Macro Variables window is being ignored. I don't see anything in the Data Source node that is setting that value back to the default of 20,000, and no other nodes are involved.

Mike90 · Posted 10-19-2017 07:10 PM

Now the problem is that Graph Explore is only creating graphs for the target variables. I have the Data Source role set to Raw. Why is this happening? I need those same graphs for the input variables.

DougWielenga · Posted 10-20-2017 05:03 PM

Mike90,

The thing to remember is that SAS Enterprise Miner is designed for data mining (potentially massive) data sets, so things that might be done for small data sets do not scale. These data sets often contain hundreds if not thousands of variables, so looking at the correlation across all inputs is just not feasible. The StatExplore node is also available for looking at individual variables.

Since the data might be coming from distributed systems, it is not possible to bring all the data locally. Data mining is about finding patterns in large, complex data sets, so graphing individual points and trying to assess relationships one at a time is not necessarily going to assist that goal. In several places, SAS Enterprise Miner brings a small portion of the data locally (configured differently depending on where the data is to be viewed or used) to allow some discovery, but this is not the primary approach to analyzing large, complex data sets. The project option you mentioned controls how many observations are brought back for display when you view the Imported or Exported data sets.

SAS Enterprise Miner limits how much data is brought back, though, because of the impact on performance. You specify the maximum number of observations, but the number actually returned is also governed by the size of the sample on disk. For instance, if you have many extremely wide fields, you will see fewer observations than if you have only numeric data. In the end, the compute tier analyzes all the data; the portion brought back for viewing is for information only and is not intended to be the primary means of discovery. I might be able to suggest some strategies if you let me know what you were hoping to learn from the sample.
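As one possible illustration of analyzing all the data on the compute tier rather than the sample fetched for viewing, a SAS Code node could run PROC MEANS against the data set it imports. This is a minimal sketch, not something taken from the thread: it assumes the node sits directly after the Data Source node and that the EM_IMPORT_DATA macro variable provided by the SAS Code node resolves to the full 68,000-row table.

```sas
/* Hypothetical SAS Code node contents (an assumption, not from the thread).  */
/* With no VAR statement, PROC MEANS summarizes every numeric variable in the */
/* imported data set, using all rows rather than the exploration sample.      */
proc means data=&EM_IMPORT_DATA n nmiss mean std min max;
run;
```

The output appears in the node's results rather than in a Graph Explore plot, so it complements the graphs instead of replacing them.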

Hope this helps!

Doug
