
SAS Enterprise Miner Filter node extremely slow


01-08-2016 10:45 AM

Hello,

I'm running a very simple filtering operation with the SAS EM Filter node: it only filters the data set based on whether a variable is missing or not. All other filter options are set to none, so the Filter node should only be doing a very simple operation. For some reason it takes around 20 minutes to do something that would take only a fraction of a second in a normal SAS environment. The data set I'm using is fairly large (~3 million rows, 100 columns), but the operation still shouldn't be very difficult to perform. Any hints on how to make this faster, or on how to do the filtering in a faster way?

Accepted Solutions

Solution

01-14-2016 05:28 AM


Posted in reply to pautere


Ended up implementing the filtering in a SAS Code node, as the Filter node does not seem to work. It's quite straightforward there:

data &EM_EXPORT_TRAIN;
   set &EM_IMPORT_DATA;
   /* replace with your own conditions, e.g. the one from my question */
   where not missing(VARIABLE);
run;
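The snippet above only writes the training export. If the SAS Code node sits after a Data Partition node (the diagram described in this thread includes one), the validation and test partitions presumably need the same filter. Below is a minimal sketch that assumes the SAS Code node defines &EM_IMPORT_VALIDATE/&EM_EXPORT_VALIDATE and &EM_IMPORT_TEST/&EM_EXPORT_TEST alongside &EM_IMPORT_DATA and &EM_EXPORT_TRAIN. Check the SAS Code node documentation for your EM version to confirm this. VARIABLE is a placeholder, and the macros filter_partition and filter_optional are hypothetical helpers, not part of EM. The code only makes sense inside an EM SAS Code node, where those macro variables resolve.

```sas
/* Hypothetical helper: copy only rows where VARIABLE (placeholder) is non-missing. */
%macro filter_partition(in=, out=);
   data &out;
      set &in;
      where not missing(VARIABLE);
   run;
%mend filter_partition;

/* Hypothetical helper: filter a partition only if its import data set exists. */
%macro filter_optional(in=, out=);
   %if %length(&in) > 0 %then %do;
      %if %sysfunc(exist(&in)) %then %do;
         %filter_partition(in=&in, out=&out);
      %end;
   %end;
%mend filter_optional;

/* Training data, as in the solution above. */
%filter_partition(in=&EM_IMPORT_DATA, out=&EM_EXPORT_TRAIN);

/* Validation and test data (assumed macro variable names), skipped if absent. */
%filter_optional(in=&EM_IMPORT_VALIDATE, out=&EM_EXPORT_VALIDATE);
%filter_optional(in=&EM_IMPORT_TEST, out=&EM_EXPORT_TEST);
```

The EXIST check means a diagram without validation or test data simply skips those steps instead of failing on a missing input data set.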

All Replies


Posted in reply to pautere

01-08-2016 10:51 AM

The filter operation would actually be copying that big data set to a new, temporary data set without the variable. This may also be happening over a network, which slows things down. That said, 3 million rows shouldn't be that big a data set, so you may want to talk to your IT folks about tuning your system.


Posted in reply to Reeza

01-08-2016 11:33 AM

Thanks for the answer. Any idea what is so special about the Filter node that makes it so slow? In the same diagram I'm also loading data, partitioning it, building 2 regression models with forward variable selection and running a model comparison. All of those steps together take around 2 minutes, but the Filter node, which does something much simpler, takes 10 times longer. What makes the Filter node so different from the other nodes that it takes so much time? Loading the data with the Input Data node takes at most 30 seconds, so I don't think copying the data can take 20 minutes.


Posted in reply to pautere

01-11-2016 08:56 AM

Is there some other way to do filtering in EM? The filtering I'm trying to do is very simple; as a DATA step it would be:

data OUTPUT_DATA;
   set INPUT_DATA;
   where not missing(VARIABLE);
run;

How can I implement this in EM? With the Filter node it takes around 17 minutes to run, which really isn't usable.
