pautere
Fluorite | Level 6

Hello,

I'm running a very simple filtering operation with the SAS EM Filter node: it only filters the data set based on whether a variable's value is present or missing. All other filter options are set to None, so the node should be doing very little work. For some reason it takes around 20 minutes to perform an operation that would take only a fraction of a second in a normal SAS environment. The data set I'm using is quite large (~3 million rows, 100 columns), but the operation still shouldn't be hard to perform. Any hints on how to speed up the Filter node, or how to do the filtering in a faster way?

Accepted Solutions
pautere
Fluorite | Level 6

Ended up implementing the filtering in the SAS Code node, as the Filter node does not seem to work. Quite straightforward there:

data &EM_EXPORT_TRAIN;
 set &EM_IMPORT_DATA;
 /* filter condition; replace with your own */
 where not missing(VARIABLE);
run;
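
To confirm that the filter did what was intended, a row count before and after can be written to the log. This is a minimal sketch, assuming it is placed in the same SAS Code node right after the DATA step above. The macro variable names `n_in` and `n_out` are made up for illustration.

```sas
proc sql noprint;
  /* rows coming into the node */
  select count(*) into :n_in trimmed from &EM_IMPORT_DATA;
  /* rows exported after filtering */
  select count(*) into :n_out trimmed from &EM_EXPORT_TRAIN;
quit;

%put NOTE: Filter kept &n_out of &n_in rows.;
```

The message shows up in the node's log, so a filter condition that accidentally drops everything (or nothing) is easy to spot.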

Replies
Reeza
Super User

The filter operation would actually be copying that big data set over to a new, temporary data set, without the variable. This may also be happening over a network, which would slow things down. 3 million rows shouldn't be that big of a data set, though, so you may want to talk to your IT folks about tweaks to your system.
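
One way to check whether plain copying explains the time is to run the same kind of filter as an ordinary DATA step with timing statistics turned on. This is a minimal sketch on a synthetic stand-in table: `work.big`, `work.filtered`, the columns `x1`-`x100`, and the `n_rows` macro variable are made up for illustration and are not from the original post.

```sas
options fullstimer;            /* write detailed real/CPU time to the log */

%let n_rows = 3000000;         /* hypothetical size; reduce for a quick test */

/* Build a synthetic table: &n_rows rows, 100 numeric columns */
data work.big;
  array x{100} x1-x100;
  call streaminit(12345);
  do i = 1 to &n_rows;
    do j = 1 to 100;
      x{j} = rand('uniform');
    end;
    if mod(i, 10) = 0 then x1 = .;   /* make x1 missing in every 10th row */
    output;
  end;
  drop i j;
run;

/* The filter itself: copies the surviving rows to a new table */
data work.filtered;
  set work.big;
  where not missing(x1);
run;
```

The log reports real time and CPU time for each step. If real time is much larger than CPU time, disk or network I/O is the more likely bottleneck than the filter logic.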

 

pautere
Fluorite | Level 6

Thanks for the answer. Any idea what is so special about the Filter node that makes it so slow? In the same diagram I'm also loading data, partitioning it, building 2 regression models with forward variable selection, and running a model comparison. All of those steps together take around 2 minutes, but the Filter node, which does something much simpler, takes about 10 times longer. Loading the data with the Input Data node takes 30 seconds at most, so I don't think copying the data can account for 20 minutes.

pautere
Fluorite | Level 6

Is there some other way to do filtering in EM? The filtering I'm trying to do is super simple; as a DATA step it would be the following:

data OUTPUT_DATA;
 set INPUT_DATA;
 where not missing(VARIABLE);
run;

 

How can I implement this in EM? With the Filter node it takes around 17 minutes to run, which is not really usable.
