03-21-2014 10:09 AM
I have a dataset with 3,000 customers, and each customer has 120 observations. I was trying to build a predictive model for each customer.
There are several SAS procedures, such as proc GLMSELECT or proc GLM, that support by-group processing. I was wondering if SAS Enterprise Miner has an option for "by"-group processing?
If so, could you please post an example or an illustration of how this could be done? I checked the SAS EM user notes and was not successful.
03-21-2014 11:59 AM
The product assumes it should analyze the data by finding groups (binning).
What is your goal with group processing? Do you know the result you want and just need the model that produces it?
03-21-2014 01:31 PM
The only node that I can think of that does something similar is the Survival Node.
If you specify the Data Format option as Fully Expanded, you will have the analysis done per ID variable as in your example (multiple rows for each customer ID, as long as you define the customer variable with a Role of ID).
If you are analyzing (or forecasting) a time series, you might have the data already in the right format to use the Time Series nodes in Enterprise Miner 13.1.
In general, the rest of the nodes expect a summary of all inputs and a target for each customer.
Just recently I was talking with a customer about the idea of a Feature Engineering node that would summarize variables per ID. All inputs would be summarized as frequencies, counts, and sums to assist you in the task of collapsing a dataset like yours (120 x 3,000 rows) by condensing the 120 rows for each customer ID into summary features. Would this be useful to you for future releases?
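To make the idea concrete (this is just an illustration in plain Python, not Enterprise Miner functionality, and the column names are made up), the per-ID summarization such a node could perform might look like:

```python
from collections import defaultdict

# Hypothetical long-format data: one row per (customer_id, observation).
rows = [
    {"customer_id": "C1", "sales": 10.0},
    {"customer_id": "C1", "sales": 14.0},
    {"customer_id": "C2", "sales": 7.0},
]

def summarize_by_id(rows, id_col, value_col):
    """Collapse the repeated rows per ID into count, sum, and mean features."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[id_col]].append(row[value_col])
    return {
        key: {"count": len(vals), "sum": sum(vals), "mean": sum(vals) / len(vals)}
        for key, vals in groups.items()
    }

summary = summarize_by_id(rows, "customer_id", "sales")
# summary["C1"] -> {"count": 2, "sum": 24.0, "mean": 12.0}
```

Each customer's 120 rows would collapse into a single row of such features, which the regular modeling nodes can then consume.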
All feedback is truly appreciated!
Enterprise Miner R&D
03-22-2014 09:53 PM
Miguel, thank you very much for your response. I was trying to build a neural network regression for time series data by each customer. SAS/STAT and SAS/ETS have excellent facilities such as PROC GLMSELECT, GAM, ADAPTIVEREG, etc.:
proc glmselect data = input;
   model sales = x1 - x10 / selection = none;
   score data = output out = pred;
run;
I wish we had similar features in SAS EM, for example in PROC NEURAL or the SAS EM nodes.
A future release incorporating this would be extremely helpful for time series data mining and forecasting problems.
You could also read this blog, which echoes my appreciation of by-group processing in SAS procedures. It would be great to have this in SAS EM too!! Learning R has really made me appreciate SAS | randyzwitch.com
03-23-2014 05:51 AM
Forecaster, I do not understand why you are referring to SAS/STAT and doing old-style coding using proc statements.
Enterprise Miner works with a graphical approach using nodes (a node can use procs).
The latest version (13.1) of Enterprise Miner also supports time series through new nodes; digging deeper, SAS/ETS functionality can be found beneath them.
As every node can be a modeling task, a new node worth mentioning is the Open Source Integration node.
The Open Source Integration node enables you to write code in the R language inside of SAS Enterprise Miner. The Open Source Integration node makes SAS Enterprise Miner data and metadata available to your R code and returns R results to SAS Enterprise Miner.
In addition to training and scoring supervised and unsupervised R models, the Open Source Integration node allows for data transformation and data exploration.
As the modeling process itself is automated by Enterprise Miner using "model nodes", I think you need to look at Enterprise Miner differently than you have done.
It is thinking one level above the programming-in-R approach, just as that is one level above programming in SAS.
When you have different customers each delivering their own data, the resulting Enterprise Miner model for each customer may be different. Because of that, you cannot use the old BY approach of proc statements; it is the Miner project itself that may need some adjustments somewhere.
That said, it is rather easy to duplicate a Miner project and make changes afterwards.
03-26-2014 03:12 PM
Hi. Please take a look at the group processing nodes in Enterprise Miner. Specifically "Start Groups" and "End Groups".
"...the group processing facility can be used to:
analyze more than one target variable in the same process flow
define group variables such as GENDER or JOB, in order to obtain separate analyses for each level of the group variable or variables
use cross validation techniques to test the stability of predictive models
specify index looping, or how many times the flow following the node should loop
resample the data set to create bagging and boosting models"
Hope this helps.