09-19-2016 07:47 AM
I am reviewing a SAS EMiner model developed by an analyst at a client bank. He used a SAS Code node to create groups of numeric and categorical variables. I scored a new ABT using the Score node. I also tried to score the same ABT in EG using the score code from the Eminer Score node, but to my surprise the steps performed in the SAS Code node do not appear in the generated scoring code. Am I doing something wrong? I believed that every step performed in EM should be in the scoring code if the variable is a final variable of the model.
I am creating a segmentation for 10 million customers, and 90% of the customers fall into a single segment. I think all possible variables have been taken into account. What should my next step be?
Thanks in advance
09-19-2016 02:14 PM
To have code from the SAS Code node included in the score code for the flow, make sure the Score Code pane of the SAS Code node is used. On that pane, you just enter the DATA step code directly; see the attachment.
09-19-2016 10:13 PM
Thanks for the quick reply.
There is DATA step code in the Train part of the code panel, which looks like:
some if else statements;
So my next question is: which part should I copy into the Score Code panel, and what about validation?
My second question is:
1. The client analyst did the imputation before data partitioning, which I would not recommend. But the score code also contains nothing related to imputation. What could be the reason?
Thanks in advance
09-20-2016 09:20 AM
So everything after the SET statement and before the RUN statement would go in the Score panel. Then that code will automatically be applied to any data partitions you have (training, validation, test).
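As a rough illustration of that split, assume the Training Code pane holds a DATA step like the one below. The variable clage_grp and its cutoffs are made up for the example; &EM_IMPORT_DATA and &EM_EXPORT_TRAIN are the standard SAS Code node macro variables for the incoming and exported training data.

```sas
/* Training Code pane: a full DATA step (illustrative grouping only) */
data &EM_EXPORT_TRAIN;
   set &EM_IMPORT_DATA;
   length clage_grp $8;
   if clage = . then clage_grp = 'MISSING';
   else if clage < 120 then clage_grp = 'LOW';
   else clage_grp = 'HIGH';
run;
```

Only the statements between SET and RUN would then be pasted into the Score Code pane, with no DATA, SET, or RUN statement around them:

```sas
/* Score Code pane: DATA step statements only */
length clage_grp $8;
if clage = . then clage_grp = 'MISSING';
else if clage < 120 then clage_grp = 'LOW';
else clage_grp = 'HIGH';
```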
If you are using the Impute node and there are in fact variables with missing values with missing % below the Missing Cutoff property (or even if there aren't any with missing values and the Nonmissing Variables property is set to Yes), then that score code should be included. Would need more details to figure out why it's not there.
09-20-2016 11:15 PM
If the client did the imputation before partitioning using the EM Impute node, the imputation DATA steps should have been collected automatically into the eventual scoring code by the Score node or Score Export node. I attach a two-page PDF taken from the EM user guide that shows which nodes generate steps that the EM Score nodes append together automatically.
By design, nothing written under the EM SAS Code node is automatically picked into the final scoring equation by Score Node or Score Export Node. The subtlety is this: when set up properly (selecting the right tool type and options), the SAS Code node runs as expected and produces what is expected. This, however, does not give license for the 'correct' stuff to be automatically incorporated into the final scoring equation. The reason is essentially the price we pay for the great flexibility afforded by the SAS Code node. Almost anything you have licensed, or anything you can paste into the BASE editor and run, can be inserted into the SAS Code node and run successfully. If the EM SAS Code node were designed to automatically write out everything we put under it, EM could very well end up like EG, and perform much worse than EG or any other SAS code writer.
In the past two years I have encountered two dozen SAS customers who tried to insert thousands of data preparation steps into the SAS Code node and force EM to compile them into its final score code. The distinction is this: EM is a predictive modeling tool, and one should use BASE, Java, Hive, C, or whatever to prepare the input model universe as much as possible before feeding the data set into EM for modeling. The SAS Code node is intended for 'mid-flow' or 'mid-stream' supplements that require facilities beyond EM's built-in scope. As a practical matter, it is much neater and much easier to inspect if you lay out your thousands of steps and procedure code in BASE or SAS Studio, instead of clicking deep into the thick of EM code.
To complete your scoring code, you don't need to copy over anything related to validation, if any, from the Code node. Validation requires the target variable to be present, which is typically not the case in a scoring process.
As for your lopsided 90% segment, I think you need to trace back and make sure all the custom code the client analyst injected into this EM process has been recovered and brought into your scoring job before you 'scream' again. Chances are that, as you mentioned, the client analyst's pre-partition imputation logic is missing, in which case the 90% smells like a category collapse caused by the missing imputation step. I concur with you that if the client analyst did the imputation prior to partitioning in EM, the analyst may very well have done it on the data set before plugging the data set into EM to begin with.
Hope this helps. Thank you for using SAS.
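One way to test the missing-imputation suspicion is to count missing values in the scoring inputs and then look at the segment distribution. This is only a sketch: work.new_abt, work.scored, and the variable segment are placeholder names for the scoring ABT, the scored output, and whatever segment variable the flow produces.

```sas
/* Count non-missing and missing values for every numeric input */
proc means data=work.new_abt n nmiss;
   var _numeric_;
run;

/* Segment distribution in the scored output, missing values included */
proc freq data=work.scored;
   tables segment / missing;
run;
```

If inputs that were imputed during training show large NMISS counts here, the scoring job is probably running without the imputation step.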
09-21-2016 09:19 AM
Just a slight correction to this:
By design, nothing written under the EM SAS Code node is automatically picked into the final scoring equation by Score Node or Score Export Node.
Again, if the code is put in the Score panel of the SAS Code node, then it is included in the accumulated score code in the Score node. I just tested it with a flow: HMEQ->SAS Code node->Decision Tree->Score
where the SAS Code node had this line of code in the Score panel:
newvar = clage + 1;
That line shows up in the SAS Code window of the Score node before the Decision Tree score code:
* TOOL: SASHELP.EMCORE.EMCODETOOL.CLASS;
* TYPE: UTILITY;
* NODE: EMCODE;
newvar = clage + 1;
09-22-2016 08:10 AM
Thanks for the detailed reply. The client used the SAS Code node in Eminer because they use the CAB solution for ABT development. CAB has limited ability to create different types of variables: for example, you cannot create a flag variable unless it is in the infomap, and categorical variables cannot be regrouped. I suggested that they use the Replacement node for categorical variables and Interactive Binning for numeric variables if they want to bin. Yes, you are right that Eminer is for model development, not for coding, but the client has CAB and Eminer as their solution. They do not want to keep another piece of code separately in EG or a Base SAS program. Thanks for your help.
I have one more follow-up question regarding the SAS Code node and score code. The client used the Likert method to convert categorical values to numeric values in the SAS Code node. As far as I know, no other node can do this. They used a macro for this purpose; I think SAS has shared this macro. Three variables converted to numeric values with this method are in the final model. But in the scoring code or the optimized SAS code I do not see any code that does this conversion, although the DATA step for the conversion is present in the unoptimized SAS code. Yet when I score new data in Eminer I get correct values. Is the code for the Likert conversion implicit in the Score node? I tested the optimized score code in a Base SAS program. I get a missing-variable warning for the Likert-converted variables, which is expected because they are not in the data. When I copy the required code from the unoptimized SAS code, I get the same scoring output that I get from the Eminer Score node.
Am I not following this correctly? Please help me. I have to advise the client on how they should score if they use the Likert method, and how to ensure that there is no error in scoring.
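The Base SAS test described above can be sketched roughly as follows. All paths, data set names, and the key customer_id are placeholders. The sketch assumes the conversion statements copied from the unoptimized code and the optimized score code are each saved as DATA step fragments, meaning statements only, with no DATA, SET, or RUN.

```sas
/* Score the new ABT outside Enterprise Miner (placeholder names) */
data work.scored;
   set work.new_abt;
   /* 1. Categorical-to-numeric conversion copied from the unoptimized
         code, so the converted inputs exist before the model code runs */
   %include "/path/to/conversion_code.sas";
   /* 2. Optimized score code exported from the Score node */
   %include "/path/to/optimized_score.sas";
run;

/* Compare with the same ABT scored inside Enterprise Miner */
proc sort data=work.scored;    by customer_id; run;
proc sort data=work.em_scored; by customer_id; run;

proc compare base=work.em_scored compare=work.scored criterion=1e-8;
   id customer_id;
run;
```

If PROC COMPARE reports no unequal values for the predicted and segment variables, the external scoring job reproduces the Eminer result for that data.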
thanks in advance