About JasonXin

JasonXin · ‎07-09-2014

Hi, when you say "Based on the RPM workflow diagram", do you mean that you ran RPM, then imported its model XML back into 'regular' EM, and that you see one Interactive Binning node engaged in the RPM workflow? Thanks.

JasonXin · ‎07-09-2014

Highlight the Score node that you presumably used for scoring. In the panel to the left, which displays all the configuration options available for the node, you should see a subpanel that shows the input and output data sets; the scored data set should already be listed there. It should sit in the project's EM workspace directory. You can source it back into EM's data set list so that you can draw on it in your new EM exercise. Ideally, in doing so, you would carry over all the other candidate variables as well. Hope this helps. Jason Xin

JasonXin · ‎07-09-2014

Fred, You can test your model on a small random sample of the customers who use only the online banking solution to see whether the model is applicable to that cohort. As a practical matter, even models that are deployed for the 'right' segments can go wrong for all sorts of reasons. A small-scale pre-test like this should take out potential complications embedded in a full-scale, 'live' deployment. Models are supposed to be deployed for the intended segment; this is essential for model performance measurement. It does not mean, however, that the model has little or no applicability to segments that do not overlap much with the original model universe. If a model is driven by variables that are shared between the online banking and non-online banking customer bases, the model may very well stand up OK. What you can test, while building the model, is to create a flag or a set of variables that 'single out' the online-banking-only customers, to see whether such a flag is predictively significant or will bias your model in any way. In some cases, such a flag is statistically significant but does not call for a separate model for each segment. Sometimes the flag may behave like a 'fault' that separates continents, in which case a separate model may be a good idea. At the variable selection phase, you may pay attention to variables that are unique to one segment but not to the rest. Best Regards Jason Xin
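As a rough illustration of the flag idea above, here is a minimal sketch in Python (not SAS or EM code). It only compares raw event rates between the flagged segment and everyone else with a two-proportion z-test, which is a quick univariate screen and not the in-model significance test described in the post. The function name and all counts are hypothetical.

```python
import math

def two_proportion_z(events_a, n_a, events_b, n_b):
    """Return the z statistic for the difference between two event rates."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Pooled rate under the null hypothesis that both segments share one rate.
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical counts: responders among online-only customers vs. the rest.
z = two_proportion_z(events_a=30, n_a=500, events_b=240, n_b=3000)
flag_looks_predictive = abs(z) > 1.96  # roughly a 5% two-sided test
print(round(z, 2), flag_looks_predictive)
```

A large absolute z suggests the segment behaves differently on the target. Whether that calls for a separate model still has to be judged inside the model itself.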

JasonXin · ‎07-08-2014

The CLASS statement in PROC GLM has a REF=FIRST | LAST option, which you can set as a global option for the whole CLASS list. Jason Xin
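To make concrete what choosing the first or last level as the reference means, here is a small Python sketch of plain reference-cell dummy coding. It is an illustration only: PROC GLM uses its own parameterization internally, and the function name `dummy_code` and the sample levels are made up for this example.

```python
def dummy_code(values, ref="last"):
    """Reference-cell coding: one 0/1 column per non-reference level."""
    levels = sorted(set(values))
    # ref="first" drops the first sorted level, anything else drops the last.
    reference = levels[0] if ref == "first" else levels[-1]
    kept = [level for level in levels if level != reference]
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return reference, kept, rows

reference, columns, rows = dummy_code(["low", "high", "mid", "low"], ref="first")
print(reference, columns, rows)
```

Rows that belong to the reference level are all zeros, so every other level's effect is read relative to it.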

JasonXin · ‎07-08-2014

You can use the Metadata node to drop variables in the middle of an EM workflow. You are right that the Filter node is there to 'cut values' of a variable. The Metadata node, as the name suggests, is about managing the metadata of data sets, such as variable roles. Jason Xin

JasonXin · ‎11-29-2013

Ajosh, Let us focus on your original question regarding boosting, specifically, this portion " I later used the output dataset from the end group nodes by merging the training and validation dataset which has the entire dataset. However, I observed that for every node number, the predicted probability of target = Y is not the same throughout the records which have the same node number. Also the range of these predicted probabilities overlap across node numbers. Also, the output from the end groups result window shows around 60% true positive rate and 70% true negative rate, which means some good amount of classification is happening due to boosting approach. My end objective is to derive patterns/if then rules from such a dataset. Is anyone aware of how can this be accomplished (is there any other node that needs to be used on the exported dataset of end group node and so on)?? " 1. For the first portion, you merged the training and validation data sets (you selected the End Group node and went to the Exported Data button to find your data sets underneath, right?). The trained model and the validated model are physically two different ones, although logically the same, since one is the validated, balanced version of the other. I wonder if you could just look at one of them at a time. Whether it is to report model performance or to extract score code, analytically you should stick with what comes off the validation data. It is indeed good practice to try to minimize the difference between the training and validation data sets. In other words, if the gap is big and varies from attempt to attempt, you may consider training 'better' to close the gap. Also keep watching whether performance on the validation data set is improving, or is at least stable. 2. As for the rules, please disregard my previous remark on that subject. It was largely correct, but I thought you were doing SGB. 
I checked and don't see any major difference on this subject between EM 7.1 and EM 12.3 (the two user guides appear largely the same on group processing). So I am going to use EM 12.3 to speak about EM 7.1 on this subject. For End Group processing, if you go check for Flow Code and Score Code, it has anything but Flow Code and Score Code for group processing, unlike SGB. I could expand quite a bit on this, but I want to keep the focus on what you want. Could you clarify a bit why you need to "derive patterns/if then rules from such a dataset"? For example, are you trying to port it elsewhere to score, or just to study the mechanics of the boosting process further? I agree with Reeza about your follow-up questions on cut-off: you may get better responses if you post them as separate questions. Best Regards Jason Xin

JasonXin · ‎11-29-2013

Ajosh, Let me find a virtual machine that runs EM 7.1. There are quite a few changes between EM 7.1 and 12.3. Will get back to you later. Best Jason Xin

JasonXin · ‎11-27-2013

Hi, Ajosh, Not sure which version of EM you have. Perhaps the solution is 'versionless'? On my EM version 12.3, right-click the Gradient Boosting node (supposing you have finished a successful run). In the drop-down, select the Results entry. In the window that opens, there are about four menus in the upper left corner. One should be View. Select View --> SAS Results --> Flow Code. For boosting you should see a long 'what-if' listing. Take a look to see whether this is what you are looking for. A much 'cleaner' score piece can be found, after the model is finalized, at View --> Scoring --> SAS Code. Best Regards Jason Xin (from SAS)

JasonXin · ‎10-30-2013

Hi, Kanyange, Here is a SAS note with a video that addresses oversampling with EM: 34270 - Oversampling techniques in SAS® Enterprise Miner(tm). Another SAS note provides step-by-step instructions: 24205 - Rare event oversampling for model fitting in SAS® Enterprise Miner(tm). Enjoy. Post back if you have questions. Best Jason Xin

JasonXin · ‎10-30-2013

Consider certifications. BASE certification is always a good idea, although it is not directly related to data mining. There is a modeling certification from SAS as well. For folks with little data mining job experience, certification never guarantees success, but it adds much credibility with hiring managers. Certification should serve as a starting point, though. Once you get a job, you just practice and progress. There isn't any shortcut or secret. Whether you go for certification or not, reading about real-life experience is always a good idea. My favorite site is www.lexjansen.com, which holds all the published SAS conference proceedings. You can pick out data mining as your subject. Enjoy Jason Xin

JasonXin · ‎10-30-2013

Hello, Chemicalab, Here is a link to the SAS/STAT user guide, which has some details on how to interpret PROC LOGISTIC output. For more depth, check the books written by Paul Allison. If you visit www.lexjansen.com, you can find many past proceedings covering the subject. http://support.sas.com/documentation/onlinedoc/stat/index.html Best Regards Jason Xin

JasonXin · ‎10-30-2013

Sasman1441, I tested your data with EM 12.3. The partition runs fine.

User: sasdemo
Date: October 30, 2013
Time: 12:17:41

*------------------------------------------------------------*
* Training Output
*------------------------------------------------------------*

Variable Summary

Role      Measurement Level  Frequency Count
INPUT     INTERVAL           45
INPUT     NOMINAL            4
REJECTED  NOMINAL            1

Partition Summary

Type      Data Set             Number of Observations
DATA      EMWS1.Ids_DATA       51118
TRAIN     EMWS1.Part_TRAIN     20447
VALIDATE  EMWS1.Part_VALIDATE  15335
TEST      EMWS1.Part_TEST      15336

I did not find any related message in the SAS support KB or on google.com. I suspect you may be using an old EM version, 5.2 or...? Let me know. Jason Xin

JasonXin · ‎10-05-2013

In current SAS High Performance (HP), one common foundation HP PROC is HPDMDB, where you can code like this:

    proc hpdmdb data=&outdsn. classout=outdsn varout=v maxlevel=15000000; /* if you set a small #, it wraps the rest into OTHERS */
      class _all_; /* list all the variables in the data set, interval/numeric as well as categorical */
    run;

The result will show the levels of each variable and the % of each unique value. For very continuous variables, the output does indeed look very miscellaneous. If you have an Enterprise Miner license, you should be able to use PROC DMDB as well. The difference is that when the data set gets big, or when the table does not have many observations but has many columns to count, PROC DMDB may take some time. PROC HPDMDB runs much faster. It leverages the multi-threading capabilities of your computer. If you are set up to run on parallel nodes with in-memory processing, it will be much, much faster.
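For readers who want to see the shape of such a per-column level summary without SAS at hand, here is a minimal Python sketch. It is not what HPDMDB does internally; it only mimics the behaviour described above, where levels beyond a cap are wrapped into OTHERS. The function name `level_summary`, the `max_levels` parameter and the sample rows are assumptions for illustration.

```python
from collections import Counter

def level_summary(rows, max_levels=3):
    """For each column, list (value, count, percent), wrapping the tail into OTHERS."""
    summary = {}
    for column in rows[0].keys():
        counts = Counter(row[column] for row in rows)
        top = counts.most_common(max_levels)
        # Everything beyond the most frequent max_levels values is pooled.
        rest = sum(counts.values()) - sum(n for _, n in top)
        levels = [(value, n, 100.0 * n / len(rows)) for value, n in top]
        if rest:
            levels.append(("OTHERS", rest, 100.0 * rest / len(rows)))
        summary[column] = levels
    return summary

# Hypothetical data: two categorical columns, four observations.
rows = [
    {"region": "N", "plan": "A"},
    {"region": "N", "plan": "B"},
    {"region": "S", "plan": "C"},
    {"region": "E", "plan": "D"},
]
summary = level_summary(rows, max_levels=2)
for column, levels in summary.items():
    print(column, levels)
```

With a small cap, a near-unique column such as `plan` collapses mostly into OTHERS, which is the same effect the MAXLEVEL comment in the post points to.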

JasonXin · ‎10-01-2013

Hi, Tijl, This TS note appears to address your issue with EM 5.x. There is a hot fix there. The root cause is the score code generated by the Merge node. Problem Note 19447: Incorrect Score Code generated from Merge node Jason Xin

JasonXin · ‎10-01-2013

Graham, Yes, you can definitely code the same payment values into differing binning criteria/schemas/rules/cuts/wishes. This is often seen with modelers using BASE. When using EM, it is not unusual to see a modeler add a SAS Code node to run his or her custom coding on the same variables, alongside whatever EM is doing with those variables, to compare and test. The logic behind this is that, while rules of thumb or general guidelines often apply, the 'best' cuts/bins are often determined by trial and error. After coding the same payment variables into differing bucket variables, you should expect, though, that they are highly correlated. Depending on the specifics, sometimes you select one over the others; sometimes you build them into PCA or factors. The reality is that the data, in your case the payment data, are typically NOT collected with any analytics in mind. The data were just ENTERED into your database. You have to configure them to suit your models. The payment variable is like your foot. Of course you should try different pairs of shoes to decide which one fits best. Jason Xin
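A minimal sketch of coding the same payment values under two different binning schemes, written in Python rather than as a SAS Code node. The two schemes (equal-width and quantile cuts), the helper names and the payment amounts are all assumptions chosen for illustration, not anything EM prescribes.

```python
import bisect
import statistics

def equal_width_cuts(values, bins):
    """Cut points that split the observed range into equally wide bins."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    return [lo + step * i for i in range(1, bins)]

def quantile_cuts(values, bins):
    """Cut points that put roughly the same number of records in each bin."""
    return statistics.quantiles(values, n=bins)

def assign_bins(values, cuts):
    """Map each value to the index of the bin it falls into."""
    return [bisect.bisect_right(cuts, v) for v in values]

# Hypothetical, right-skewed payment amounts.
payments = [5, 10, 12, 15, 20, 25, 40, 80, 150, 400]
width_bins = assign_bins(payments, equal_width_cuts(payments, 4))
quant_bins = assign_bins(payments, quantile_cuts(payments, 4))
print(width_bins)
print(quant_bins)
```

On skewed data like this, equal-width bins pile most records into the first bucket while quantile bins spread them out, which is why trying several schemes and comparing them is worthwhile. The resulting bucket variables will still be highly correlated with one another.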

Date Last Visited	‎01-19-2017 03:46 PM

Re: How many leaves and nodes should a tree

Re: SAS EMiner Oversampling reduced the traget sample size

Re: Enterprise miner Node Leaf size issues

Re: SAS Enterprise Miner GBM Node

Re: Missing/Not Applicable Values for Interval Variable

Tip: Defining Global Metadata for SAS® Enterprise Miner™ Projects

Re: Tip: Bayesian networks implemented in the HPBNET proc in SAS® Ente...

Credit Scoring by Example in SAS® Enterprise Miner™

Re: Tip: How to interpret your SAS® Rapid Predictive Modeler results

Re: proc glm class variables descending

Re: Enterpise Miner

Re: Imputing vs Rejecting

Re: Data Partition Node SAS EM

Re: SAS EM use Scored data in another model

Re: Is this bias and will validation return useless results?

Re: Classification: K nearest neighbors (MBR)

Re: Deriving patterns (if then rules) from boosted trees in SAS Enterp...

Re: Oversampling in Enterprise Miner, Please Help...Thanks

Re: I'm intersted in Data Mining

Re: Proc Logistic Estimates

Re: Count different columns

Re: Combining several variable selection techniques in Enterprise Mine...

Re: Predictive Modelling using SAS EM