Hi Zach,

HPForest is not just multiple decision tree runs; it is a very specific type of decision tree ensemble. Each tree is trained on a sample of the observations rather than all of them, and at each split only a random subset of the variables are candidates (there is a small sketch of this idea at the end of this post). It may not sound intuitive at first, but Breiman and other authors have shown that this randomization produces a more robust model.

Once you choose a model with low interpretability, such as gradient boosting, a random forest, an SVM, or a neural network, you have traded interpretability for better prediction. Here is a useful trick for understanding the variables driving your model when the target is binary:

1. Add a Model Comparison node, a Score node, and a Reporter node after your model.
2. In the Reporter node, set the Nodes option to SUMMARY. Run the flow and open the results.
3. Notice that the PDF report includes the Rapid Predictive Modeler reports for your model, among them the Selected Variable Importance chart, which is based on a decision tree fit to your predicted event. You can use this chart to explain the main drivers of your model (the second sketch below shows the idea).

I find this report easier to use even for a model like HPForest that already outputs variable importance: the chart is easier to explain than the out-of-bag error reduction, and the results usually match.

Before I can make a recommendation for WWSCMD, please share some information and charts:

- the proportion of events to non-events for your target variable (is it a rare event?)
- the iteration plot for your HPForest
- the plots from your Cutoff node results, including ROC, positive rates, and the precision-recall cutoff plot

I hope this helps!

Thanks,
-Miguel
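P.S. In case a concrete illustration helps, here is a minimal Python/scikit-learn sketch (not SAS, and not HPForest's actual implementation) of the sampling scheme described above: each tree trains on a sample of the observations, and only a random subset of the variables are candidates at each split.

    # Toy Python/scikit-learn sketch (not SAS) of the random forest sampling idea:
    # each tree sees a bootstrap sample of the observations, and max_features="sqrt"
    # means only a random subset of the variables are split candidates at each node.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(42)
    X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

    trees = []
    for i in range(50):
        # bootstrap sample of the observations for this tree
        rows = rng.choice(len(X), size=len(X), replace=True)
        tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
        tree.fit(X[rows], y[rows])
        trees.append(tree)

    # the ensemble prediction is a vote across the individual trees
    votes = np.mean([t.predict(X) for t in trees], axis=0)
    ensemble_prediction = (votes >= 0.5).astype(int)
    print("ensemble agreement with the target:", (ensemble_prediction == y).mean())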
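And here is the second sketch, a rough Python/scikit-learn approximation of the Selected Variable Importance idea: score the data with the low-interpretability model, then fit a single shallow decision tree to the predicted event and read that surrogate tree's variable importance. The model and variable names below are made up for the example, so treat this as an illustration of the concept rather than a replica of the Rapid Predictive Modeler report.

    # Surrogate-tree variable importance for the PREDICTED event (illustration only).
    import pandas as pd
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=2000, n_features=8, n_informative=4, random_state=0)
    X = pd.DataFrame(X, columns=[f"var_{i}" for i in range(X.shape[1])])

    # stand-in for HPForest: any binary-target model with low interpretability
    forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    predicted_event = forest.predict(X)

    # surrogate tree trained on the predicted event, not on the original target
    surrogate = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, predicted_event)

    importance = pd.Series(surrogate.feature_importances_, index=X.columns)
    print(importance.sort_values(ascending=False))  # main drivers, easy to explain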