About M_Maldonado

M_Maldonado · ‎09-24-2014

Hi Jon, Maybe I am overthinking what you mean by "rules for the final model by majority vote". Where did you get that idea or definition? As far as I understand there is no voting of the rules on a bagging model, but on the predicted probabilities of each of those models. In your specific example, every observation in your data set will be scored with each of one of the trees in your bagged model. The predicted probabilities of these models are then averaged. Below a diagram to illustrate what happens behind the scenes (I grabbed it from the paper I mentioned). The actual code in the Start Groups and End Groups nodes is quite more efficient. Using this alternative diagram you could switch the option in the Ensemble node from averaging to vote. But again, you are voting on the predicted probabilities of each of the models for an observation, not voting on the rules of the models. Whether you combine the predicted probabilities of the models by averaging or by voting, it is the predicted (posterior) probabilities, not the rules. When you have data partition, your models are assessed using a statistic of your validation set. For the specific case of trees, the pruning of the tree model will be based on the validation set. Are you sure it is not the rules of each model that you want to see? For example, in the figure I pasted you have 4 Decision Trees. You could learn more details of your model by looking at the rules of each tree if you think that adds value, but you cannot combine them into "rules of the final model". Does this help at all? Miguel

M_Maldonado · ‎09-23-2014

Jon, You don't need to reverse engineer. While the most common use of the Score node is to score new data, in this example you are using it just to get the score code wrapped up together in a single place. Just add a Score code to your flow as below, and you will see in the results the rules of each of your trees in very clear sections as I mentioned. Is this a good solution for what you need? Thanks, Miguel

M_Maldonado · ‎09-23-2014

Hi Jon, Smart choice! Ensemble models like bagging and boosting are a great way to start teaching yourself data mining because these models handle well missing values, are more robust than a single decision tree, and do not require preprocessing your variables. Good news: Add a Score Node after your End Groups node. Run it and go to the results to the optimized sas code output. You will find the scoring code for each decision tree in your bagging model in sections with a header "Decision Tree Scoring Code" and a footnote "End of Decision Tree Scoring Code". You will be able to read the rules from there. Bad news: These rules may not be in the pretty format that a single decision tree node would give you. You cannot browse through the results and go to Model->Rules because these files got overwritten to make things more efficient. Need more? Enterprise Miner is really flexible. Give it a try with reading your rules from the score code, and please share how useful/painful you find it. In the meantime I'll think about a workaround for you to make Enterprise Miner save these rules before they get overwritten and how to surface them in a useful way. Depending on your findings we decide how to work this around. More resources: What Link Good paper for Ensemble Models Leveraging Ensemble Models in SAS Enterprise Miner Fantastic book for decision trees, it also introduces ensembles Decision Trees for Analytics Using SAS® Enterprise Miner™ Big data book that is an essential read for data miners, explained in simple terms that make it really useful at introductory level Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners Course that took me from zero to hero in data mining Applied Analytics Using SAS Enterprise Miner Good luck! Keep me posted on your findings, Miguel

M_Maldonado · ‎09-17-2014

Hi, In a bunch of places Here 3 that I can think top of my mind. 1. Results Output On the results of a model node, go to the output and punch the keys Ctrl+F to find "Event Classification Table". You will see one like the below: Event Classification Table Data Role=TRAIN Target=BAD False True False True Negative Negative Positive Positive 364 4518 253 825 2. Classification chart If you are a visual person, see this in a chart! After you run a model node, go to results, then browse to View->Assessment->Classification chart. This gives you a stacked bar chart of correct vs incorrect predicted target. To see it in a table format, click on the table icon (2nd icon right to left). You will have all the info you need (target, outcome, target percentage, total percentage, etc). 3. Model comparison node When your diagram has a categorical target variable, the model comparison will use misclassification as the default selection statistic. You will find the Event Classificaiton Table of all models connected to that model comparison node. Was this a good answer? Good luck! -Miguel

M_Maldonado · ‎09-15-2014

Hi Pritish, Take a look at the Replacement node in SAS Enterprise miner. It does not remove outliers the way the Filter node would. By default the limit method is based on a standard deviation from the mean. Take a look at the flooring and capping that happens by default. There is also a brief example of how to use the replacement editor in the reference help (press F1 and go to the Replacement node under the Modify sections). Good luck! Miguel

M_Maldonado · ‎09-09-2014

Hi, 2. I don't know a lot about initialization, other than it is a crucial first step to find a good number of clusters. The initialization options in the cluster node (first, macqueen, principal components, and full/partial replacement) seem pretty data-driven to me. Is any of those similar to the initial points you would manually pass? How much value does it add if we surface an option to select initial points? 3. I think we have a similar flow. A screenshot below. Notice you can change the raw data source node or the score data source node for any other data sets. Once you run the flow, click on the ellipsis for Exported data of your Score node. Browse the score partition and notice the new columns in your data set: segment id, distance, segment description, and segment variable. The cluster node assigns metadata roles that make sense, so that you don't use redundant information on subsequent models. If all you want to do is to find the clusters for a larger data set, you are good to go. If on the other hand, you want to use your large data set (the scored one) to train subsequent models (for example a model for each segment), you will need to add a SAS Code node and paste the below into the code editor. This code has two data steps. The first one creates a train data from your scored data set. The second one passes the metadata from your scored set. data &EM_EXPORT_TRAIN; set &EM_IMPORT_SCORE; run; data &EM_EXPORT_TRAIN_CMETA; set &EM_IMPORT_SCORE_CMETA; run; 4. I am not a programmer and it was relatively easy to pick up how SAS EM works. And just like the particular need you had for point 3, you can find a work around to do exactly what you want to do in SAS EM because it sits on top of base SAS. This means that 99% of the time, anything that you can do on base SAS you can do in EM. Something that really helped me understand EM was to go to my workspace folder (EMWS) and see what each node creates. When you run into trouble take a quick look to the "Getting started" section on the reference help, and feel free to post specific questions. Someone in this online community will point you in the right direction. Thanks, Miguel

M_Maldonado · ‎09-08-2014

Hi Ben, To give you some more input. 1. The three clustering methods you mention are the only ones supported on Enterprise Miner. 2. On Enterprise Miner you can specify a few seed initialization methods for the cluster node. But if you are looking to point a proc to a specific data set, you should use proc fastclus with the seed option (). 3. A quick way to cluster new observations is to run a cluster node on the subset of your data, and then use that flow to score a new data set (with a score role). I hope it is useful. Good luck! Miguel

M_Maldonado · ‎09-05-2014

Hi Ros, Rephrasing your question, you are saying that you look for monotonicity in your weight of evidence curves. If your input variable (factor) does not show monotonicity, what to do about it? To give monotonicity a priority try these two grouping mehtods: monotonic event rate and constrained optimal. If you are looking for a quick fix and you are absolutely sure that the WOE curve does not represent your data, or future data, you can override with manual WOE on the coarse detail options. Simply adjust the weight of evidence using the Manual WOE columns. This can help to deal with the scorecard points of the down payment variable you mentioned in your example, but remember that you are overriding the ratio of events/non-events according to your business knowledge, not to what is in your data. Quick fix example: Default grouping WOE vs manual WOE Monotonic event rate grouping Use this grouping option if you are looking for monotonic event rate. This option does a great job and it is really hard to beat if you try to come up with the groupings on your own. It will save you a lot of time. But only use it if monotonicty is really important for all your variables. Some variables are expected to have a linear trend, while others are expected to have an inverted U curve. Be careful to only impose this constraint when it makes sense. For this particular example, notice that IV decreases to 0.10. The default grouping had a higher IV value, which means that the previous grouping was somehow more useful. But if monotonicity is very important for this variable, this is the way to go. Constrained optimal grouping A third option grouping method is constrained optimal. This method has an OR approach and it will impose several constraints (you can enable/disable/modify them through the constraint options or advanced constraint options menus). Monotonicity has a priority on those constraints. This grouping is my personal favorite because I can choose which constraints should be applied to which variables using the advanced constraint option menu. Below the results for the same variable in our example using the constrained optimal grouping. Notice that it is a nice answer because these groups have a monotonic WOE trend while they also represent very well the event count of the graph on the left. I also find four groups more useful for this variable in a scorecard. Finally, even if I decided to adjust these WOE based on business knowledge for a more linear trend (and more differentiated scorecard points), the manual adjustments would be minor compared to the manual adjustment of the "quick fix" example. For more details on these options, check the reference help (press F1 key when you are on Enterprise Miner). Huge favor, please don't forget to rate this answer and to comment back on how these grouping options work for you. And this thread may open up a deeper discussion of when should WOE monotonicity be given a priority, which deserves a whole new thread on its own. Thanks, Miguel

M_Maldonado · ‎09-05-2014

Hi Ben, Thanks for your detailed questions! 1. The three clustering methods you mention are the only ones supported on Enterprise Miner. 2. On Enterprise Miner you can specify a few seed initialization methods for the cluster node. But if you are looking to point a proc to a specific data set, you should use proc fastclus with the seed option (). 3. A quick way to cluster new observations is to run a cluster node on the subset of your data, and then use that flow to score a new data set (with a score role). 4. The best place to find detailed explanations is right on Enterprise Miner. Go to the contents icon, or press F1. This will open the reference help. For example, I punched F1 on EM and searched for "Cubic clustering criterion". I got more than 25 pages of detailed explanation... Brief excerpt below of the abstract and intro. I hope this helps for now. Thanks, Miguel Cubic Clustering Criterion Abstract The cubic clustering criterion (CCC) can be used to estimate the number of clusters using Ward's minimum variance method, k -means, or other methods based on minimizing the within-cluster sum of squares. The performance of the CCC is evaluated by Monte Carlo methods. Introduction The most widely used optimization criterion for disjoint clusters of observations is known as the within-cluster sum of squares, WSS, error sum of squares, ESS, residual sum of squares, least squares, (minimum) squared error, (minimum) variance, (sum of) squared (Euclidean) distances, trace(W), (proportion of) variance accounted for, or R 2 (see, for example, Anderberg 1973; Duran and Odell 1974; Everitt 1980). The following notation is used herein to define this criterion: n number of observations n k number of observations in the k th cluster p number of variables q number of clusters X n by p data matrix q by p matrix of cluster means Z cluster indicator matrix with element Z ik = 1 if the i th observation belongs to the k th cluster, 0 otherwise. Assume that without loss of generality each variable has mean zero. Note that Z'Z is a diagonal matrix containing the n ks and that The total-sample sum-of-squares and cross products (SSCP) matrix is T = X’X. The between-cluster SCCP matrix is The within-cluster SSCP matrix is The within-cluster sum of squares pooled over variables is thus trace(W). By changing the order of the summations, it can also be shown that trace(W) equals the sum of squared Euclidean distances from each observation to its cluster mean. Since T is constant for a given sample, minimizing trace(W) is equivalent to maximizing which has the usual interpretation of the proportion of variance accounted for by the clusters. R 2 can also be obtained by multiple regression if the columns of x are stacked on top of each other to form an np by 1 vector, and this vector is regressed on the Kronecker product of z with an order p identity matrix. Many algorithms have been proposed for maximizing [untitled graphic] or equivalent criteria (for example, Ward 1963; Edwards and Cavalli-Sforza 1965; MacQueen 1967; Gordon and Henderson 1977). This report concentrates on Ward's method as implemented in the CLUSTER procedure. Similar results should be obtained with other algorithms, such as the k-means method provided by FASTCLUS. The most difficult problem in cluster analysis is how to determine the number of clusters. If you are using a goodness-of-fit criterion such as R 2 , you would like to know the sampling distribution of the criterion to enable tests of cluster significance. Ordinary significance tests, such as analysis of variance F- tests, are not valid for testing differences between clusters. Since clustering methods attempt to maximize the separation between clusters, the assumptions of the usual significance tests, parametric or nonparametric, are drastically violated. For example, 25 samples of 100 observations from a single univariate normal distribution were each divided into two clusters by FASTCLUS. The median absolute t-statistic testing the difference between the cluster means was 13.7, with a range from 10.9 to 15.7. For a nominal significance level of 0.0001 under the usual, but invalid, assumptions, the critical value is 3.4, yielding an actual type 1 error rate close to 1. The first step in devising a valid significance test for clusters is to specify the null and alternative hypotheses. For clustering methods based on distance matrices, a popular null hypothesis is that all permutations of the values in the distance matrix are equally likely (Ling 1973; Hubert 1974). Using this null hypothesis, you can do a permutation test or a rank test. The trouble with permutation hypothesis is that, with any real data, the null hypothesis is totally implausible even if the data does not contain clusters. Rejecting the null hypothesis does not provide any useful information (Huber and Baker 1977). Another common null hypothesis is that the data are a random sample from a multivariate normal distribution (Wolfe 1970, 1978; Lee 1979). The multivariate normal null hypothesis is better than the permutation null hypothesis, but it is not satisfactory because there is typically a high probability of rejection if the data is sampled from a distribution with lower kurtosis than a normal distribution, such as a uniform distribution. The tables in Englemann and Hartigan (1969), for example, generally lead to rejection of the null hypothesis when the data is sampled from a uniform distribution. Hartigan (1978) and Arnold (1979) discuss both normal and uniform null hypotheses, and the uniform null hypothesis seems preferable for most practical purposes. Hartigan (1978) has obtained asymptotic distributions for the within-cluster sum of squares criterion in one dimension for normal and uniform distributions. Hartigan's results require very large sample sizes, perhaps 100 times the number of clusters, and are, therefore, of limited practical use. This report describes a rough approximation to the distribution of the R 2 criterion under the null hypothesis that the data have been sampled from a uniform distribution on a hyperbox (a p-dimensional right parallelepiped). This approximation is helpful in determining the best number of clusters for both univariate and multivariate data and with sample sizes down to 20 observations. The approximation to the expected value of R 2 is based on the assumption that the clusters are shaped approximately like hypercubes. In more than one dimension, this approximation tends to be conservative for a small number of clusters and slightly liberal for a very large number of clusters (about 25 or more in two dimensions). The cubic clustering criterion (CCC) is obtained by comparing the observed R 2 to the approximate expected R 2 using an approximate variance-stabilizing transformation. Positive values of the CCC mean that the obtained R 2 is greater than would be expected if sampling from a uniform distribution and therefore indicate the possible presence of clusters. Treating the CCC as a standard normal test statistic provides a crude test of the hypotheses: H 0 : the data has been sampled from a uniform distribution on a hyperbox. H a : the data has been sampled from a mixture of spherical multivariate normal distributions with equal variances and equal sampling probabilities. Under this alternative hypothesis, R 2 is equivalent to the maximum likelihood criterion (Scott and Symons 1971).

M_Maldonado · ‎08-29-2014

It is very weird that you have it licensed but not installed... Please contact tech support with this form: Technical Support Form. Once you get Enterprise Miner working, don't forget to come back and post on how your self learning is going. Take care, Miguel

M_Maldonado · ‎08-29-2014

Hi D, What do you see in your SAS Home directory? In my machine SASHome is in: C:\Program Files\SASHome\ There I can find SASEnterpriseMinerClient\13.2 and then the Enterprise Miner icon. Does that work for you? Cheers, Miguel

M_Maldonado · ‎08-14-2014

HI Sebastien, Have you figured out what you needed? Not sure if I understand your question OK. You can find 9 clusters using the Cluster node using the variable country as input. You can use country as a segment variable already, but go this path if that is more useful to you. Then I would connect the cluster node to a subflow Start Groups (Mode Index to pick up the segments of your cluster node)->TS Simmilarity (please confirm your data is in the right format)->End Groups I hope that helps, Miguel

M_Maldonado · ‎08-01-2014

Please send me the log and I will take a look. Miguel M Maldonado at sas dot com

M_Maldonado · ‎07-31-2014

Omer, Very weird...I have never seen anything like that. What happens when you click on the TABLE icon (4th icon left to right on Model Comparison node results)?

M_Maldonado · ‎07-22-2014

Hello Suzanne, You can achieve this using either Prior Probabilities or Decisions. There is a section for that on the chapter Predictive Modeling of SAS EM Reference Help. Access it by pressing F1 on Enterprise Miner. Or download a pdf version through this link SAS Enterprise Miner. I hope it helps, Miguel

Online Status	Offline
Date Last Visited	‎02-28-2018 11:39 AM

Re: Unbalanced data - miner

Re: SAS EM only: How to use parameter estimates in the next node?

Re: StatExplore Node

Re: How many leaves and nodes should a tree

Re: Export scoring code for Cross Validation in SAS Enterprise Miner

Re: Export scoring code for Cross Validation in SAS Enterprise Miner

Re: run time error ensemble model

Re: run time error ensemble model

Re: help with hash table

Re: help with hash table

Re: StatExplore Node

Re: How to access Variable importance in neural network in EM?

Re: Grouping variables to create new variables SAS Enterprise Miner

Re: Error when running market basket node in SAS EM

Re: Seed Initialization Method for Hierarchical Clustering

Re: How can I use the Tobit's Model in SAS?

Re: Using cross-validation in Enterprise Miner;

Re: How come no Segment Profile after I set "Cluster Variable Role" = ...

Re: Confusion matrix in Enterprise Miner

Re: How can we export dataset from enterprise Miner as a csv file or t...

Credit Scoring by Example in SAS® Enterprise Miner™

Tip: How to model a rare target using an oversample approach in SAS® ...

Tip: How to interpret your SAS® Rapid Predictive Modeler results

Tip: Use the Cutoff Node in SAS® Enterprise Miner™ to Consume the Post...

Tip: How to build a scorecard using Credit Scoring for SAS® Enterprise...

Re: Final Bagged Decision Tree from group processing?

Re: Final Bagged Decision Tree from group processing?

Re: Final Bagged Decision Tree from group processing?

Re: Confusion matrix in Enterprise Miner

Re: Capping and Flooring outliers - Methods

Re: SAS Miner Clustering options

Re: SAS Miner Clustering options

Re: How to manage factors which have IV <0.1,but the score are fluctua...

Re: SAS Miner Clustering options

Re: SAS Enterprise Miner - Self learning

Re: SAS Enterprise Miner - Self learning

Re: HowTo : Clusterize time series according to groups ?

Re: Decision Tree Node creating no D_TARGET outcome

Re: Decision Tree Node creating no D_TARGET outcome

Re: How to weight records in the desicion tree node SAS Enterprise Min...