Hi Zach,

Thanks for including the screenshot and the log. That sure helped!

In general, you don't want to use the Model Comparison node to compare the fit statistics of models that you trained on different data sets. There are some special cases where you do want to combine the posterior probabilities of models trained on different data sets, for example when you are building a special type of ensemble model, but that's another conversation.

Quick fix: Copy and paste the subflow Model Comparison -> Score -> Reporter two more times. Connect each of your HPForest models to one of those subflows, run it, and you will have a Reporter that explains the variable importance of each of your HPForest models. Remember, this report uses a decision tree to explain the main drivers of a model.

Why did you get this error? From your log, it looks like the Reporter node knows which variables in the metadata are used as inputs. It errored out when one of the data sets did not have two of those inputs (ULTIMATE_LITIGATION and ULTIMATE_RTW). I am not sure whether those two input variables were missing from one of the data sets from the get-go or whether they were simply not passed along. Either way, the suggested quick fix should get you what you need. The one exception would be if the decision tree finds no rules, but that would only happen if no inputs drive your predicted target. As long as you see the variable importance chart in your PDF report and the log says something like "NOTE: The data set WORK.RULES has XXX observations and YYY variables.", everything is good.

About scoring your HPForest model: In short, the good news is that the Score node writes the SAS code you need to score new observations with your HPForest model. Open the Score node that you ran in your subflow and you will see the scoring code. HPForest is a special case that uses a specific procedure, PROC HP4SCORE, to score new observations.
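For reference, here is a minimal sketch of how that fits together outside of Enterprise Miner (the data set names, file path, and variables are hypothetical, not taken from your flow): PROC HPFOREST can save the trained forest to a binary model file, and PROC HP4SCORE reads that file to score new observations.

```sas
/* Train a forest and save it to a binary model file.
   All names here are hypothetical placeholders. */
proc hpforest data=work.train;
   target claim_event / level=binary;
   input ultimate_litigation ultimate_rtw / level=interval;
   save file="C:\models\forest_model.bin";
run;

/* Score a new data set with the saved forest */
proc hp4score data=work.new_claims;
   score file="C:\models\forest_model.bin" out=work.scored;
run;
```

The Score node generates the equivalent of that second step for you, so you normally don't have to write it by hand.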
The reason scoring is done through a procedure is that traditional SAS score code for a forest would take a long time to read and to write, and it would be a really big file (remember that your HPForest combines hundreds of a special type of tree).

Let me elaborate on the tradeoff between predictive power and explainability by comparing a single decision tree with an HPForest. A single decision tree is really easy to explain as a model: from the tree diagram or from the score code you can derive the set of rules that classify an observation as a predicted event (e.g. if X; or if X and Y; or if X, Y, and Z). But you cannot do the same for an HPForest. Even if you came up with the huge list of rules, you would still need to average them. Interpreting a forest is really hard unless you use a workaround like the Reporter node, which fits a single decision tree to explain the predicted outcome using the inputs of your HPForest model.

I hope this helps!
M
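P.S. To make that contrast concrete, here is a toy sketch of what single-tree score code essentially boils down to (the variables, cutoffs, and probabilities are entirely made up): a short, readable chain of if/then rules. A forest would need hundreds of such blocks plus an averaging step, which is why it ships as a binary model file scored by a proc instead.

```sas
/* Toy single-tree scoring rules -- names and values are hypothetical */
data work.scored_tree;
   set work.new_claims;
   if ultimate_litigation = 1 then p_event = 0.82;    /* rule: if X */
   else if claim_amount > 50000 then p_event = 0.64;  /* rule: if not X and Y */
   else p_event = 0.11;                               /* default leaf */
run;
```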