About M_Maldonado

RyDan · ‎11-20-2018

Found some good information on this here: https://blogs.sas.com/content/tag/ai-interpretability/?utm_source=LinkedIn&utm_medium=social-voicestorm&utm_content=6ee61e79-c47b-4192-a8c0-41465d86c3e1

Cheatan_478 · ‎03-07-2018

@chemicalab Can you please help me with the code to get the classification table for the validation data? Below is my code.

proc logistic data=training outmodel=newsModel;
  class sex;
  model subs=sex age / link=glogit ctable outroc=d;
  score data=training out=Score2;
run;

proc logistic noprint inmodel=newsModel;
  score data=validation out=ScoredTest;
run;

proc print data=ScoredTest label noobs;
run;

M_Maldonado · ‎02-28-2018

Hi Nicolas, Maybe this thread can help you while someone takes a second look into what you did? https://communities.sas.com/t5/SAS-Data-Mining-and-Machine/Oversampling-in-Enterprise-Miner-with-a-rare-event-fixed/td-p/161991 When I oversample, I usually test the model on a hold-out test data set that I saved somewhere else and didn't use for modeling. That gives me some confidence that I didn't fool myself 🙂 Would that be an option for you? Best, -Miguel
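As a rough illustration of the hold-out idea above, here is a minimal Python sketch (not SAS Enterprise Miner code). It assumes the data fits in a list of records; the function name `split_then_oversample`, the `target` field, and the 30% fractions are all hypothetical choices for the example. The point it shows is the ordering: the hold-out set is carved off first and keeps the natural event rate, and only the remaining rows are rebalanced.

```python
import random

def split_then_oversample(rows, is_event, holdout_frac=0.3, event_share=0.3, seed=42):
    """Set aside a hold-out set first, then rebalance only the remaining rows."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_frac)
    holdout = shuffled[:n_holdout]      # untouched, keeps the natural event rate
    modeling = shuffled[n_holdout:]
    events = [r for r in modeling if is_event(r)]
    nonevents = [r for r in modeling if not is_event(r)]
    # Keep every event and draw just enough nonevents so that events
    # make up roughly event_share of the training data.
    n_nonevents = min(len(nonevents),
                      int(len(events) * (1 - event_share) / event_share))
    training = events + rng.sample(nonevents, n_nonevents)
    rng.shuffle(training)
    return training, holdout

# Hypothetical data: 1000 rows with a 2% event rate.
rows = [{"id": i, "target": 1 if i % 50 == 0 else 0} for i in range(1000)]
training, holdout = split_then_oversample(rows, lambda r: r["target"] == 1)
```

Because the hold-out rows never enter the oversampled training set, performance measured on them reflects the event rate the model would see in practice.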

mnsen · ‎02-27-2018

Hi RandyMullis, I have the same problem, and I am using SAS OnDemand for Academics. Thanks in advance.

DougWielenga · ‎01-31-2018

badikidiki wrote: "I wonder what the Miner does without a decision node. My first try was to define prior probabilities in my sample node. After that I ran a data partition, different decision trees and a model comparison node. Do prior probabilities defined in the sample node - without having a decision node - affect the following nodes or not?"

@badikidiki, I'm not sure which option you were using when you tried to adjust the probabilities using the Sample node. You can only define priors and weights using the Decisions property in the Input Data Source node, or in the Decisions node itself. In general, I recommend doing so in the Input Data Source node. In either case, you can choose whether to adjust the probabilities based on the priors you specified, and whether to use the decision weights you specified. By default, the decision weights correspond to assigning each observation to the most likely outcome (all outcomes are equally weighted until you change them).

If you are using a Sample node, I would avoid setting its Adjust Frequency property, which could make the actual sample look much larger than it is. If you specified decision priors/weights in the Input Data Source node (or in the Decisions node), every subsequent node might be impacted. If you choose the Adjust Frequency option in the Sample node, you will likely see a different impact, but all adjustments are considered together. If you plan on sampling, I recommend specifying priors and weights that correspond to your anticipated sample.

Hope this helps! Doug
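To make "adjust the probabilities based on the priors" concrete, here is a minimal Python sketch of the standard prior-correction formula: each posterior is multiplied by the ratio of its prior to its sample proportion, and the results are renormalized. It is an assumption that this matches the adjustment SAS Enterprise Miner applies in every configuration, so check the product documentation. The function name and the numbers (a 2% true rate oversampled to 30%) are hypothetical.

```python
def adjust_for_priors(posteriors, sample_props, priors):
    """Rescale model posteriors from sample proportions to the stated priors."""
    weighted = {
        level: posteriors[level] * priors[level] / sample_props[level]
        for level in posteriors
    }
    total = sum(weighted.values())
    return {level: w / total for level, w in weighted.items()}

# Hypothetical example: events oversampled to 30% from a true rate of 2%.
adjusted = adjust_for_priors(
    posteriors={"1": 0.60, "0": 0.40},    # model output on the oversampled data
    sample_props={"1": 0.30, "0": 0.70},  # proportions in the training sample
    priors={"1": 0.02, "0": 0.98},        # proportions in the population
)
```

In this example a posterior of 0.60 on the oversampled data shrinks to roughly 0.067 once the 2% prior is taken into account, which is why the same model can lead to different decisions depending on whether the adjustment is switched on.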

WendyCzika · ‎12-19-2017

When you have the project selected in the project panel, click on the "..." for Project Macro Variables, see screenshot.

rbhadra · ‎12-14-2017

Hi, were you able to solve this problem? I am facing the exact same issue while working with GBM.

DougWielenga · ‎12-05-2017

Depending on the scenario, you might want to identify a cutoff probability of interest, knowing that this will (likely) cut off differently sized proportions of each data set it scores. If your goal is to predict the top 20% of the scored observations as responders regardless of their actual predicted probabilities, the easiest way is to use the RANK procedure with the GROUPS= option to create 5 bins based on the predicted probability. The first or last group (depending on sort order) corresponds to the top 20%. If you scored a data set with ROLE=SCORE using a Score node in SAS Enterprise Miner, you could connect a subsequent SAS Code node and use the following code, assuming a categorical target:

/*** BEGIN SAS CODE ***/

libname mylib 'C:\data';  * define path to where you will write out your data;

proc rank data=&EM_IMPORT_SCORE out=myranks groups=5 descending;  * identify 5 groups based on predicted probability;
  var EM_EVENTPROBABILITY;
  ranks MyRankVar;
run;

proc freq data=myranks;  * crosstabs of rank variable by actual target level;
  tables MyRankVar * %EM_TARGET / nocol nopercent;
run;

data mylib.MyScores;  * flag those in the top 20%;
  set myranks;
  Top20=.;  * stays missing when the rank is missing;
  if MyRankVar = 0 then Top20=1;
  else if MyRankVar gt 0 then Top20=0;
run;

proc freq data=mylib.MyScores;  * verify you have flagged the right group;
  tables MyRankVar*Top20 / norow nocol nopercent;
run;

proc means data=mylib.MyScores;  * calculate statistics on predicted probability grouped by Top20;
  var EM_EVENTPROBABILITY;
  class Top20;
run;

/*** END SAS CODE ***/

It is possible you will not need the DESCENDING option in the RANK procedure. Also, the EM_EVENTPROBABILITY variable is added by the Score node, so if you do not score using the Score node you will need to modify the code to identify the variable containing the predicted probability for the target event.

Hope this helps! Doug
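For readers who want to see what the ranking step does to the scores, here is a small Python sketch of the same idea. It assumes the grouping rule floor(rank × groups / (n + 1)) applied to descending ranks, and it ignores ties and missing values, so it is an approximation of the procedure rather than a replacement for it. The function name and the ten scores are made up for the example.

```python
import math

def rank_groups(probabilities, groups=5):
    """Assign each score to a group; group 0 holds the highest scores."""
    n = len(probabilities)
    # Indices ordered by descending probability; position + 1 is the descending rank.
    order = sorted(range(n), key=lambda i: probabilities[i], reverse=True)
    result = [0] * n
    for position, i in enumerate(order):
        rank = position + 1
        result[i] = math.floor(rank * groups / (n + 1))
    return result

# Hypothetical predicted probabilities for ten scored observations.
scores = [0.91, 0.15, 0.42, 0.77, 0.08, 0.66, 0.30, 0.55, 0.23, 0.84]
group_of = rank_groups(scores)
top20 = [1 if g == 0 else 0 for g in group_of]
```

With ten observations and five groups, the two highest scores (0.91 and 0.84) land in group 0 and are flagged as the top 20%, whatever their actual probability values are.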

DougWielenga · ‎11-27-2017

The misclassification rate can only be calculated once a prediction is made for a categorical target variable. When building a model against a categorical target variable, SAS Enterprise Miner will calculate the probability of each level of the target. The predicted level then depends on the method you have chosen for selecting it. By default, SAS Enterprise Miner predicts the most likely outcome for each observation and stores it in a variable of the form I_<target variable name>; however, SAS Enterprise Miner can also incorporate decision weights, which are used along with the predicted probability of each level to calculate an expected value of each decision. In this latter case, it chooses the most profitable (or least costly) decision and stores the resulting level in a variable of the form D_<target variable name>.

Suppose you have a binary target variable Purchase which equals "1" or "0" for each observation. SAS Enterprise Miner will compute

P_Purchase1 = probability that Purchase = "1"
P_Purchase0 = probability that Purchase = "0"
I_Purchase = most likely outcome based on the greater of P_Purchase1 and P_Purchase0

but it might also compute

D_Purchase = most "valuable" outcome after computing the product of the probability of each outcome and the associated decision weight

when you have specified a decision profile and clicked the radio button for Yes under "Do you want to use the decisions" in the Decision Processing dialog available from the Input Data Source node or a Decisions node. As a result, the answer to your question depends on whether you had specified and used decision weights or not.

Hope this helps! Doug
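The difference between the two predictions can be sketched in a few lines of Python. This is an illustration of the idea only: the function names, the probabilities, and the profit matrix are hypothetical, and the layout of the weights (decision first, then actual level) is an assumption made for the example.

```python
def most_likely(probs):
    """Analogue of I_<target>: the level with the highest probability."""
    return max(probs, key=probs.get)

def best_decision(probs, weights):
    """Analogue of D_<target>: the decision with the highest expected value.

    weights[decision][level] is the profit of taking `decision` when the
    actual target level turns out to be `level`.
    """
    expected = {
        decision: sum(probs[level] * w for level, w in by_level.items())
        for decision, by_level in weights.items()
    }
    return max(expected, key=expected.get), expected

# Hypothetical observation and profit matrix.
probs = {"1": 0.20, "0": 0.80}
weights = {
    "1": {"1": 10.0, "0": -1.0},  # act as if Purchase = "1"
    "0": {"1": 0.0, "0": 0.0},    # act as if Purchase = "0"
}
i_purchase = most_likely(probs)
d_purchase, expected = best_decision(probs, weights)
```

Here the most likely outcome is "0", but the expected profit of deciding "1" is 0.20 × 10 + 0.80 × (−1) = 1.2, which beats 0, so the decision-based prediction is "1". A misclassification rate computed from one variable will therefore differ from one computed from the other.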

Mike90 · ‎11-17-2017

Outside of the "inest=" issue, there is a lot of good information in your post. Thanks for posting.

AnnaBrown · ‎09-25-2017

Hi camontanezp, Welcome to the SAS Data Mining and Machine Learning Community, and thanks for your question! Since it's been quite a while since Miguel and Ivan's discussion, I recommend opening a New Message with your question and referencing this thread in that post. Best, Anna

MariaD · ‎09-05-2017

Hi @butch_cruz, I know this error happened a long time ago, but I'm having the same situation right now. Was SAS Technical Support able to help you with this matter? Could you please share the solution? Regards,

DougWielenga · ‎07-26-2017

There are many ways to identify important variables, including multiple options in the Variable Selection node depending on the measurement level of your target variable. If the variables that have been identified are not performing well, there could be many possible reasons contributing to the problem, such as...

... limited information in the predictor variables
... poorly conditioned input variables (perhaps a transformation of the variables would perform better)
... mismatch between the selection method and the modeling method (e.g. it does not necessarily make sense to use a regression-based linear variable selection technique when passing variables to a non-linear modeling algorithm like a Tree or Neural Network)
... lack of sufficient target signal (e.g. if you are modeling a rare event, it is possible that variables are being missed due to the criteria you are using for selecting them, in which case oversampling and/or considering decision weights/priors might be of help)
... lack of model flexibility (e.g. using a regression without considering the possibility of higher order terms/interactions and/or considering more flexible modeling strategies)

In general, I strongly advocate using several different variable selection strategies, including multiple Variable Selection nodes with different settings and Decision Tree nodes, to create a superset of possibly useful input variables. Depending on the model, further selection might be possible. Note that Decision Trees automatically select variables, Regression approaches optionally can use selection methods, and Random Forest models build Trees from subsets of variables as well as subsets of observations. Making sure you have not overly restricted the input variables, have considered possibly helpful binning and/or numeric transformations, and are using sufficiently flexible modeling methods should help you to obtain the best possible predictions based on your data.

DougWielenga · ‎07-14-2017

There can be several reasons why a node reruns, and note that it sometimes only appears to rerun. Anytime you run a flow, SAS Enterprise Miner will check the preceding nodes to see if they need to be rerun. Even if they do not, the node might show a spinning green circle while it is being checked. You can only know whether a node has actually rerun by checking the timestamp next to Last Run Time in the node property sheet.

You have the ability to Create Grouping Data and can later Import Grouping Data by specifying the corresponding properties in the property sheet. Once the node has been run, you can also freeze the groupings.

If the problem is intermittent, make sure you aren't running into disk space issues, running out of Java memory, or getting disconnected from your data source. SAS Enterprise Miner relies on views that require access to the data, so losing the connection to the database keeps it from functioning as expected.

DougWielenga · ‎07-10-2017

ajosh, Modeling rare events (which is actually quite common) is often challenging for several reasons:

* The null model is highly accurate (a 2% response rate means any model assigning all observations to the nonevent is 98% accurate)
* Failing to put any additional weight on correctly predicting the rare event can lead to a null model (for the reason above)
* Increasing the weight on correctly predicting the rare event results in predicting the event for far more observations than actually have it

It might be helpful to separate the tasks of modeling an outcome and taking action on the outcome. When modeling a rare event, you must often either oversample the rare event, add weight to correctly predicting the rare event, choose a model selection criterion that is not based on the classification, or use some combination of these. For the reasons stated above, misclassification is typically not a good selection criterion for modeling. SAS Enterprise Miner always provides a classification based on which outcome is most likely. When a target profile is created and decision weights are employed, SAS Enterprise Miner will also create variables containing the most profitable outcome based on the target profile you created. The meaningfulness of that prediction is directly related to the applicability of the target profile weights.

In general, modeling itself is more clear cut in that each analyst can pick and choose their criteria for building the 'best' model and then build the model. The resulting probabilities can then be used to order the resulting observations. Unfortunately for decision tree models, all of the observations in a single node are given the same score, which is why some people run additional models within each terminal node to further separate the observations. The choice of what to do with the ordered observations typically involves business decisioning.

The choice to investigate fraud can be costly, particularly if the person investigated is an honest, loyal customer who just had an unusual situation. The amount of money at stake, the customer's longevity/profitability with the business, and the future expected value of the customer are just a few things that might be considered. This business decisioning usually creates far more complex criteria than can be reduced to a misclassification matrix, which does not take the amount of money at risk into account. Simply put, whether you take the default decision based on the most likely outcome (typically inappropriate for a rare event), use the decision-weighted predicted outcome (assuming the decision profile accurately represents the business decisioning), or use some other strategy for selecting cases to investigate (based on available resources, amount at risk, likelihood of fraud, etc.), the TP and FP come from the strategy you employ.

I clearly advocate business decisioning in determining how to proceed because the simple classification rate itself is not meaningful enough for rare events. Even looking at the expected value of money at risk (e.g. the product of the probability of fraud and the amount at risk) will yield a different ordering of observations. So there isn't a great answer to the question of which cutoff to use without fully understanding the business objectives and priorities. I tend to use some oversampling (but not to 50/50 because it under-represents the non-event) and decision weights with priors to allow variable selection and to get reasonable probabilities, but then combine those probabilities with other information to determine the final prioritization/action for observations based on some more complex rules.
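Two of the points above are easy to check with a little arithmetic, sketched here in Python. The three cases, their probabilities, and their amounts are invented for illustration, and `null_model_accuracy` is a hypothetical helper name.

```python
def null_model_accuracy(event_rate):
    """Accuracy of a model that assigns every observation to the nonevent."""
    return 1.0 - event_rate

# Hypothetical scored cases: probability of fraud and amount at risk.
cases = [
    {"id": "A", "p_fraud": 0.30, "amount": 100.0},
    {"id": "B", "p_fraud": 0.10, "amount": 5000.0},
    {"id": "C", "p_fraud": 0.20, "amount": 400.0},
]

# Ordering by predicted probability alone.
by_probability = [
    c["id"] for c in sorted(cases, key=lambda c: c["p_fraud"], reverse=True)
]

# Ordering by expected amount at risk (probability times amount).
by_expected_loss = [
    c["id"]
    for c in sorted(cases, key=lambda c: c["p_fraud"] * c["amount"], reverse=True)
]
```

With a 2% event rate the null model is 98% accurate, so accuracy alone says little. The two orderings also disagree: by probability the cases rank A, C, B, while by expected amount at risk (30, 500, and 80 respectively) they rank B, C, A, so the case investigated first depends on the business criterion and not only on the model.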

Date Last Visited	‎02-28-2018 11:39 AM

Re: Unbalanced data - miner

Re: SAS EM only: How to use parameter estimates in the next node?

Re: StatExplore Node

Re: How many leaves and nodes should a tree

Re: Export scoring code for Cross Validation in SAS Enterprise Miner

Re: Export scoring code for Cross Validation in SAS Enterprise Miner

Re: run time error ensemble model

Re: run time error ensemble model

Re: help with hash table

Re: help with hash table

Re: StatExplore Node

Re: How to access Variable importance in neural network in EM?

Re: Grouping variables to create new variables SAS Enterprise Miner

Re: Error when running market basket node in SAS EM

Re: Seed Initialization Method for Hierarchical Clustering

Re: Using cross-validation in Enterprise Miner;

Re: How come no Segment Profile after I set "Cluster Variable Role" = ...

Re: Confusion matrix in Enterprise Miner

Re: How can we export dataset from enterprise Miner as a csv file or t...

Re: How many leaves and nodes should a tree

Credit Scoring by Example in SAS® Enterprise Miner™

Tip: How to model a rare target using an oversample approach in SAS® ...

Tip: How to interpret your SAS® Rapid Predictive Modeler results

Tip: Use the Cutoff Node in SAS® Enterprise Miner™ to Consume the Post...

Tip: How to build a scorecard using Credit Scoring for SAS® Enterprise...

Re: Partial Dependence Plot for boosting decision tree

Re: Classification table (CTABLE) for validation set in proc logistic

Re: Unbalanced data - miner

Re: text mining

Re: Where to define prior probabilities?

Re: Error message for Transform Variables

Re: EM Gradient Boosting unable to produce a model

Re: EM: Setting cutoff for predictive model

Re: misclassification rate sas miner

Re: SAS EM only: How to use parameter estimates in the next node?

Re: Grid search to optimize parameters?

Re: SAS Eminer Error

Re: what is the optimal way to use variable selection node

Re: Enterprise Miner Interactive Grouping Node resets itself...

Re: Using Cut Off Node and Interpreting Predicted Probabilities.