About jeremyyuan

jeremyyuan · 08-14-2017

Thanks so much, Doug and Padraic, for your very helpful explanations and opinions!!

jeremyyuan · 08-09-2017

Thanks, Doug! That helps me a lot in understanding the logic behind the Decision Trees module in EM. Based on what you said, I realize I had a misunderstanding. I thought that after I set the adjusted prior probabilities to 0.02 vs 0.98, the Decision Trees module would process the oversampled data and then, at the final stage, automatically assign probabilities based on the actual 0.02 vs 0.98 split, so that bads would NOT "be predicted to occur more often than they actually do". But in fact, set up this way, bads are still predicted to occur more often than they actually do. I have heard (but not seen) that when you use oversampled data, the adjustment should be made at the scoring stage with a particular formula, not at the Decision Trees modeling stage. Do you have any idea what that formula is? My goal is a low misclassification rate, so I will try the actual data file and see which assessment measure fits. Thanks a lot! Jeremy
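For reference, one commonly cited formula for a scoring-stage correction of this kind is the Bayes prior correction: multiply each class's posterior from the model trained on oversampled data by (population prior / sample proportion), then renormalize so the two probabilities sum to 1. The Python sketch below applies it to a single event probability. It assumes this is the formula being referred to; the function name `adjust_posterior` and its parameters are hypothetical, and it is a generic illustration, not a description of what EM does internally.

```python
def adjust_posterior(p_sample: float, prior_event: float, sample_event: float) -> float:
    """Map an event probability estimated on oversampled data back to the
    population prior (Bayes prior correction). Hypothetical helper."""
    # Class weights: population prior divided by the class proportion in the sample.
    w_event = prior_event / sample_event                      # e.g. 0.02 / 0.30
    w_nonevent = (1.0 - prior_event) / (1.0 - sample_event)   # e.g. 0.98 / 0.70
    event_part = p_sample * w_event
    nonevent_part = (1.0 - p_sample) * w_nonevent
    # Renormalize so the adjusted event and non-event probabilities sum to 1.
    return event_part / (event_part + nonevent_part)


if __name__ == "__main__":
    # Hypothetical leaf probabilities from a tree trained on the 30/70 sample.
    for p in (0.30, 0.60, 0.90):
        adjusted = adjust_posterior(p, prior_event=0.02, sample_event=0.30)
        print(f"sample P(bad)={p:.2f} -> adjusted P(bad)={adjusted:.4f}")
```

A leaf whose bad rate equals the 30% sample rate maps back to exactly the 2% population rate. Because the correction is monotone in `p_sample`, it leaves the ranking of scored cases unchanged; it changes the probability values and therefore any classification made with a fixed cut-off such as 0.5.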

jeremyyuan · 08-08-2017

I have data with a rare event (2% bads and 98% goods), so I oversampled it to 30% bads and 70% goods and had EM read in this oversampled data. After the data was read in, I set the Adjusted Prior to 0.02 vs 0.98 on the Prior Probabilities tab. Then I set the partition to 60% training and 40% validation. After that, I entered my 10 variables into the Decision Trees model, and in the property panel under the Subtree section there is an Assessment Measure property, for which I chose Decision (the default). The results came out with several nodes and looked as I expected. Then I changed the Assessment Measure to Misclassification and ran it again. This time no nodes came out (and there was no error message), just the root node!

Then I changed my strategy. I let EM read the data without setting the Adjusted Prior, so EM read the data as 30% bads and 70% goods, and the partition was the same. I ran this data through Decision Trees. When I set the Assessment Measure to Decision (the default), there were no child nodes (just the root node, as before). But when I changed it to Misclassification, the results came out with several nodes, similar to the earlier run with Decision as the Assessment Measure. So the two Assessment Measure choices led to opposite results.

My data has no missing values, my target is binary (1/0), and my data has no profit/loss information. My puzzles are:

1) Do I need to set the Adjusted Prior to 0.02 vs 0.98?
2) If yes, is Misclassification the correct choice for the Assessment Measure?
3) If yes to the second question, why did no nodes come out with Misclassification, while the expected results came out with Decision?

Any tips on my puzzle would be very much appreciated! Thanks a lot!!

Jeremy Yuan
Using EM 13.2 on the web-based client
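One possible reading of puzzle 3, offered as an assumption rather than a confirmed account of EM's internals: with Misclassification and no profit/loss matrix, each leaf is labelled with the class that has the larger prior-adjusted posterior (each class's sample posterior weighted by population prior / sample proportion, then renormalized). With priors of 0.02 vs 0.98 on a 30/70 sample, a leaf would then need an oversampled bad rate above a break-even value before it is labelled "bad". If no leaf reaches that value, every leaf predicts "good", splitting cannot lower the misclassification rate, and pruning would keep only the root. The Python sketch below computes that break-even rate; `break_even_rate`, `leaf_labels`, and the leaf rates are hypothetical.

```python
def break_even_rate(prior_event: float, sample_event: float) -> float:
    """Oversampled event rate at which the prior-adjusted P(event) equals 0.5.

    Solves p * w_event = (1 - p) * w_nonevent for p, where each weight is
    population prior / sample proportion (Bayes prior correction).
    """
    w_event = prior_event / sample_event
    w_nonevent = (1.0 - prior_event) / (1.0 - sample_event)
    return w_nonevent / (w_event + w_nonevent)


def leaf_labels(leaf_rates, prior_event, sample_event):
    """Label each leaf 'bad' only if its prior-adjusted P(bad) exceeds 0.5."""
    cutoff = break_even_rate(prior_event, sample_event)
    return ["bad" if r > cutoff else "good" for r in leaf_rates]


if __name__ == "__main__":
    hypothetical_leaves = [0.05, 0.35, 0.70, 0.90]  # bad rates in the 30/70 sample
    # With adjusted priors 0.02 vs 0.98 the cut-off is about 0.9545 (= 21/22).
    print(round(break_even_rate(0.02, 0.30), 4), leaf_labels(hypothetical_leaves, 0.02, 0.30))
    # Without adjusted priors (priors equal the sample rates) the cut-off is 0.5.
    print(round(break_even_rate(0.30, 0.30), 4), leaf_labels(hypothetical_leaves, 0.30, 0.30))
```

Under this assumption, even a leaf that is 90% bads in the oversampled data is still labelled "good" once the 0.02 vs 0.98 priors are applied, whereas without the priors the cut-off falls to 0.5 and such leaves are labelled "bad". The sketch does not explain why the Decision measure behaves the other way round.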

Date Last Visited: 11-01-2018 10:26 AM

Re: Enterprise Miner: Decision Trees - how to choose Assessment Measur...

Enterprise Miner: Decision Trees - how to choose Assessment Measure in...