About ajosh

ajosh · ‎05-08-2015

Hi All, I am running an exploratory factor analysis on a large set of dichotomized variables (about 175) and have a few questions. Additional comment: I have to repeat the factor analysis, extracting a different number of factors each time, to find the best model. 1) For some sets of independent variables, the correlation matrix was not positive definite, so the KMO and sphericity tests were not performed. Is there a workaround to avoid or overcome this? 2) Given, say, 5 such iterations, how can I identify the best model? Is there a statistic that helps here? 3) Any pointers on the best method for extraction and rotation, or does it have to be determined by trial and error? Thanks in advance for your comments/suggestions. Regards, Aditya.
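
For question 1), one commonly used workaround is to shrink the correlation matrix slightly toward the identity matrix until it becomes positive definite. The sketch below illustrates that idea only; it is not the SAS procedure, and the function names, the step size and the toy matrix are all hypothetical. Positive definiteness is checked by attempting a Cholesky factorization.

```python
import math

def is_positive_definite(matrix, tol=1e-12):
    """Return True if a Cholesky factorization succeeds (symmetric input assumed)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - s
                if pivot <= tol:  # zero or negative pivot: not positive definite
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return True

def shrink_toward_identity(corr, weight):
    """Blend the matrix with the identity: (1 - weight) * corr + weight * I."""
    n = len(corr)
    return [[(1 - weight) * corr[i][j] + (weight if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def smallest_shrinkage(corr, step=0.01):
    """Increase the shrinkage weight in small steps until the matrix is positive definite."""
    weight = 0.0
    while weight <= 1.0:
        candidate = shrink_toward_identity(corr, weight)
        if is_positive_definite(candidate):
            return weight, candidate
        weight = round(weight + step, 10)
    raise ValueError("no shrinkage weight in [0, 1] produced a positive definite matrix")

# Toy example (hypothetical data): two perfectly correlated variables make the matrix singular.
corr = [[1.0, 1.0, 0.3],
        [1.0, 1.0, 0.3],
        [0.3, 0.3, 1.0]]
weight, repaired = smallest_shrinkage(corr)
print(weight, is_positive_definite(repaired))
```

Shrinkage changes every off-diagonal correlation a little, so it is worth first checking whether the problem comes from duplicated or constant variables that can simply be dropped.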

ajosh · ‎11-02-2014

Hi All, I am in the process of forming a methodology for identifying infrequent (i.e. suspicious) association rules in SAS Enterprise Miner. The selected rules should exclude frequent or strong rules as well as rare rules. Rare rules can arise from one-off purchases, for example a dealer in textile manufacturing purchasing some relevant machinery. The tweaks I have made to the default settings are: 1) lowering the minimum support and 2) exporting close to 0.1 million rules as part of the output. Methodology used: 1) filter for low-confidence rules at a threshold left to the user's discretion, 2) compute a metric equal to the confidence of the rule divided by the product of the supports of the LHS and RHS, 3) sort the low-confidence rules in ascending order of this metric and select, say, the top 30 to 40% of them. My questions are: 1) is this methodology useful for finding infrequent rules, and 2) would an additional filter for low support, applied before selecting on low confidence, be of any use, or is it redundant? Note that we can give the user univariate and bivariate statistics on support and confidence, separately and together. A cascading feature could also be implemented to let users see which unique values of the key metrics are possible, based on the filters mentioned above. Thanks in advance for your suggestions; I look forward to hearing from you. Regards, Aditya.
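
The three methodology steps above can be sketched roughly as follows. This is a plain-Python illustration, not SAS Enterprise Miner code; the field names (`confidence`, `support_lhs`, `support_rhs`), the function name and the sample rules are hypothetical, and the thresholds are only example values.

```python
import math

def select_candidate_rules(rules, max_confidence, keep_fraction):
    """Filter low-confidence rules, score them, and keep the lowest-scoring fraction."""
    # Step 1: keep only rules at or below the user-chosen confidence threshold.
    low_conf = [r for r in rules if r["confidence"] <= max_confidence]
    # Step 2: metric = confidence / (support of LHS * support of RHS).
    scored = [dict(r, metric=r["confidence"] / (r["support_lhs"] * r["support_rhs"]))
              for r in low_conf]
    # Step 3: sort ascending on the metric and keep the top fraction.
    scored.sort(key=lambda r: r["metric"])
    keep = math.ceil(len(scored) * keep_fraction)
    return scored[:keep]

# Hypothetical exported rules.
rules = [
    {"name": "A", "confidence": 0.90, "support_lhs": 0.5, "support_rhs": 0.5},
    {"name": "B", "confidence": 0.20, "support_lhs": 0.4, "support_rhs": 0.5},
    {"name": "C", "confidence": 0.10, "support_lhs": 0.5, "support_rhs": 0.4},
    {"name": "D", "confidence": 0.25, "support_lhs": 0.1, "support_rhs": 0.5},
    {"name": "E", "confidence": 0.05, "support_lhs": 0.5, "support_rhs": 0.5},
]
selected = select_candidate_rules(rules, max_confidence=0.3, keep_fraction=0.5)
print([r["name"] for r in selected])
```

In this toy data, rule D has low supports on both sides, which inflates its metric and pushes it to the bottom of the ascending sort.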

ajosh · ‎06-19-2014

Hi Miguel, Please find attached the precision recall curve and cut off graphical outputs. Do let me know your views on the same. Thanks, Aditya.

ajosh · ‎06-19-2014

Hi Miguel, Thanks for clarifying the difference between standard boosting and gradient boosting. This is not apparent from the SAS EMiner Help menu, hence I wanted to check my understanding. Referring to the example in my earlier post, I ran 2 models (LR and DT) separately, each within its own pair of start and end group nodes, followed by a model comparison, which selects boosted trees as the best model. The number of iterations was set to 20. The confusion about using the cut-off came from the following observation: I ran the cut-off node after the model comparison node in the same diagram and examined the results. The original (prior) probabilities for Y:N are 2%:98% respectively. At the cut-off threshold of 0.02, I found that all cases were classified as either true positives or false positives, with zero counts for true negatives and false negatives. Also, the TP rate (which is shown as a graph along with the overall classification rate etc.) was 100% at each cut-off up to about 0.3 or 0.4. Hence I was asking whether the cut-off node should still be used when I am working with ensembles. Also, I haven't come across any literature where ensembles followed by a comparison node and a cut-off node are illustrated in detail. Do let me know your views on the same. And thanks a lot, once again, for your support with my queries. Regards, Aditya.

ajosh · ‎06-15-2014

Hi Miguel, I appreciate your response to the queries in my earlier post. I am using 2 separate start and end group combinations, one for the decision tree and the other for stepwise logistic regression. The model comparison selects boosted trees as the best model. The inherent mechanism of generating ensembles could perhaps be used to skip the cut-off node. One last question: what is the difference between boosting through start and end groups and gradient boosting, in the simplest possible terms? Thank you once again for your valuable comments. Regards, Aditya.

ajosh · ‎06-11-2014

Hi All, For a classification problem with a skewed target class (Y:N = 2%:98%), I am using some standard data manipulations (transformations, replacements etc.) followed by a regression node and a decision tree, each enclosed between start and end groups. I have selected the boosting procedure with 20 iterations for both modeling nodes. The model comparison now selects boosted trees as the best model. I am unsure whether the cut-off node still needs to be used after the model comparison, given that boosted trees work on minimizing the error rate over successive iterations through resampling. Is this resampling also done with replacement? Thanks in advance for your comments/views. Regards, Aditya.

ajosh · ‎05-29-2014

Hi Miguel, Thanks for your reply. I would like to state that I am using a cost matrix for my imbalanced classification problem, and I see that decision_target is populated with Y or N. The same is true of the from and into columns as well as the actual target flag. The query in the last section of your reply seems to construct the misclassification matrix from the from and into columns rather than from the decision and actual target flag. Do let me know if this is true.

ajosh · ‎05-29-2014

Hi All, Could someone please share the interpretation of the 3 columns that are generated in the exported dataset of any classification modeling node: the from column, the into column and the decision column? E.g. if my target variable is binary with the label tgt, what do from_tgt, into_tgt and decision_tgt imply? Also, which of the 2 options below is the correct way of constructing the confusion matrix: a. a cross tab between the from_tgt and into_tgt columns, or b. a cross tab between the actual tgt column and the decision_tgt column? These two approaches give me different matrices. The first has low precision and high recall, and the other has exactly the opposite. Also, it seems that the minimum predicted probability for tgt = Y among rows where from and into are both Y is above 0.5. Any help is highly appreciated. Thanks.
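
As a reference for what a cross tab of this kind computes, here is a small plain-Python sketch. It does not settle which exported column should serve as the prediction; `actual`, `pred_a` and `pred_b` are hypothetical stand-ins for an actual-label column and two different predicted-label columns, and the data is made up.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive="Y"):
    """Cross-tabulate actual vs. predicted labels into TP/FN/FP/TN counts."""
    pairs = Counter(zip(actual, predicted))
    tp = sum(c for (a, p), c in pairs.items() if a == positive and p == positive)
    fn = sum(c for (a, p), c in pairs.items() if a == positive and p != positive)
    fp = sum(c for (a, p), c in pairs.items() if a != positive and p == positive)
    tn = sum(c for (a, p), c in pairs.items() if a != positive and p != positive)
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}

def precision_recall(counts):
    """Return (precision, recall); None where the denominator is zero."""
    predicted_pos = counts["TP"] + counts["FP"]
    actual_pos = counts["TP"] + counts["FN"]
    precision = counts["TP"] / predicted_pos if predicted_pos else None
    recall = counts["TP"] / actual_pos if actual_pos else None
    return precision, recall

# Hypothetical labels: two prediction columns scored against the same actual column.
actual = list("YYYNNNNN")
pred_a = list("YNNNNNNN")   # conservative predictions
pred_b = list("YYYYYNNN")   # liberal predictions

counts_a = confusion_counts(actual, pred_a)
counts_b = confusion_counts(actual, pred_b)
print(counts_a, precision_recall(counts_a))
print(counts_b, precision_recall(counts_b))
```

Two prediction columns built with different decision rules will trade precision against recall in this way, which is consistent with the two cross tabs giving different matrices.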

ajosh · ‎05-27-2014

Hi All, Could someone please give me pointers on how to construct the misclassification matrix, and on how to choose an optimal cut-off probability other than 0.5 in SAS Enterprise Miner, for the case explained in detail in my earlier reply?

ajosh · ‎05-20-2014

Hi Miguel, I would like to reiterate a few aspects of this analysis (which you also pointed out in your reply above): 1) Using a profit/cost matrix is one of the strategies for addressing the class imbalance problem (in my case Y:N = 2%:98%). 2) Even if we use the profit/cost matrix, it doesn't override the default probability threshold used to form the decisions (predictions) Y or N. Hence I have used a cut-off node after the model comparison (trees vs. logistic regression). 3) I used the following weights (with the maximize option) in the decision properties: 5, -100, -10, 1 for TP, FN, FP and TN respectively. The issue I see here is that FN has been given 20 times the weight of TP. Is there a specific logic by which these weights should be assigned? This analysis is about identifying fraudulent persons, who are far fewer in number than the non-fraudulent ones. Hence I gave the highest penalty (negative profit) to people who commit fraud but are tagged as non-fraudulent by the model. Please suggest whether there is an alternative scheme for correctly specifying the profit/loss matrix in SAS E-Miner. As a follow-up, I forgot to mention that one method for choosing the optimal cut-off (which could be below 0.5) is as follows: 1) First calculate the total profit at each cut-off value (from 0.01 to 0.99) as: weight for TP * TP count + weight for FN * FN count + weight for FP * FP count + weight for TN * TN count (note that in my case the weights for FN and FP are negative, and must be treated as such when deriving the total profit). 2) Then calculate the average profit at each cut-off value as: Total Profit / Predicted Positives (where predicted positives are the sum of TP and FP). 3) Pick the cut-off value that gives the highest average profit in the validation dataset and a similar average profit in the training dataset.
Do let me know if you could help me with assigning proper weights to each outcome, and whether the above formula for average profit needs to be modified. I think that in general, for fraud detection, we need high TP, low FP and low FN, isn't it? Thanks, Aditya.
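
The cut-off selection steps described above can be written out as a short calculation. This is only a plain-Python sketch of the formula as stated in the post, not SAS Enterprise Miner output. The weights are the ones quoted above; the counts are the VALIDATE rows at four cut-offs from the cut-off table in the 05-16-2014 post, and it is not stated whether that table was produced with these same weights. The function names are hypothetical, and step 3's comparison against the training data is left out.

```python
# Weights quoted in the post (maximize option): TP, FN, FP, TN.
WEIGHTS = {"TP": 5, "FN": -100, "FP": -10, "TN": 1}

def total_profit(counts, weights=WEIGHTS):
    """Step 1: sum of weight * count over the four outcomes."""
    return sum(weights[k] * counts[k] for k in weights)

def average_profit(counts, weights=WEIGHTS):
    """Step 2: total profit divided by predicted positives (TP + FP)."""
    predicted_positives = counts["TP"] + counts["FP"]
    if predicted_positives == 0:
        return None  # undefined when nothing is predicted positive
    return total_profit(counts, weights) / predicted_positives

def best_cutoff(counts_by_cutoff):
    """Step 3 (validation part only): cut-off with the highest defined average profit."""
    scored = {c: average_profit(v) for c, v in counts_by_cutoff.items()}
    scored = {c: s for c, s in scored.items() if s is not None}
    return max(scored, key=scored.get)

# VALIDATE counts at four cut-offs, copied from the cut-off table.
validation_counts = {
    0.99: {"TP": 0,   "FP": 0,   "TN": 27020, "FN": 552},
    0.50: {"TP": 20,  "FP": 46,  "TN": 26974, "FN": 532},
    0.47: {"TP": 219, "FP": 464, "TN": 26556, "FN": 333},
    0.25: {"TP": 289, "FP": 829, "TN": 26191, "FN": 263},
}

for cutoff, counts in validation_counts.items():
    print(cutoff, total_profit(counts), average_profit(counts))
print("best cut-off by average profit:", best_cutoff(validation_counts))
```

With these weights every cut-off shown has a negative total profit, because the -100 penalty on the remaining false negatives dominates. Dividing by predicted positives also favours cut-offs that flag more cases, so it is worth checking whether total profit, or profit per case, is the better quantity to maximize.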

ajosh · ‎05-16-2014

Hi All, This is in continuation of my earlier post on whether the original priors (2:98 for Y and N, not 50:50) have to be used when I am modeling on a balanced dataset in SAS EM. Thanks for your response. I can see that the original priors have to be used to remove the "bias" introduced by oversampling the minority instances. Later on, I can apply these results to the same dataset to check how many instances originally tagged as "Y" have been correctly classified. Some points that I am not able to understand in this analysis are: 1) There is another strategy for handling class imbalance, called cost-sensitive learning, which uses the profit/loss matrix in the decision processing option for the input dataset. I specify the profit/loss weights for each outcome (TP, TN, FP, FN) in the decision processing. No oversampling is used. Decision trees are supposed to choose the decision that yields the maximum profit, considering the profit weights as well as the proportions of Y and N at each node. But it seems that here again SAS EM has chosen 0.5 as the decision threshold, because of which only 75 out of 1840 minority-class cases have been identified as TP. 2) In the exported dataset of the above decision tree model, I compared two pairs of flags, (From_Target_Flag, Into_Target_Flag) and (Actual Target Flag, Decision_Target_Flag), and found the following: a) The above-mentioned count of TP = 75 comes from only 2 nodes of the decision tree, where From_Target_Flag = Y and Into_Target_Flag = Y. These nodes have predicted probability > 0.5 (one has 0.53 and the other 0.9). b) A count of TP = 1271 (which could be a proxy) comes from around 7 nodes of the decision tree, where Actual Target Flag = Y and Decision_Target_Flag = Y. These nodes have predicted probabilities ranging from 0.14 to 0.9.
So to summarize: which is the correct approach for finding the count of TP in question 2), and, if necessary, how do I implement the cut-off node to select a smaller decision threshold for question 1)? Also attached is the table from the cut-off node, run with the default value of 0.5 after the decision tree. Thanks and Regards, Aditya.

Cut Off, Cumulative Expected Profit, Count of TP, Count of FP, Count of TN, Count of FN, Count of Predicted Positives, Count of Predicted Negatives, Count of FP and FN, Count of TP and TN, Overall Classification Rate, Change Count TP, Change Count FP, TP Rate, TN Rate, FP Rate, Event Precision Rate, Non Event Precision Rate, Overall Precision Rate, DataRole
0.99, -4.92954348, 0, 0, 63042, 1286, 0, 64328, 1286, 63042, 98.000871, 0, 0, 0, 100, 0, NaN, 98.00087054, NaN, TRAIN
0.99, -7.896983998, 0, 0, 27020, 552, 0, 27572, 552, 27020, 97.997969, 0, 0, 0, 100, 0, NaN, 97.99796895, NaN, VALIDATE
0.98, -9.859086961, 0, 0, 63042, 1286, 0, 64328, 1286, 63042, 98.000871, 0, 0, 0, 100, 0, NaN, 98.00087054, NaN, TRAIN
0.98, -15.793968, 0, 0, 27020, 552, 0, 27572, 552, 27020, 97.997969, 0, 0, 0, 100, 0, NaN, 97.99796895, NaN, VALIDATE
0.97, -14.78863044, 0, 0, 63042, 1286, 0, 64328, 1286, 63042, 98.000871, 0, 0, 0, 100, 0, NaN, 98.00087054, NaN, TRAIN
0.97, -23.69095199, 0, 0, 27020, 552, 0, 27572, 552, 27020, 97.997969, 0, 0, 0, 100, 0, NaN, 97.99796895, NaN, VALIDATE
0.96, -19.71817392, 0, 0, 63042, 1286, 0, 64328, 1286, 63042, 98.000871, 0, 0, 0, 100, 0, NaN, 98.00087054, NaN, TRAIN
0.96, -31.58793599, 0, 0, 27020, 552, 0, 27572, 552, 27020, 97.997969, 0, 0, 0, 100, 0, NaN, 97.99796895, NaN, VALIDATE
0.95, -24.6477174, 0, 0, 63042, 1286, 0, 64328, 1286, 63042, 98.000871, 0, 0, 0, 100, 0, NaN, 98.00087054, NaN, TRAIN
0.95, -39.48491999, 0, 0, 27020, 552, 0, 27572, 552, 27020, 97.997969, 0, 0, 0, 100, 0, NaN, 97.99796895, NaN, VALIDATE
0.94, -29.57726088, 0, 0, 63042, 1286, 0, 64328, 1286, 63042, 98.000871, 0, 0, 0, 100, 0, NaN, 98.00087054, NaN, TRAIN
0.94, -47.38190399, 0, 0, 27020, 552, 0, 27572, 552, 27020, 97.997969, 0, 0, 0, 100, 0, NaN, 97.99796895, NaN, VALIDATE
0.93, -34.50680436, 0, 0, 63042, 1286, 0, 64328, 1286, 63042, 98.000871, 0, 0, 0, 100, 0, NaN, 98.00087054, NaN, TRAIN
0.93, -55.27888799, 0, 0, 27020, 552, 0, 27572, 552, 27020, 97.997969, 0, 0, 0, 100, 0, NaN, 97.99796895, NaN, VALIDATE
0.92, -39.43634784, 0, 0, 63042, 1286, 0, 64328, 1286, 63042, 98.000871, 0, 0, 0, 100, 0, NaN, 98.00087054, NaN, TRAIN
0.92, -63.17587199, 0, 0, 27020, 552, 0, 27572, 552, 27020, 97.997969, 0, 0, 0, 100, 0, NaN, 97.99796895, NaN, VALIDATE
0.91, -44.36589132, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 10, 2, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.91, -71.07285598, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 5, 2, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.9, -49.2954348, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.9, -78.96983998, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.89, -54.22497828, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.89, -86.86682398, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.88, -59.15452176, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.88, -94.76380798, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.87, -64.08406524, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.87, -102.660792, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.86, -69.01360873, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.86, -110.557776, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.85, -73.94315221, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.85, -118.45476, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.84, -78.87269569, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.84, -126.351744, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.83, -83.80223917, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.83, -134.248728, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.82, -88.73178265, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.82, -142.145712, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.81, -93.66132613, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.81, -150.042696, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.8, -98.59086961, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.8, -157.93968, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.79, -103.5204131, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.79, -165.836664, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.78, -108.4499566, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.78, -173.733648, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.77, -113.3795, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.77, -181.630632, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.76, -118.3090435, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.76, -189.527616, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.75, -123.238587, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.75, -197.4246, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.74, -128.1681305, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.74, -205.321584, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.73, -133.097674, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.73, -213.218568, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.72, -138.0272175, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.72, -221.115552, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.71, -142.9567609, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.71, -229.0125359, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.7, -147.8863044, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.7, -236.9095199, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.69, -152.8158479, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.69, -244.8065039, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.68, -157.7453914, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.68, -252.7034879, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.67, -162.6749349, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.67, -260.6004719, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.66, -167.6044783, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.66, -268.4974559, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.65, -172.5340218, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.65, -276.3944399, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.64, -177.4635653, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.64, -284.2914239, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.63, -182.3931088, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.63, -292.1884079, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.62, -187.3226523, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.62, -300.0853919, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.61, -192.2521957, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.61, -307.9823759, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.6, -197.1817392, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.6, -315.8793599, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.59, -202.1112827, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.59, -323.7763439, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.58, -207.0408262, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.58, -331.6733279, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.57, -211.9703697, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.57, -339.5703119, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.56, -216.8999131, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.56, -347.4672959, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.55, -221.8294566, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.55, -355.3642799, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.54, -226.7590001, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.54, -363.2612639, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.53, -231.6885436, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.53, -371.1582479, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.52, -236.6180871, 10, 2, 63040, 1276, 12, 64316, 1278, 63050, 98.013307, 0, 0, 0.777605, 99.996828, 0.00317249, 83.33333333, 98.01604577, 90.67468955, TRAIN
0.52, -379.0552319, 5, 2, 27018, 547, 7, 27565, 549, 27023, 98.00885, 0, 0, 0.9057971, 99.992598, 0.00740192, 71.42857143, 98.01559949, 84.72208546, VALIDATE
0.51, -241.5476305, 55, 89, 62953, 1231, 144, 64184, 1320, 63008, 97.948016, 45, 87, 4.2768274, 99.858824, 0.14117572, 38.19444444, 98.08207653, 68.13826049, TRAIN
0.51, -386.9522159, 20, 46, 26974, 532, 66, 27506, 578, 26994, 97.90367, 15, 44, 3.6231884, 99.829756, 0.17024426, 30.3030303, 98.06587654, 64.18445342, VALIDATE
0.5, -246.477174, 55, 89, 62953, 1231, 144, 64184, 1320, 63008, 97.948016, 0, 0, 4.2768274, 99.858824, 0.14117572, 38.19444444, 98.08207653, 68.13826049, TRAIN
0.5, -394.8491999, 20, 46, 26974, 532, 66, 27506, 578, 26994, 97.90367, 0, 0, 3.6231884, 99.829756, 0.17024426, 30.3030303, 98.06587654, 64.18445342, VALIDATE
0.49, -251.4067175, 55, 89, 62953, 1231, 144, 64184, 1320, 63008, 97.948016, 0, 0, 4.2768274, 99.858824, 0.14117572, 38.19444444, 98.08207653, 68.13826049, TRAIN
0.49, -402.7461839, 20, 46, 26974, 532, 66, 27506, 578, 26994, 97.90367, 0, 0, 3.6231884, 99.829756, 0.17024426, 30.3030303, 98.06587654, 64.18445342, VALIDATE
0.48, -256.336261, 55, 89, 62953, 1231, 144, 64184, 1320, 63008, 97.948016, 0, 0, 4.2768274, 99.858824, 0.14117572, 38.19444444, 98.08207653, 68.13826049, TRAIN
0.48, -410.6431679, 20, 46, 26974, 532, 66, 27506, 578, 26994, 97.90367, 0, 0, 3.6231884, 99.829756, 0.17024426, 30.3030303, 98.06587654, 64.18445342, VALIDATE
0.47, -261.2658045, 515, 1110, 61932, 771, 1625, 62703, 1881, 62447, 97.075923, 460, 1021, 40.046656, 98.239269, 1.76073094, 31.69230769, 98.77039376, 65.23135073, TRAIN
0.47, -418.5401519, 219, 464, 26556, 333, 683, 26889, 797, 26775, 97.109386, 199, 418, 39.673913, 98.282754, 1.71724648, 32.06442167, 98.76157537, 65.41299852, VALIDATE
0.46, -266.1953479, 515, 1110, 61932, 771, 1625, 62703, 1881, 62447, 97.075923, 0, 0, 40.046656, 98.239269, 1.76073094, 31.69230769, 98.77039376, 65.23135073, TRAIN
0.46, -426.4371359, 219, 464, 26556, 333, 683, 26889, 797, 26775, 97.109386, 0, 0, 39.673913, 98.282754, 1.71724648, 32.06442167, 98.76157537, 65.41299852, VALIDATE
0.45, -271.1248914, 515, 1110, 61932, 771, 1625, 62703, 1881, 62447, 97.075923, 0, 0, 40.046656, 98.239269, 1.76073094, 31.69230769, 98.77039376, 65.23135073, TRAIN
0.45, -434.3341199, 219, 464, 26556, 333, 683, 26889, 797, 26775, 97.109386, 0, 0, 39.673913, 98.282754, 1.71724648, 32.06442167, 98.76157537, 65.41299852, VALIDATE
0.44, -276.0544349, 515, 1110, 61932, 771, 1625, 62703, 1881, 62447, 97.075923, 0, 0, 40.046656, 98.239269, 1.76073094, 31.69230769, 98.77039376, 65.23135073, TRAIN
0.44, -442.2311039, 219, 464, 26556, 333, 683, 26889, 797, 26775, 97.109386, 0, 0, 39.673913, 98.282754, 1.71724648, 32.06442167, 98.76157537, 65.41299852, VALIDATE
0.43, -280.9839784, 515, 1110, 61932, 771, 1625, 62703, 1881, 62447, 97.075923, 0, 0, 40.046656, 98.239269, 1.76073094, 31.69230769, 98.77039376, 65.23135073, TRAIN
0.43, -450.1280879, 219, 464, 26556, 333, 683, 26889, 797, 26775, 97.109386, 0, 0, 39.673913, 98.282754, 1.71724648, 32.06442167, 98.76157537, 65.41299852, VALIDATE
0.42, -285.9135219, 515, 1110, 61932, 771, 1625, 62703, 1881, 62447, 97.075923, 0, 0, 40.046656, 98.239269, 1.76073094, 31.69230769, 98.77039376, 65.23135073, TRAIN
0.42, -458.0250719, 219, 464, 26556, 333, 683, 26889, 797, 26775, 97.109386, 0, 0, 39.673913, 98.282754, 1.71724648, 32.06442167, 98.76157537, 65.41299852, VALIDATE
0.41, -290.8430653, 515, 1110, 61932, 771, 1625, 62703, 1881, 62447, 97.075923, 0, 0, 40.046656, 98.239269, 1.76073094, 31.69230769, 98.77039376, 65.23135073, TRAIN
0.41, -465.9220559, 219, 464, 26556, 333, 683, 26889, 797, 26775, 97.109386, 0, 0, 39.673913, 98.282754, 1.71724648, 32.06442167, 98.76157537, 65.41299852, VALIDATE
0.4, -295.7726088, 515, 1110, 61932, 771, 1625, 62703, 1881, 62447, 97.075923, 0, 0, 40.046656, 98.239269, 1.76073094, 31.69230769, 98.77039376, 65.23135073, TRAIN
0.4, -473.8190399, 219, 464, 26556, 333, 683, 26889, 797, 26775, 97.109386, 0, 0, 39.673913, 98.282754, 1.71724648, 32.06442167, 98.76157537, 65.41299852, VALIDATE
0.39, -300.7021523, 515, 1110, 61932, 771, 1625, 62703, 1881, 62447, 97.075923, 0, 0, 40.046656, 98.239269, 1.76073094, 31.69230769, 98.77039376, 65.23135073, TRAIN
0.39, -481.7160239, 219, 464, 26556, 333, 683, 26889, 797, 26775, 97.109386, 0, 0, 39.673913, 98.282754, 1.71724648, 32.06442167, 98.76157537, 65.41299852, VALIDATE
0.38, -305.6316958, 534, 1172, 61870, 752, 1706, 62622, 1924, 62404, 97.009078, 19, 62, 41.524106, 98.140922, 1.85907807, 31.30128957, 98.79914407, 65.05021682, TRAIN
0.38, -489.6130079, 228, 479, 26541, 324, 707, 26865, 803, 26769, 97.087625, 9, 15, 41.304348, 98.227239, 1.77276092, 32.24893918, 98.79396985, 65.52145451, VALIDATE
0.37, -310.5612393, 534, 1172, 61870, 752, 1706, 62622, 1924, 62404, 97.009078, 0, 0, 41.524106, 98.140922, 1.85907807, 31.30128957, 98.79914407, 65.05021682, TRAIN
0.37, -497.5099919, 228, 479, 26541, 324, 707, 26865, 803, 26769, 97.087625, 0, 0, 41.304348, 98.227239, 1.77276092, 32.24893918, 98.79396985, 65.52145451, VALIDATE
0.36, -315.4907827, 534, 1172, 61870, 752, 1706, 62622, 1924, 62404, 97.009078, 0, 0, 41.524106, 98.140922, 1.85907807, 31.30128957, 98.79914407, 65.05021682, TRAIN
0.36, -505.4069759, 228, 479, 26541, 324, 707, 26865, 803, 26769, 97.087625, 0, 0, 41.304348, 98.227239, 1.77276092, 32.24893918, 98.79396985, 65.52145451, VALIDATE
0.35, -320.4203262, 534, 1172, 61870, 752, 1706, 62622, 1924, 62404, 97.009078, 0, 0, 41.524106, 98.140922, 1.85907807, 31.30128957, 98.79914407, 65.05021682, TRAIN
0.35, -513.3039599, 228, 479, 26541, 324, 707, 26865, 803, 26769, 97.087625, 0, 0, 41.304348, 98.227239, 1.77276092, 32.24893918, 98.79396985, 65.52145451, VALIDATE
0.34, -325.3498697, 534, 1172, 61870, 752, 1706, 62622, 1924, 62404, 97.009078, 0, 0, 41.524106, 98.140922, 1.85907807, 31.30128957, 98.79914407, 65.05021682, TRAIN
0.34, -521.2009439, 228, 479, 26541, 324, 707, 26865, 803, 26769, 97.087625, 0, 0, 41.304348, 98.227239, 1.77276092, 32.24893918, 98.79396985, 65.52145451, VALIDATE
0.33, -330.2794132, 546, 1220, 61822, 740, 1766, 62562, 1960, 62368, 96.953115, 12, 48, 42.457232, 98.064782, 1.93521779, 30.91732729, 98.81717336, 64.86725033, TRAIN
0.33, -529.0979279, 233, 509, 26511, 319, 742, 26830, 828, 26744, 96.996953, 5, 30, 42.210145, 98.11621, 1.88378979, 31.40161725, 98.81103243, 65.10632484, VALIDATE
0.32, -335.2089567, 595, 1424, 61618, 691, 2019, 62309, 2115, 62213, 96.712163, 49, 204, 46.267496, 97.741188, 2.25881159, 29.47003467, 98.89101093, 64.1805228, TRAIN
0.32, -536.9949119, 255, 621, 26399, 297, 876, 26696, 918, 26654, 96.670535, 22, 112, 46.195652, 97.701702, 2.29829756, 29.10958904, 98.88747378, 63.99853141, VALIDATE
0.31, -340.1385001, 595, 1424, 61618, 691, 2019, 62309, 2115, 62213, 96.712163, 0, 0, 46.267496, 97.741188, 2.25881159, 29.47003467, 98.89101093, 64.1805228, TRAIN
0.31, -544.8918959, 255, 621, 26399, 297, 876, 26696, 918, 26654, 96.670535, 0, 0, 46.195652, 97.701702, 2.29829756, 29.10958904, 98.88747378, 63.99853141, VALIDATE
0.3, -345.0680436, 595, 1424, 61618, 691, 2019, 62309, 2115, 62213, 96.712163, 0, 0, 46.267496, 97.741188, 2.25881159, 29.47003467, 98.89101093, 64.1805228, TRAIN
0.3, -552.7888799, 255, 621, 26399, 297, 876, 26696, 918, 26654, 96.670535, 0, 0, 46.195652, 97.701702, 2.29829756, 29.10958904, 98.88747378, 63.99853141, VALIDATE
0.29, -349.9975871, 595, 1424, 61618, 691, 2019, 62309, 2115, 62213, 96.712163, 0, 0, 46.267496, 97.741188, 2.25881159, 29.47003467, 98.89101093, 64.1805228, TRAIN
0.29, -560.6858639, 255, 621, 26399, 297, 876, 26696, 918, 26654, 96.670535, 0, 0, 46.195652, 97.701702, 2.29829756, 29.10958904, 98.88747378, 63.99853141, VALIDATE
0.28, -354.9271306, 595, 1424, 61618, 691, 2019, 62309, 2115, 62213, 96.712163, 0, 0, 46.267496, 97.741188, 2.25881159, 29.47003467, 98.89101093, 64.1805228, TRAIN
0.28, -568.5828479, 255, 621, 26399, 297, 876, 26696, 918, 26654, 96.670535, 0, 0, 46.195652, 97.701702, 2.29829756, 29.10958904, 98.88747378, 63.99853141, VALIDATE
0.27, -359.8566741, 595, 1424, 61618, 691, 2019, 62309, 2115, 62213, 96.712163, 0, 0, 46.267496, 97.741188, 2.25881159, 29.47003467, 98.89101093, 64.1805228, TRAIN
0.27, -576.4798319, 255, 621, 26399, 297, 876, 26696, 918, 26654, 96.670535, 0, 0, 46.195652, 97.701702, 2.29829756, 29.10958904, 98.88747378, 63.99853141, VALIDATE
0.26, -364.7862175, 625, 1597, 61445, 661, 2222, 62106, 2258, 62070, 96.489864, 30, 173, 48.600311, 97.466768, 2.53323181, 28.12781278, 98.93569059, 63.53175169, TRAIN
0.26, -584.3768159, 265, 695, 26325, 287, 960, 26612, 982, 26590, 96.438416, 10, 74, 48.007246, 97.427831, 2.57216876, 27.60416667, 98.92153916, 63.26285291, VALIDATE
0.25, -369.715761, 683, 1937, 61105, 603, 2620, 61708, 2540, 61788, 96.051486, 58, 340, 53.11042, 96.927445, 3.0725548, 26.06870229, 99.02281714, 62.54575971, TRAIN
0.25, -592.2737999, 289, 829, 26191, 263, 1118, 26454, 1092, 26480, 96.03946, 24, 134, 52.355072, 96.931902, 3.06809771, 25.84973166, 99.00582143, 62.42777654, VALIDATE
0.24, -374.6453045, 683, 1937, 61105, 603, 2620, 61708, 2540, 61788, 96.051486, 0, 0, 53.11042, 96.927445, 3.0725548, 26.06870229, 99.02281714, 62.54575971, TRAIN
0.24, -600.1707839, 289, 829, 26191, 263, 1118, 26454, 1092, 26480, 96.03946, 0, 0, 52.355072, 96.931902, 3.06809771, 25.84973166, 99.00582143, 62.42777654, VALIDATE
0.23, -379.574848, 683, 1937, 61105, 603, 2620, 61708, 2540, 61788, 96.051486, 0, 0, 53.11042, 96.927445, 3.0725548, 26.06870229, 99.02281714, 62.54575971, TRAIN
0.23, -608.0677679, 289, 829, 26191, 263, 1118, 26454, 1092, 26480, 96.03946, 0, 0, 52.355072, 96.931902, 3.06809771, 25.84973166, 99.00582143, 62.42777654, VALIDATE
0.22, -384.5043915, 683, 1937, 61105, 603, 2620, 61708, 2540, 61788, 96.051486, 0, 0, 53.11042, 96.927445, 3.0725548
26.06870229 99.02281714 62.54575971 TRAIN 0.22 -615.9647519 289 829 26191 263 1118 26454 1092 26480 96.03946 0 0 52.355072 96.931902 3.06809771 25.84973166 99.00582143 62.42777654 VALIDATE 0.21 -389.4339349 685 1952 61090 601 2637 61691 2553 61775 96.031277 2 15 53.265941 96.903652 3.09634847 25.97648843 99.02578982 62.50113913 TRAIN 0.21 -623.8617359 290 835 26185 262 1125 26447 1097 26475 96.021326 1 6 52.536232 96.909697 3.09030348 25.77777778 99.00933943 62.39355861 VALIDATE 0.2 -394.3634784 685 1952 61090 601 2637 61691 2553 61775 96.031277 0 0 53.265941 96.903652 3.09634847 25.97648843 99.02578982 62.50113913 TRAIN 0.2 -631.7587199 290 835 26185 262 1125 26447 1097 26475 96.021326 0 0 52.536232 96.909697 3.09030348 25.77777778 99.00933943 62.39355861 VALIDATE 0.19 -399.2930219 693 2020 61022 593 2713 61615 2613 61715 95.938005 8 68 53.888025 96.795787 3.20421306 25.54367858 99.03757202 62.2906253 TRAIN 0.19 -639.6557039 293 865 26155 259 1158 26414 1124 26448 95.923401 3 30 53.07971 96.798668 3.20133235 25.30224525 99.01945938 62.16085231 VALIDATE 0.18 -404.2225654 817 3140 59902 469 3957 60371 3609 60719 94.38969 124 1120 63.530327 95.019194 4.98080645 20.64695476 99.22313694 59.93504585 TRAIN 0.18 -647.5526879 346 1311 25709 206 1657 25915 1517 26055 94.498041 53 446 62.681159 95.148038 4.85196151 20.88111044 99.20509358 60.04310201 VALIDATE 0.17 -409.1521089 817 3140 59902 469 3957 60371 3609 60719 94.38969 0 0 63.530327 95.019194 4.98080645 20.64695476 99.22313694 59.93504585 TRAIN 0.17 -655.4496719 346 1311 25709 206 1657 25915 1517 26055 94.498041 0 0 62.681159 95.148038 4.85196151 20.88111044 99.20509358 60.04310201 VALIDATE 0.16 -414.0816524 817 3140 59902 469 3957 60371 3609 60719 94.38969 0 0 63.530327 95.019194 4.98080645 20.64695476 99.22313694 59.93504585 TRAIN 0.16 -663.3466559 346 1311 25709 206 1657 25915 1517 26055 94.498041 0 0 62.681159 95.148038 4.85196151 20.88111044 99.20509358 60.04310201 VALIDATE 0.15 -419.0111958 821 3183 59859 465 
4004 60324 3648 60680 94.329064 4 43 63.841369 94.950985 5.04901494 20.5044955 99.22916252 59.86682901 TRAIN 0.15 -671.2436398 347 1325 25695 205 1672 25900 1530 26042 94.450892 1 14 62.862319 95.096225 4.90377498 20.75358852 99.20849421 59.98104136 VALIDATE 0.14 -423.9407393 872 3820 59222 414 4692 59636 4234 60094 93.418107 51 637 67.807154 93.940548 6.05945243 18.58482523 99.30578845 58.94530684 TRAIN 0.14 -679.1406238 375 1597 25423 177 1972 25600 1774 25798 93.565936 28 272 67.934783 94.089563 5.91043671 19.01622718 99.30859375 59.16241047 VALIDATE 0.13 -428.8702828 872 3820 59222 414 4692 59636 4234 60094 93.418107 0 0 67.807154 93.940548 6.05945243 18.58482523 99.30578845 58.94530684 TRAIN 0.13 -687.0376078 375 1597 25423 177 1972 25600 1774 25798 93.565936 0 0 67.934783 94.089563 5.91043671 19.01622718 99.30859375 59.16241047 VALIDATE 0.12 -433.7998263 872 3820 59222 414 4692 59636 4234 60094 93.418107 0 0 67.807154 93.940548 6.05945243 18.58482523 99.30578845 58.94530684 TRAIN 0.12 -694.9345918 375 1597 25423 177 1972 25600 1774 25798 93.565936 0 0 67.934783 94.089563 5.91043671 19.01622718 99.30859375 59.16241047 VALIDATE 0.11 -438.7293698 888 4074 58968 398 4962 59366 4472 59856 93.048128 16 254 69.051322 93.537642 6.46235843 17.89600967 99.32958259 58.61279613 TRAIN 0.11 -702.8315758 383 1720 25300 169 2103 25469 1889 25683 93.148847 8 123 69.384058 93.634345 6.36565507 18.21207798 99.33644823 58.77426311 VALIDATE 0.1 -443.6589132 888 4074 58968 398 4962 59366 4472 59856 93.048128 0 0 69.051322 93.537642 6.46235843 17.89600967 99.32958259 58.61279613 TRAIN 0.1 -710.7285598 383 1720 25300 169 2103 25469 1889 25683 93.148847 0 0 69.384058 93.634345 6.36565507 18.21207798 99.33644823 58.77426311 VALIDATE 0.09 -448.5884567 888 4074 58968 398 4962 59366 4472 59856 93.048128 0 0 69.051322 93.537642 6.46235843 17.89600967 99.32958259 58.61279613 TRAIN 0.09 -718.6255438 383 1720 25300 169 2103 25469 1889 25683 93.148847 0 0 69.384058 93.634345 6.36565507 
18.21207798 99.33644823 58.77426311 VALIDATE 0.08 -453.5180002 897 4280 58762 389 5177 59151 4669 59659 92.741885 9 206 69.751166 93.210875 6.78912471 17.32663705 99.34236108 58.33449906 TRAIN 0.08 -726.5225278 386 1799 25221 166 2185 25387 1965 25607 92.873205 3 79 69.927536 93.341969 6.65803109 17.66590389 99.34612203 58.50601296 VALIDATE 0.07 -458.4475437 897 4280 58762 389 5177 59151 4669 59659 92.741885 0 0 69.751166 93.210875 6.78912471 17.32663705 99.34236108 58.33449906 TRAIN 0.07 -734.4195118 386 1799 25221 166 2185 25387 1965 25607 92.873205 0 0 69.927536 93.341969 6.65803109 17.66590389 99.34612203 58.50601296 VALIDATE 0.06 -463.3770872 925 5113 57929 361 6038 58290 5474 58854 91.490486 28 833 71.92846 91.889534 8.11046604 15.31964227 99.38068279 57.35016253 TRAIN 0.06 -742.3164958 397 2163 24857 155 2560 25012 2318 25254 91.59292 11 364 71.92029 91.994819 8.00518135 15.5078125 99.38029746 57.44405498 VALIDATE 0.05 -468.3066306 1102 11308 51734 184 12410 51918 11492 52836 82.135307 177 6195 85.692068 82.062752 17.9372482 8.879935536 99.64559498 54.26276526 TRAIN 0.05 -750.2134798 467 4841 22179 85 5308 22264 4926 22646 82.134049 70 2678 84.601449 82.083642 17.9163583 8.798040693 99.61821775 54.20812922 VALIDATE 0.04 -473.2361741 1104 11394 51648 182 12498 51830 11576 52752 82.004726 2 86 85.847589 81.926335 18.0736652 8.833413346 99.64885202 54.24113268 TRAIN 0.04 -758.1104638 467 4882 22138 85 5349 22223 4967 22605 81.985347 0 41 84.601449 81.931902 18.0680977 8.730603851 99.61751339 54.17405862 VALIDATE 0.03 -478.1657176 1156 14391 48651 130 15547 48781 14521 49807 77.426626 52 2997 89.891135 77.172361 22.8276387 7.435518106 99.7335028 53.58451045 TRAIN 0.03 -766.0074478 490 6135 20885 62 6625 20947 6197 21375 77.5243 23 1253 88.768116 77.294597 22.7054034 7.396226415 99.70401489 53.55012065 VALIDATE 0.02 -483.0952611 1194 17693 45349 92 18887 45441 17785 46543 72.35263 38 3302 92.846034 71.934583 28.0654167 6.321808651 99.79753967 53.05967416 TRAIN 
0.02 -773.9044318 511 7539 19481 41 8050 19522 7580 19992 72.508342 21 1404 92.572464 72.098446 27.9015544 6.347826087 99.78998053 53.06890331 VALIDATE 0.01 -488.0248046 1204 18975 44067 82 20179 44149 19057 45271 70.375264 10 1282 93.623639 69.901018 30.0989816 5.966598939 99.81426533 52.89043213 TRAIN 0.01 -781.8014158 514 8061 18959 38 8575 18997 8099 19473 70.625997 3 522 93.115942 70.166543 29.8334567 5.994169096 99.79996842 52.89706876 VALIDATE 0 -492.954348 1286 63042 0 0 64328 0 63042 1286 1.9991295 82 44067 100 0 100 1.999129462 NaN NaN TRAIN 0 -789.6983998 552 27020 0 0 27572 0 27020 552 2.002031 38 18959 100 0 100 2.002031046 NaN NaN VALIDATE
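
The column headers for the rows above are not part of this excerpt, so the following is only a sketch of how the percentage columns appear to follow from the four count columns. It assumes each row lists the cutoff, an unnamed value, then TP, FP, TN and FN counts, and that the data role (TRAIN/VALIDATE) closes the row. The function name `cutoff_metrics` and the dictionary keys are hypothetical labels, not names from the SAS output.

```python
def cutoff_metrics(tp, fp, tn, fn):
    """Percentage metrics for one cutoff row.

    Column meanings are inferred from the arithmetic of the table,
    not taken from its (missing) header. Assumes both classes occur.
    """
    total = tp + fp + tn + fn
    pred_pos = tp + fp          # instances predicted Y
    pred_neg = tn + fn          # instances predicted N
    # At cutoff 0 nothing is predicted N, which yields NaN as in the last rows.
    precision = 100.0 * tp / pred_pos if pred_pos else float("nan")
    npv = 100.0 * tn / pred_neg if pred_neg else float("nan")
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "tp_rate": 100.0 * tp / (tp + fn),
        "tn_rate": 100.0 * tn / (tn + fp),
        "fp_rate": 100.0 * fp / (tn + fp),
        "precision": precision,
        "npv": npv,
        "mean_precision_npv": (precision + npv) / 2.0,
    }


# Counts taken from the cutoff 0.42 row with 64,328 instances.
row = cutoff_metrics(tp=515, fp=1110, tn=61932, fn=771)
for name, value in row.items():
    print(f"{name}: {value:.6f}")
```

Under these assumptions the computed values agree with the percentages printed in that row, which suggests the last numeric column is the simple average of precision and negative predictive value.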

ajosh · ‎05-09-2014

Hi Miguel,

Thanks for your quick response. Yes, you guessed right: I am working on a binary classification problem with imbalanced proportions of Y and N. Class imbalance alone is not responsible for the poor classification performance of the models, though. Overlap among the target classes (or rare event instances occurring in small disjuncts/islands) further complicates the rare event classification problem. SMOTE + Tomek Links is just one of a handful of techniques aimed at achieving a "balanced" dataset, on which traditional classifiers work well.

I would rephrase my original question as: if I create a balanced dataset with Y:N almost the same, should I still use the adjusted priors (decision processing settings for input data in SAS EM) before running any model? Later on, I would still use my original dataset for scoring, just to check how many instances fall under TP, TN etc. I think this should be correct, as I have seen a few examples (PVK'97 Donor dataset or similar) where they started with a balanced dataset but then set the adjusted priors to the original priors before running the decision tree models etc. I shall go through the link you shared in detail once again, as I find it very useful.

Earlier, I tried boosting and gradient boosting on a similar dataset, with the following results:

1) Boosting: embed a decision tree node between the start and end group nodes, select boosting with 5 iterations and run the process flow. SAS EM gives good TP, but there are a large number of FP as well (needless to say, interpreting these results as patterns was a daunting task, which I somehow managed). However, higher iterations of boosting led to diminished performance as well.
2) Gradient Boosting: SAS EM Gradient Boosting didn't yield any results (not really sure why). I assume gradient boosting works for binary targets as well.

Do share your thoughts/experiences on the same. Hope this information is useful to you.
Regards, Aditya.
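
For readers unfamiliar with the Tomek link idea mentioned in this post, here is a minimal sketch in plain Python. A Tomek link is a pair of points from opposite classes that are each other's nearest neighbour; such pairs mark class overlap and are candidates for removal. The function names and the toy data are hypothetical, and the brute-force search is only meant for illustration, not for a dataset of roughly 90,000 rows.

```python
import math


def nearest_neighbor(i, points):
    """Index of the point closest to points[i], excluding itself."""
    best, best_dist = None, math.inf
    for j, other in enumerate(points):
        if j == i:
            continue
        dist = math.dist(points[i], other)  # Euclidean distance
        if dist < best_dist:
            best, best_dist = j, dist
    return best


def tomek_links(points, labels):
    """Return index pairs (i, j) that are mutual nearest neighbours
    and carry different class labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    links = []
    for i, j in enumerate(nn):
        if i < j and nn[j] == i and labels[i] != labels[j]:
            links.append((i, j))
    return links


# Toy data: points 0 and 1 overlap across classes, points 2 and 3 do not.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.2, 5.0), (9.0, 9.0)]
labels = ["N", "Y", "N", "N", "Y"]
links = tomek_links(points, labels)
print(links)
```

In a cleaning step one would typically drop the majority-class member of each link (or both members); which variant a given tool applies is worth checking in its documentation.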

ajosh · ‎05-08-2014

Hi All, I am working on a class imbalance problem where the Y:N ratio is approximately 1:49, with high overlap between the two classes (binary classification problem). There are roughly 1850 Y records (tgt = Y, of interest), while 90,000 are tgt = N instances. One strategy for approaching this class imbalance problem is to use SMOTE and, say, Tomek Links (through R or similar software) to achieve a balanced dataset with Y:N almost 50:50. After balancing, data manipulations such as transformations and binning were implemented, and 2 modeling techniques were used to find the results. The original dataset is then used once again with the role "Score" to find out how many instances before data balancing have been correctly classified. In such situations, is it necessary to use the "Adjusted Priors" in the decision processing matrix of the balanced input dataset (where I enter the original priors and use them) and then do the scoring? Or can I do the scoring straight away without using adjusted priors? In the first scenario (adjusted priors = original priors) the count of true positives is around 150 (both predicted and actual = Y), whereas in the second scenario (no adjusted priors used) it is around 600. Would appreciate help in this regard from the community members! Thanks, Aditya.
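
As an illustration of what adjusting for priors does to a score, here is a minimal sketch of the standard prior-correction formula for posterior probabilities from a model trained on a resampled dataset. Whether SAS EM applies exactly this computation internally is an assumption here; the function name `adjust_posterior` and its parameters are hypothetical, and the true prior of 0.02 is taken from the 1:49 ratio quoted above.

```python
def adjust_posterior(p, sample_prior, true_prior):
    """Rescale a predicted probability of Y from the training-sample
    class proportion back to the population class proportion.

    p            -- probability of Y predicted on the balanced data
    sample_prior -- proportion of Y in the balanced training data
    true_prior   -- proportion of Y in the original population
    """
    weight_y = p * true_prior / sample_prior
    weight_n = (1.0 - p) * (1.0 - true_prior) / (1.0 - sample_prior)
    return weight_y / (weight_y + weight_n)


# Balanced training data (50:50) versus the original 1:49 ratio.
for score in (0.5, 0.9, 0.99):
    adjusted = adjust_posterior(score, sample_prior=0.5, true_prior=0.02)
    print(f"balanced score {score:.2f} -> adjusted {adjusted:.4f}")
```

Under this formula a balanced-data score of 0.5 maps to roughly 0.02 and a score of 0.9 to roughly 0.16, so far fewer instances clear a fixed 0.5 cutoff after adjustment. That may be why the true positive count falls from around 600 to around 150 when the original priors are entered, and it suggests the cutoff needs to be reconsidered together with the priors.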

Online Status	Offline
Date Last Visited	‎09-01-2015 07:12 AM

Questions on exploratory factor analysis..

Identification of infrequent aka suspicious association rules.

Re: Is cut off node should still be used when boosting/ensemble models...

Re: Is cut off node should still be used when boosting/ensemble models...

Re: Is cut off node should still be used when boosting/ensemble models...

Is cut off node should still be used when boosting/ensemble models are...

Re: interpretation of from, into and decision columns in exported data...

interpretation of from, into and decision columns in exported data set...

Re: Using Cut Off Node and Interpreting Predicted Probabilities.

Re: Using Cut Off Node and Interpreting Predicted Probabilities.

Re: Some questions on data manipulation before using model node in SAS...

Re: Using Cut Off Node and Interpreting Predicted Probabilities.

Re: Using Cut Off Node and Interpreting Predicted Probabilities.

Using Cut Off Node and Interpreting Predicted Probabilities.

Re: Regarding use of original prior probabilities in class imbalance p...

Regarding use of original prior probabilities in class imbalance probl...