VMadhav
Calcite | Level 5

 

Hi, I have built a Gradient Boosting model using SAS Enterprise Miner 13.1 on data with a 1.8% event rate (the target is a binary variable).

The model results are good, and I wanted to test the model on an out-of-time sample.

 

Hence I applied the GB scoring code to a data set generated over a different time frame. After running the scoring code, I wanted to check whether rank ordering still holds. (I'm not sure this is expected of machine learning models; it's done on traditional logit models to check stability.)
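
For reference, the scoring step looked roughly like this (a minimal sketch; oot_sample and gb_score_code.sas are placeholder names for the out-of-time data and the score code exported from Enterprise Miner):

data Scored_gb_dataset;
   set oot_sample;                /* out-of-time observations to score */
   %include 'gb_score_code.sas';  /* DATA step score code exported from EM */
run;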

 

/* Copy the scored data set */
data temp1;
   set Scored_gb_dataset;
run;

/* Sort from highest to lowest predicted event probability */
proc sort data=temp1;
   by descending EM_EVENTPROBABILITY;
run;

/* Assign each observation to one of 10 equal-sized deciles */
data temp2 (drop=i count);
   set temp1 nobs=size;
   count + 1;
   do i = 1 to 10;
      if (i-1)*(size/10) < count <= i*(size/10) then decile = i;
   end;
run;

/* Count actual responders within each decile */
proc freq data=temp2 formchar='           ';  /* blank FORMCHAR suppresses table rules */
   tables decile*actual_target / nocum norow nocol nopercent;
run;

 

I sorted the data by EM_EVENTPROBABILITY, created deciles based on the number of observations, and checked the number of actual responders by decile to see whether rank ordering holds; it breaks at the 4th decile. However, the model does capture ~75% of the events in the top 3 deciles.
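
(As an aside, the same decile assignment can be done more directly with PROC RANK; a minimal sketch using the data set names from the code above:)

proc rank data=temp1 out=ranked groups=10 descending;
   var EM_EVENTPROBABILITY;
   ranks decile0;          /* group 0 = highest predicted probabilities */
run;

data ranked;
   set ranked;
   decile = decile0 + 1;   /* renumber 1-10 so decile 1 is the riskiest */
run;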

 

Usually, for classification models like decision trees, observations would be classified into High/Medium/Low risk segments, and the events captured by these H/M/L segments could indicate validity on out-of-time validation. But here, I think a probability is assigned to each observation or ID.
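
(For illustration, something like the following could collapse the scores into such segments; the 0.05 and 0.02 cutoffs are purely hypothetical placeholders, not fitted values:)

data segmented;
   set Scored_gb_dataset;
   length risk_segment $6;
   if      EM_EVENTPROBABILITY >= 0.05 then risk_segment = 'High';
   else if EM_EVENTPROBABILITY >= 0.02 then risk_segment = 'Medium';
   else                                     risk_segment = 'Low';
run;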

 

Should we expect rank ordering to hold on out-of-time samples for machine learning classification models? I'd appreciate your help/thoughts on this.

 

 

1 REPLY
DougWielenga
SAS Employee

Should we expect rank ordering to hold on out-of-time samples for machine learning classification models?


If the model didn't perform reasonably well on out-of-time samples, it would not be a particularly useful model. The expectation, of course, is that model performance on out-of-time samples will not be as good as on the data that was used to train it, but it does provide a benchmark for using the model going forward. SAS Model Manager is designed to apply previously fit models to future data and evaluate the performance. Over time, the model performance is likely to degrade and require a refit. How quickly it degrades, though, is a function of many factors, including how well the training data reflected the population at the time of modeling, how the population and/or external factors have changed, and how well the model actually fit. Should the modeler desire to refit the model, SAS Model Manager can perform that task as well in most situations. In general, performance on the out-of-time sample provides the best evaluation of how useful a particular model is at that time.
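
(A quick way to monitor this on the out-of-time sample, sketched against the temp2 data set from the code above and assuming actual_target is coded 0/1: if rank ordering holds, the event rate should fall monotonically across deciles.)

/* Event rate per decile on the out-of-time sample */
proc means data=temp2 noprint;
   class decile;
   var actual_target;
   output out=decile_rates n=obs sum=events mean=event_rate;
run;

proc print data=decile_rates;
   where _type_ = 1;        /* keep only the per-decile rows */
   var decile obs events event_rate;
run;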

 

Hope this helps!

Doug
