VMadhav
Calcite | Level 5

 

Hi, I have built a Gradient Boosting model using SAS Enterprise Miner 13.1 on data with a 1.8% event rate (the target is a binary variable).

The model results are good, and I wanted to test the model on an out-of-time sample.

 

Hence I applied the GB scoring code to a data set generated over a different time frame. After running the scoring code, I wanted to check whether rank ordering still holds. (I'm not sure this is expected of machine learning models; it's done on traditional logit models to check stability.)
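
For reference, the scoring step looked roughly like this (a minimal sketch; oot_sample and gb_score_code.sas are placeholder names for the out-of-time data and the score code exported from Enterprise Miner):

data Scored_gb_dataset;
   set oot_sample;                /* out-of-time observations to score */
   %include 'gb_score_code.sas';  /* DATA step score code exported from EM */
run;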

 

/* Copy the scored data set */
data temp1;
   set Scored_gb_dataset;
run;

/* Sort from highest to lowest predicted event probability */
proc sort data=temp1;
   by descending EM_EVENTPROBABILITY;
run;

/* Assign each observation to one of 10 equal-sized deciles */
data temp2 (drop=i count);
   set temp1 nobs=size;
   count + 1;
   do i = 1 to 10;
      if (i-1)*(size/10) < count <= i*(size/10) then decile = i;
   end;
run;

/* Count actual responders within each decile */
proc freq data=temp2 formchar='           ';  /* blank FORMCHAR suppresses table rules */
   tables decile*actual_target / nocum norow nocol nopercent;
run;

 

I sorted the data by EM_EVENTPROBABILITY, created deciles based on the number of observations, and checked the number of actual responders by decile to see whether rank ordering holds; it breaks at the 4th decile. However, the model does capture ~75% of the events in the top 3 deciles.
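
(As an aside, the same decile assignment can be done more directly with PROC RANK; a minimal sketch using the data set names from the code above:)

proc rank data=temp1 out=ranked groups=10 descending;
   var EM_EVENTPROBABILITY;
   ranks decile0;          /* group 0 = highest predicted probabilities */
run;

data ranked;
   set ranked;
   decile = decile0 + 1;   /* renumber 1-10 so decile 1 is the riskiest */
run;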

 

Usually, for classification models like decision trees, observations would be classified into High/Medium/Low risk segments, and the events captured by these H/M/L segments could indicate validity on out-of-time validation. But here, I think a probability is assigned to each observation or ID.
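
(For illustration, something like the following could collapse the scores into such segments; the 0.05 and 0.02 cutoffs are purely hypothetical placeholders, not fitted values:)

data segmented;
   set Scored_gb_dataset;
   length risk_segment $6;
   if      EM_EVENTPROBABILITY >= 0.05 then risk_segment = 'High';
   else if EM_EVENTPROBABILITY >= 0.02 then risk_segment = 'Medium';
   else                                     risk_segment = 'Low';
run;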

 

Should we expect rank ordering to hold on out-of-time samples for machine learning classification models? I'd appreciate your help/thoughts on this.

 

 

1 REPLY
DougWielenga
SAS Employee

Should we expect rank ordering to hold on out-of-time samples for machine learning classification models?


If the model didn't perform reasonably well on out-of-time samples, it would not be a particularly useful model. The expectation, of course, is that model performance on out-of-time samples will not be as good as on the data that was used to train it, but it does provide a benchmark for using the model going forward. SAS Model Manager is designed to apply previously fit models to future data and evaluate the performance. Over time, the model performance is likely to degrade and require a refit. How quickly it degrades, though, is a function of many factors, including how well the training data reflected the population at the time of modeling, how the population and/or external factors have changed, and how well the model actually fit. Should the modeler desire to refit the model, SAS Model Manager can perform that task as well in most situations. In general, performance on the out-of-time sample provides the best evaluation of how useful a particular model is at that time.
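
(A quick way to monitor this on the out-of-time sample, sketched against the temp2 data set from the code above and assuming actual_target is coded 0/1: if rank ordering holds, the event rate should fall monotonically across deciles.)

/* Event rate per decile on the out-of-time sample */
proc means data=temp2 noprint;
   class decile;
   var actual_target;
   output out=decile_rates n=obs sum=events mean=event_rate;
run;

proc print data=decile_rates;
   where _type_ = 1;        /* keep only the per-decile rows */
   var decile obs events event_rate;
run;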

 

Hope this helps!

Doug
