cmajorros
Calcite | Level 5
Dear all,

I am currently running a system stability report for a credit scoring model and have run into a problem. During the modelling process I had data from Jan 2009 to Jul 2015 available. After checking the performance window, the duration for observing bad behaviour is 18 months, so I used data from Jan 2009 to Feb 2014 to build the model. After building the model, I tested the portfolio characteristics by selecting data from Jan 2009 to Jul 2015 and checking the stability of the characteristics. All characteristics had an acceptable index value (less than 0.1).

A year later (Jul 2016) I put the model into use, and three months after that I ran the system stability report and characteristic report using data from Jul 2016 to Sep 2016 as the Actual data, compared against the Expected data (Jun 2009 - Feb 2014). I found that the customer characteristics had shifted. From this I suspect the problem is in how I select the data for the stability reports, and I would appreciate help with these questions:

1) Should the Expected data include only the data that was used to build the model (Jun 2009 - Feb 2014)?

2) After building the model, should the data used to test the characteristics come only from Mar 2014 to Jul 2015, or should the modelling data (Jan 2009 - Jul 2015) be included in the test as well?

3) After applying the new model, which range should the Actual data of the system stability report cover?
A. Jan 2009 - Sep 2016
B. Mar 2014 - Sep 2016
C. Only data after launching the new model (Jul 2016 - Sep 2016)

Someone said it should be C, but I am fairly sure that C will show the model shifting and produce a very high index value (> 0.1). Personally, I think the testing data should include the data that was used in modelling, but I am not certain. I am new to modelling and would really appreciate your suggestions. Thanks in advance.
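For reference, the index mentioned above is typically a population (or characteristic) stability index computed per characteristic between the Expected (development) sample and the Actual (recent) sample. Below is a minimal, purely illustrative sketch in Python; it is not the SAS implementation, and the decile binning, the small-count floor, and the 0.1 threshold are assumptions based on the common rule of thumb.

```python
import numpy as np

def population_stability_index(expected, actual, n_bins=10):
    """Stability index of one characteristic: Expected vs. Actual sample.

    Bin edges are cut on the Expected (development) distribution; a small
    floor guards against empty bins before taking logarithms.
    """
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range Actual values

    exp_counts, _ = np.histogram(expected, bins=edges)
    act_counts, _ = np.histogram(actual, bins=edges)

    exp_pct = np.clip(exp_counts / exp_counts.sum(), 1e-6, None)
    act_pct = np.clip(act_counts / act_counts.sum(), 1e-6, None)

    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))

# Hypothetical example: Expected = development window, Actual = recent window
rng = np.random.default_rng(0)
expected = rng.normal(600, 50, 10_000)  # stand-in for the Jan 2009 - Feb 2014 sample
actual = rng.normal(585, 55, 2_000)     # stand-in for the Jul 2016 - Sep 2016 sample
print(f"PSI = {population_stability_index(expected, actual):.3f}")
# common rule of thumb: < 0.1 stable, 0.1-0.25 investigate, > 0.25 significant shift
```

In a characteristic stability report this calculation is typically repeated for each characteristic, with the Expected column held fixed to the development sample.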
1 REPLY
DougWielenga
SAS Employee

The issue you are describing illustrates the problem known as 'temporal infidelity'. This problem occurs when the relationships modeled from the time periods you had available during modeling have shifted by the time the model is applied to new data. In general, models will not perform as well on newer data as they performed on your historical data. You need to monitor the amount of the change and the nature of the change to assess when a model needs to be refit.

Using out-of-time samples to validate your model is a reasonable practice and gives you a more realistic assessment of how your model will perform, but do not be surprised when it does not perform as well. Simply including all of the training data will make some of your metrics look better, but those metrics will be misleading because they mask the temporal infidelity which you seem to have identified. Tools such as SAS Model Manager allow you to monitor the performance of a model over time so that you can refit the model when the performance has degraded too much.
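As a purely hypothetical illustration of the out-of-time comparison described above (Python with scikit-learn and synthetic data; this is not SAS Model Manager), the idea is to fit the model on the development window only, score a later non-overlapping window, and watch how much the performance drops. Tracking that gap over successive windows is what signals that a refit is due.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical data: one characteristic whose relationship with the bad flag
# weakens over time, mimicking a shift in the portfolio.
rng = np.random.default_rng(42)

def make_sample(n, signal_strength):
    x = rng.normal(0, 1, (n, 1))
    p_bad = 1 / (1 + np.exp(-(signal_strength * x[:, 0] - 2)))
    y = rng.binomial(1, p_bad)
    return x, y

X_dev, y_dev = make_sample(10_000, signal_strength=1.5)  # stand-in for the development window
X_oot, y_oot = make_sample(3_000, signal_strength=0.8)   # stand-in for a later, out-of-time window

model = LogisticRegression().fit(X_dev, y_dev)

auc_dev = roc_auc_score(y_dev, model.predict_proba(X_dev)[:, 1])
auc_oot = roc_auc_score(y_oot, model.predict_proba(X_oot)[:, 1])
print(f"Development AUC: {auc_dev:.3f}")
print(f"Out-of-time AUC: {auc_oot:.3f}")  # typically lower; monitor this gap over time
```

Because the later window is generated with a weaker relationship between the characteristic and the bad flag, the out-of-time AUC comes out lower than the development AUC, which is the kind of degradation you would watch for before deciding to refit.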

 

I hope this helps!

Doug 

