Contributor
Posts: 20

# How long of the data for Behavioral Modelling

Dear All,

I am building a behavioral scoring model for a financial product. The data is separated into two parts: an observation period and an outcome period. For this project, I want to predict which customers with an overdue amount equal to 1 installment will become worse (move to an overdue amount of 2 installments) within the next 3 months.

Are there any theories or practices that address the following questions?

How many months or years of data should be used in modelling (12 months, 5 years, or 7 years)? Is it related to the product life cycle? If most customers have a 72-month (6-year) contract, or a 20-year contract as with a mortgage loan, should the data collected for the model cover 6 years or 20 years? If so, collecting data for a mortgage loan would take a very long time, and customer characteristics may change over that period. Or should the collection period be 7 years, which is the duration of the business life cycle? Or are 12 months enough?

Another question: how long should the outcome period be? Is there a calculation to support the choice?

Regards,
Ros
Super User
Posts: 3,254

## Re: How long of the data for Behavioral Modelling

Well, I don't think there are any hard and fast rules on this. It also depends on how your data behaves and whether you want to take into account both good and bad economic conditions. For example, if you only use 12 months of data, say from 2015, when economic conditions were good, would your behaviour score model also work when times are bad, as in 2007/2008 (GFC)?

If you want your model to work over a variety of economic conditions, then you need to use observational data from those periods, so I'd say you need at least 5 years of data, and probably more for long-term mortgage products. You may also find that getting enough overdue events to build a highly predictive model is hard when times are good but much easier in adverse conditions.
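One rough way to check whether each period contributes enough overdue events is to tally accounts and bad outcomes per observation year. The sketch below is only an illustration: the function name `bad_rate_by_year` and the record layout are assumptions, and the sample records are invented, not real portfolio figures.

```python
from collections import defaultdict

def bad_rate_by_year(records):
    """Summarise (observation_year, outcome_flag) pairs per year.

    Returns {year: (n_accounts, n_bad, bad_rate)}, sorted by year.
    outcome_flag is 1 for a bad outcome and 0 otherwise.
    """
    totals = defaultdict(lambda: [0, 0])  # year -> [n_accounts, n_bad]
    for year, flag in records:
        totals[year][0] += 1
        totals[year][1] += flag
    return {y: (n, bad, bad / n) for y, (n, bad) in sorted(totals.items())}

# Invented sample: one stressed year and one benign year.
records = [
    (2008, 1), (2008, 0), (2008, 1), (2008, 0),
    (2015, 0), (2015, 0), (2015, 0), (2015, 1),
]
summary = bad_rate_by_year(records)
for year, (n, bad, rate) in summary.items():
    print(year, n, bad, round(rate, 2))
```

If a year shows very few bad outcomes, that is a hint that the window may need to be widened to include more adverse periods.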

The outcome period should match what you are trying to predict. For example, where I work we need to predict the probability of going into default (90 days past due) in the next 12 months. That means we need to look forward 13 months from the observation point and see if the loan went into default in any of those months.
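As a minimal sketch of that idea, the outcome flag can be built by scanning each month of the outcome window after the observation point. Everything here is an assumption for illustration: the helper names `add_months` and `outcome_flag`, the monthly history layout (installments overdue per month), the convention that a missing month counts as nothing overdue, and the made-up account. The window length and bad threshold are parameters, so the same code covers a 3-month "2 installments overdue" target or a longer default window.

```python
from datetime import date

def add_months(d: date, n: int) -> date:
    """Return the first day of the month that is n months after d's month."""
    idx = d.year * 12 + (d.month - 1) + n
    return date(idx // 12, idx % 12 + 1, 1)

def outcome_flag(history, observation_month, window_months, bad_threshold):
    """Return 1 if the account is bad in any month of the outcome window, else 0.

    history: dict mapping month (date, first of month) -> installments overdue.
    The window covers the window_months months after the observation month.
    Months missing from history are treated as 0 overdue (an assumption).
    """
    for k in range(1, window_months + 1):
        month = add_months(observation_month, k)
        if history.get(month, 0) >= bad_threshold:
            return 1
    return 0

# Made-up account: 1 installment overdue at observation (Jan 2015),
# rolls to 2 installments overdue in Mar 2015, then cures.
history = {
    date(2015, 1, 1): 1,
    date(2015, 2, 1): 1,
    date(2015, 3, 1): 2,
    date(2015, 4, 1): 0,
}
flag_3m = outcome_flag(history, date(2015, 1, 1), 3, 2)  # window: Feb to Apr
flag_1m = outcome_flag(history, date(2015, 1, 1), 1, 2)  # window: Feb only
```

Note that the flag is "ever bad within the window", so an account that cures later in the window (as in April above) still counts as bad.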
