You have to figure out a way to code them; otherwise that variable will most likely be excluded from the analysis. From my understanding, most algorithms will do that with missing values. Also, since the duration of default is only present for those who do default, isn't including it problematic as well? At the very least it will be very highly correlated with the outcome, i.e. an account that's 36 months delinquent is more likely to default than one that's not delinquent. This is a methodology question, by the way, not specifically a coding question. I still think coding these as 0 and then coding everything else as 1 to 80 makes sense. But hopefully someone else has a better answer for you.
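
If it helps, here is a minimal sketch of the recoding I mean, written in plain Python just for illustration. The function name `recode_duration` and the sample values are made up, and I'm assuming a missing duration shows up as `None` or NaN in your data.

```python
import math

def recode_duration(months):
    """Return 0 for a missing duration, otherwise the observed value (1 to 80)."""
    # Assumption: missing values arrive as None or as a float NaN.
    if months is None:
        return 0
    if isinstance(months, float) and math.isnan(months):
        return 0
    return months

# Hypothetical sample: None marks accounts with no recorded duration.
raw = [None, 3, 36, None, 80]
coded = [recode_duration(m) for m in raw]
print(coded)
```

Keep in mind this only stops the variable from being dropped; it doesn't address the methodology concern about the duration being tied to the default outcome.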