New Contributor
Posts: 3

# How to model Y when a lot of 0 values come from small values taken back to 0?

Context: it is not uncommon for an observed dependent variable to be left-censored at 0 for some fraction of the sample, not (or not only) because the latent variable takes negative values, but (also) because small values are set to 0. Examples include, at least in France, very low amounts of income tax that are not collected (if below 61 euros), as well as some social allowances that have no fixed amount but are computed as the difference between some threshold and an amount calculated as a function of income level, number of children, flat size, etc. (e.g. housing allowances lower than 24 euros/month, or minimum income RSA amounting to less than 6 euros per month). In these cases, the same legal non-collection threshold applies to everybody.

In my current research, I assume that French judges setting child support amounts might regard as ineffective a decision that would impose a payment on the debtor when the child support amount resulting from their calculations (our latent, unobserved variable) is very low. Hence a non-negligible proportion of the observed null child support amounts prescribed by judges might be explained by this process, which results in a higher concentration of child support amounts at 0.

As a consequence I am looking for a statistical model able to describe both the classical left-censoring phenomenon (e.g. a type I Tobit) resulting from negative values of the latent variable, and the process by which small latent amounts translate into null amounts.

Note that, in addition, I would be interested in modeling judge behavior as driven not by a uniform legal threshold (no such threshold exists) but by thresholds that depend on explanatory variables, so that the censoring threshold differs from one judge to another and also varies with the situation of the debtor and the creditor in each case.
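To fix notation, here is one possible way to write down the two processes. This is only a sketch: the symbols $x_i$, $z_i$, $\beta$, $\gamma$, $\sigma$, the normal error, and the exponential form of the threshold are assumptions made for illustration, not an established model.

$$
y_i^* = x_i'\beta + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2), \qquad
\tau_i = \exp(z_i'\gamma), \qquad
y_i = \begin{cases} y_i^* & \text{if } y_i^* > \tau_i \\ 0 & \text{otherwise} \end{cases}
$$

Under these assumptions the likelihood contributions would be

$$
\Pr(y_i = 0) = \Phi\!\left(\frac{\tau_i - x_i'\beta}{\sigma}\right), \qquad
f(y_i) = \frac{1}{\sigma}\,\phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right) \quad \text{for } y_i > \tau_i ,
$$

where $\Phi$ and $\phi$ are the standard normal distribution and density functions. A Type I Tobit is the limiting case $\tau_i \to 0$. One caveat of this deterministic-threshold version: every positive observation must satisfy $y_i > \tau_i$, so the likelihood is zero for any $\gamma$ that puts a threshold above an observed positive amount, which may make estimation non-regular.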

I am aware that my problem is somewhat related to the LOD (limit of detection) question, often dealt with in health or environmental research, but I have not found relevant statistical models there.

Having worked with SAS for decades, I tried PROC QLIM to estimate a Type I Tobit and, alternatively, PROC GENMOD to estimate a ZINB (zero-inflated negative binomial) model. However, when I applied these models to fictitious data that I built, neither model correctly recovered the parameter values I had used to generate the data. Moreover, from an interpretation point of view, these models assume that individuals take two successive decisions. Two classical examples: (1) should I buy a durable good or not and, if yes, how much should I spend; (2) how many fish are caught by visitors at a state park, when there is no data on whether a person fished or not: some visitors who did fish caught nothing, and the people who did not fish add excess zeros to the data. By contrast, I am reluctant to assume that judges first decide whether to opt for a null or very low amount (resulting in an observed 0) and, if not, set some positive amount in a second step.

Would mixture models estimated through the experimental PROC FMM be a reasonable way to deal with the two processes I describe above? I am aware that "the FMM procedure provides a limited number of built-in distributions and link functions" and that "user-defined distributions or link functions are not supported". Then should I use the NLMIXED procedure?

Hence my question is: any idea or reference about a way to model this kind of assumed behavior of judges (which theoretical model? which SAS PROC?)?

Jean.C. Ray

SAS Employee
Posts: 340

## Re: How to model Y when a lot of 0 values come from small values taken back to 0?

Hi, let me comment on the PROC side:
I think your "path" from GENMOD and QLIM through FMM to NLMIXED is correct. As your model is quite non-standard, I'm not surprised that you didn't find a suitable one in the other procedures.
The "good news" is that PROC NLMIXED is quite easy to use. Especially if you are simulating data with a DATA step, you can use almost the same code inside PROC NLMIXED to specify and fit the model.
Of course, using a plug-and-play procedure (if there is a suitable one) is preferable, because SAS can use smarter tailor-made algorithms to fit the model ("smarter" initial values, analytical derivatives).
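As a rough illustration of why the simulation code and the model code end up looking alike, here is a minimal sketch written in Python rather than SAS, purely so that it is self-contained. Everything in it is an assumption made for the example: the function names `simulate` and `loglik`, the single covariates `x` and `z`, the normal error, and the threshold form `tau = exp(g0 + g1*z)`. It is not the poster's model, only one way the "small amounts are set to 0" mechanism could be coded.

```python
import math
import random


def norm_cdf(u):
    """Standard normal CDF via the complementary error function."""
    return 0.5 * math.erfc(-u / math.sqrt(2.0))


def simulate(n, beta, gamma, sigma, seed=0):
    """Simulate (x, z, y) rows under the assumed mechanism:
    latent y* = b0 + b1*x + eps, threshold tau = exp(g0 + g1*z),
    observed y = y* if y* > tau, otherwise 0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        tau = math.exp(gamma[0] + gamma[1] * z)  # case-specific, always > 0
        y_star = beta[0] + beta[1] * x + rng.gauss(0.0, sigma)
        rows.append((x, z, y_star if y_star > tau else 0.0))
    return rows


def loglik(rows, beta, gamma, sigma):
    """Log-likelihood of the same mechanism (hypothetical helper).
    Returns -inf when a positive y lies at or below its own threshold,
    because such an observation is impossible under the model."""
    total = 0.0
    for x, z, y in rows:
        mu = beta[0] + beta[1] * x
        tau = math.exp(gamma[0] + gamma[1] * z)
        if y == 0.0:
            p = norm_cdf((tau - mu) / sigma)  # P(y* <= tau)
            if p <= 0.0:
                return -math.inf
            total += math.log(p)
        elif y <= tau:
            return -math.inf
        else:
            r = (y - mu) / sigma
            total += -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * r * r
    return total
```

The lines that compute `mu` and `tau` are the same in both functions, which is the point made above: in SAS, the corresponding programming statements from the DATA step can be reused inside PROC NLMIXED, with the per-observation log-likelihood passed through `model y ~ general(ll);`. The `-inf` branch is a reminder that, with a deterministic threshold, some parameter values are ruled out by the data, so starting values need some care.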
New Contributor
Posts: 3

## Re: How to model Y when a lot of 0 values come from small values taken back to 0?

Very useful pieces of advice, dear Gergely Batho. Thanks a lot!

Jean C.
