How to model Y when a lot of 0 values come from sm...

09-25-2016 12:33 PM

Context: it is not uncommon to have an observed dependent variable that is, for some fraction of the sample, left-censored at 0, not (or not only) because the latent variable takes negative values but (also) because small values are set to 0. Examples include, at least in France, very low amounts of income tax which are not collected (if below 61 euros), as well as some social allowances that have no fixed amount but whose amount results from the difference between some threshold and an amount calculated as a function of income level, number of children, flat size, etc. (e.g. housing allowances lower than 24 euros/month, or minimum income RSA amounting to less than 6 euros per month); in these cases, the same legal non-collection thresholds apply to everybody.

In my current research, I assume that French judges who set child support amounts might consider ineffective a decision that would impose a payment on the debtor when the child support amount resulting from their calculations (this is our latent, unobserved variable) is very low. Hence a non-negligible proportion of the observed null child support amounts prescribed by judges might be explained by this process, which results in a higher concentration of child support amounts at 0.

As a consequence I am looking for a statistical model able to describe both the classical left-censoring phenomenon (e.g. a type I Tobit) resulting from negative values of the latent variable, and the process by which small latent amounts translate into null amounts.

Note that, in addition, I would be interested in modeling judge behavior not as driven by a uniform legal threshold (no such threshold exists) but by thresholds that depend on some explanatory variables, so that the censoring threshold differs from one judge to another, and also varies with the specific situation of the debtor and the creditor in each case.

I am aware that my problem is somewhat related to the LOD (Limit of Detection) question, often dealt with in health or environmental research, but I have not found relevant statistical models.
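One possible way to write down the two processes described above, given only as a sketch: assume normal errors and a deterministic threshold that is an exponential function of covariates. Both assumptions, and the symbols $x_i$, $z_i$, $\beta$, $\gamma$, $\sigma$ and $\tau_i$, are introduced here for illustration and are not part of the original question.

```latex
\begin{aligned}
y_i^{*} &= x_i'\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0,\sigma^{2}) \\
\tau_i  &= \exp(z_i'\gamma) \;>\; 0 \\
y_i     &= \begin{cases} y_i^{*} & \text{if } y_i^{*} > \tau_i \\ 0 & \text{otherwise} \end{cases} \\[4pt]
\Pr(y_i = 0) &= \Phi\!\left(\frac{\tau_i - x_i'\beta}{\sigma}\right)
  = \underbrace{\Phi\!\left(\frac{-x_i'\beta}{\sigma}\right)}_{\text{negative latent value}}
  + \underbrace{\Phi\!\left(\frac{\tau_i - x_i'\beta}{\sigma}\right) - \Phi\!\left(\frac{-x_i'\beta}{\sigma}\right)}_{\text{small positive latent value}} \\[4pt]
\ln L &= \sum_{i:\,y_i = 0} \ln \Phi\!\left(\frac{\tau_i - x_i'\beta}{\sigma}\right)
  + \sum_{i:\,y_i > 0} \left[ \ln \phi\!\left(\frac{y_i - x_i'\beta}{\sigma}\right) - \ln \sigma \right]
\end{aligned}
```

Under these assumptions the Type I Tobit is the special case $\tau_i = 0$. One caveat: with a deterministic threshold, the likelihood is only defined when $y_i > \tau_i$ for every positive observation, so the threshold parameters are estimated on a boundary and the usual standard errors may not be reliable.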

Having worked with SAS for decades, I have tried PROC QLIM to estimate a Type I Tobit and, alternatively, PROC GENMOD to estimate a ZINB (Zero-Inflated Negative Binomial) model; but when I applied these models to fictitious data I built, neither model correctly recovered the parameter values I had used to generate the data. Moreover, from an interpretation point of view, these models assume that individuals make two successive decisions. Two classical examples: first, should I buy a durable good or not and, if yes, how much should I spend; second, how many fish are caught by fishermen at a state park when there is no data on whether a person fished or not, so that the data contain excess zeros from visitors who did not fish, in addition to zeros from visitors who fished but caught nothing. By contrast, I am reluctant to assume that judges first decide whether to opt for a null or very low amount (resulting in an observed 0) and, if not, set some positive amount in a second step.

Would mixture models estimated through the experimental PROC FMM be a reasonable way to deal with the two processes I describe above? I am aware that "the FMM procedure provides a limited number of built-in distributions and link functions" and that "user-defined distributions or link functions are not supported". Then should I use the NLMIXED procedure?

Hence my question is: any idea or reference about a way to model (which theoretical model? Which SAS PROC?) this kind of assumed judge's behavior?

Many thanks in advance for any piece of advice.

Jean.C. Ray

09-28-2016 09:10 AM

Hi, let me comment on the PROC side:

I think your "path" from GENMOD, QLIM, FMM to NLMIXED is correct. As your model is quite non-standard, I'm not surprised that you didn't find a suitable one in the other procs.

The "good news" is that PROC NLMIXED is quite easy to use. Especially if you are simulating data with a DATA step, you can use almost the same code inside PROC NLMIXED to create and fit the model.

Of course, using a plug-and-play procedure (if there is a suitable one) is preferable, because SAS can use smarter tailor-made algorithms to fit the model ("smarter" initial values, analytical derivatives).
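To make the "same code for simulating and fitting" idea concrete, here is a minimal sketch in Python rather than SAS, using only the standard library. It assumes one particular model that is not stated in the thread: a normal latent amount and a threshold equal to the exponential of a linear function of one covariate. All names (`simulate`, `loglik`, `beta`, `gamma`, `sigma`) and the parameter values are hypothetical. Note how the lines computing `mu` and `tau` are identical in the simulator and in the log-likelihood.

```python
import math
import random


def norm_cdf(z):
    """Standard normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simulate(n, beta, gamma, sigma, seed=1):
    """Draw n rows (x, z, y) where y = y* if y* > tau, else 0 (assumed model)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                   # hypothetical covariate of the latent amount
        z = rng.gauss(0.0, 1.0)                   # hypothetical covariate of the threshold
        mu = beta[0] + beta[1] * x                # mean of the latent amount
        tau = math.exp(gamma[0] + gamma[1] * z)   # case-specific threshold, always positive
        y_star = mu + sigma * rng.gauss(0.0, 1.0)
        rows.append((x, z, y_star if y_star > tau else 0.0))
    return rows


def loglik(rows, beta, gamma, sigma):
    """Log-likelihood of the assumed model, summed over all rows."""
    total = 0.0
    for x, z, y in rows:
        mu = beta[0] + beta[1] * x
        tau = math.exp(gamma[0] + gamma[1] * z)
        if y == 0.0:
            # zero observed: latent amount was negative or below the threshold
            total += math.log(max(norm_cdf((tau - mu) / sigma), 1e-300))
        elif y > tau:
            # positive amount observed: ordinary normal log-density
            r = (y - mu) / sigma
            total += -0.5 * r * r - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
        else:
            # a positive amount below its own threshold cannot occur under this model
            return -math.inf
    return total


data = simulate(2000, beta=(50.0, 30.0), gamma=(3.0, 0.3), sigma=40.0)
ll_true = loglik(data, (50.0, 30.0), (3.0, 0.3), 40.0)
print(ll_true)
```

In PROC NLMIXED the per-observation contribution could be written with programming statements and passed to the `general(ll)` form of the MODEL statement; that translation is left out here. The `-inf` branch reflects the boundary issue of a deterministic threshold, which an optimizer may handle poorly, so treat this as a starting point only.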

09-28-2016 01:52 PM

Very useful pieces of advice, dear Gergely Batho. Thanks a lot!

Jean C.