# How to model semi-continuous data?

03-20-2017 08:38 AM

Dear Community,

DATA: I have data on the campaign expenses of election candidates (N = 10,400). These figures are generally continuous and positive, since they record the amount spent during the campaign. However, a large number of candidates did not spend any money during their campaign, so the data are semi-continuous (zero-inflated). They are also highly right-skewed, because only a few candidates spend very large sums.

GOAL: I want to fit a model that tests whether female candidates spend more or less than male candidates. Gender (as a dummy) is therefore the main independent variable, alongside several other covariates (incumbency, age, party, ...). Because the dependent variable is highly skewed, previous studies on this topic have generally log-transformed it. For candidates with no expenses, a small constant (e.g., 0.0001) is added so that the logarithm can be computed. The transformed variable is then used in an ordinary least squares (OLS) regression.
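For reference, the conventional approach described above can be sketched in SAS, together with a check of how the gender estimate moves when the added constant changes. The data set name A and the variables Y, FEMALE, INCUMBENT, and AGE are hypothetical placeholders. The constants 0.0001, 0.01, and 1 are arbitrary values chosen for illustration.

```sas
/* Hypothetical data set A: Y = campaign expenses (>= 0),
   FEMALE = 0/1 gender dummy, INCUMBENT and AGE = example covariates. */
data logshift;
  set a;
  /* One copy of each candidate per added constant C */
  do c = 0.0001, 0.01, 1;
    logy = log(y + c);   /* finite even when Y = 0 */
    output;
  end;
run;

proc sort data=logshift;
  by c;
run;

/* Refit the same OLS model once for each constant C.
   TABLEOUT adds standard errors, t values, and p-values to EST. */
proc reg data=logshift outest=est tableout noprint;
  by c;
  model logy = female incumbent age;
run;
quit;

/* Coefficient and p-value for FEMALE under each constant */
proc print data=est noobs;
  where _type_ in ('PARMS', 'PVALUE');
  var c _type_ female;
run;
```

If the FEMALE estimate and its p-value change noticeably across the values of C, the conclusion depends on an arbitrary choice, which is what makes the log(y + c) approach questionable.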

QUESTION: Although this log-linear approach seems quite common, I doubt whether it is statistically sound, because the p-values fluctuate strongly depending on the constant that is added for zero expenses. I have read about some alternative approaches, such as mixed-effects mixed-distribution models and two-part latent growth models. How can I implement them in SAS to fit my model and test my hypothesis? I have already tried Tooze's MIXCORR macro, but it does not seem to work, and I don't know why.

Any help is highly appreciated! (I am using SAS 9.4 on Windows.)

Accepted Solutions

Solution

03-24-2017 10:13 AM

Posted in reply to pedrosa

03-23-2017 03:13 PM

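A model based on the Tweedie distribution might prove useful. When its power parameter is between 1 and 2, the Tweedie distribution is a compound Poisson-gamma distribution. It has a point mass at zero and a continuous, right-skewed distribution on the positive values, which matches data such as campaign expenses. As noted in this list of Frequently-Asked-For Statistics, you can fit a Tweedie model with either PROC HPGENSELECT or PROC GENMOD.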
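The following is a minimal sketch of a Tweedie fit. It reuses the hypothetical data set A, the response Y (zero or positive), and the grouping variable G from the examples in this reply. The option names are those documented for recent SAS/STAT releases. Tweedie support in PROC GENMOD may depend on your release, so check the documentation for your installation.

```sas
/* Tweedie GLM with a log link; Y may contain exact zeros */
proc genmod data=a;
  class g;
  model y = g / dist=tweedie link=log;
  ods output ParameterEstimates=pe_tweedie;   /* keep the estimates */
run;

/* The same model in the high-performance procedure */
proc hpgenselect data=a;
  class g;
  model y = g / distribution=tweedie link=log;
run;
```

With the log link, exponentiating the coefficient for G gives the ratio of expected expenses between the two groups, with the zeros included in those expectations.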

Alternatively, you could fit a zero-inflated gamma model using PROC FMM. For example:

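```sas
proc fmm data=a;
  class g;
  model y = g / dist=gamma;    /* component 1: gamma for positive expenses */
  model + / dist=constant;     /* component 2: point mass at zero */
  probmodel g;                 /* mixing probability depends on G */
run;
```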
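A gamma density puts no probability at zero, so in the zero-inflated gamma model above the zeros can only come from the constant component. The likelihood should therefore factor into two pieces: a logistic model for whether a candidate spent anything, and a gamma model for the positive amounts. As a cross-check, the two parts can be fit separately. The following is a sketch; the data set A2 and the variable SPENT are hypothetical names.

```sas
/* Part 1: probability that a candidate spent anything */
data a2;
  set a;
  spent = (y > 0);   /* 1 if any expenses, else 0 */
run;

proc logistic data=a2;
  class g / param=glm;   /* GLM coding, as in PROC FMM and PROC GENMOD */
  model spent(event='1') = g;
run;

/* Part 2: amount spent, among candidates with positive expenses */
proc genmod data=a2;
  where y > 0;
  class g;
  model y = g / dist=gamma link=log;
  ods output ParameterEstimates=pe_gamma;
run;
```

The regression coefficients should agree closely with the PROC FMM estimates. However, the logistic part may come out with the opposite sign, depending on which component PROBMODEL treats as the modeled event, and the gamma scale parameter may be reported on a different scale.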

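Another possibility is to modify the gamma model so that it accepts zero responses. You can do this by supplying a user-defined variance function and deviance in PROC GENMOD. For example:

```sas
proc genmod data=a;
  class g;
  /* Per-observation deviance: the gamma deviance without the terms that
     depend only on the response, so it stays finite when _RESP_ = 0 */
  d = _resp_/_mean_ + log(_mean_);
  variance var = _mean_**2;   /* gamma variance function: V(mu) = mu**2 */
  deviance dev = d;           /* use D as the deviance */
  model y = g / link=log;     /* log link for the mean */
run;
```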
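The custom deviance can be motivated from the standard gamma unit deviance. That deviance is undefined at $y = 0$ because of its $\log y$ term. The following derivation assumes the usual gamma GLM setup, with mean $\mu$ and variance proportional to $\mu^2$:

$$
d(y,\mu) = 2\left[-\log\frac{y}{\mu} + \frac{y-\mu}{\mu}\right]
         = 2\left[\frac{y}{\mu} + \log\mu\right] - 2\log y - 2 .
$$

Dropping the factor 2 and the terms that do not involve $\mu$ gives

$$
\tilde d(y,\mu) = \frac{y}{\mu} + \log\mu,
\qquad
\frac{\partial \tilde d}{\partial \mu} = \frac{\mu - y}{\mu^2} .
$$

This expression is finite at $y = 0$, where $\tilde d = \log\mu$. Its derivative matches the quasi-likelihood estimating equation for the variance function $V(\mu) = \mu^2$. For strictly positive data, minimizing $\sum \tilde d$ therefore gives the same regression coefficients as the usual gamma fit, while zero responses can now also contribute to the fit.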

All Replies

Posted in reply to StatDave_sas

03-24-2017 10:13 AM

Thank you for your answer! PROC GENMOD with a Tweedie distribution might indeed do the trick.