Programming the statistical procedures from SAS

How to model semi-continuous data?

Accepted Solution Solved
New Contributor
Posts: 2

Dear Community,

DATA: I have data on the campaign expenses of election candidates (N = 10,400). Although these figures are generally continuous and positive (the amount spent during the campaign), a large number of candidates did not spend any money at all. The data are therefore semi-continuous, or zero-inflated. They are also highly right-skewed, as only a few candidates spend very large sums.
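To see how much of the data sits at zero and how skewed the positive part is, a quick check like the following may help. This is a minimal sketch: the data set CAMPAIGN and the variable EXPENSES are hypothetical names, not taken from the original post.

```sas
/* Hypothetical data set CAMPAIGN with a variable EXPENSES (>= 0) */
data campaign_chk;
  set campaign;
  if not missing(expenses);       /* keep candidates with known expenses */
  zero_spend = (expenses = 0);    /* 1 = spent nothing, 0 = spent something */
run;

/* The mean of ZERO_SPEND is the share of non-spenders */
proc means data=campaign_chk n mean median max skewness;
  var zero_spend expenses;
run;
```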

GOAL: I want to fit a model to test whether female candidates spend more or less than male candidates. Gender (a dummy variable) is therefore the main independent variable, alongside several other variables (incumbency, age, party, ...). Because the dependent variable is highly skewed, previous studies on this topic have generally log-transformed it. For candidates with no expenses, a small constant (e.g. 0.0001) is added so that the logarithm can be computed. The transformed variable is then used in an ordinary least squares (OLS) regression.
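The sensitivity to the added constant is easy to reproduce: log(0.0001) is about -9.21 while log(1) is 0, so the choice of constant decides how far below the positive amounts the non-spenders are placed, and that distance drives the OLS estimates. A hedged sketch that fits the same OLS model with two different constants, assuming a hypothetical data set CAMPAIGN with variables EXPENSES, FEMALE (0/1), INCUMBENT (0/1), AGE and PARTY:

```sas
/* Hypothetical data set CAMPAIGN with variables
   expenses (>= 0), female (0/1), incumbent (0/1), age, party */
data campaign_log;
  set campaign;
  log_exp_small = log(expenses + 0.0001); /* constant used in earlier studies */
  log_exp_one   = log(expenses + 1);      /* alternative constant */
run;

/* Same OLS model for both transformed outcomes; compare the FEMALE rows */
proc glm data=campaign_log;
  class party;
  model log_exp_small log_exp_one = female incumbent age party / solution;
run;
quit;
```

If the FEMALE estimate and its p-value differ noticeably between the two outcomes, that confirms the concern raised below.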

QUESTION: Although this log-linear approach seems quite common, I doubt whether it is statistically sound, because the p-values fluctuate strongly depending on the small constant that is added for candidates with no expenses. I have read that there are alternative approaches (such as mixed-effects mixed-distribution models or two-part latent growth models), but how can I implement them in SAS to fit my model and test my hypothesis? I have already tried Tooze's MIXCORR macro, but it does not seem to work (and I don't know why).
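For cross-sectional data, the simplest relative of those models is a plain two-part (hurdle) model: a logistic model for whether a candidate spends anything, and a separate model for the amount among spenders. This is not the mixed-effects or latent growth variant mentioned above (those are meant for repeated measurements); it is a swapped-in, simpler technique shown as a sketch. Because the two parts share no parameters, they can be fitted separately. All data set and variable names below are hypothetical.

```sas
/* Hypothetical data set CAMPAIGN: expenses (>= 0), female (0/1),
   incumbent (0/1), age (numeric), party (character) */
data campaign_2p;
  set campaign;
  if not missing(expenses);
  any_spend = (expenses > 0);   /* 1 = spent something */
run;

/* Part 1: does gender affect the probability of spending anything? */
proc logistic data=campaign_2p;
  class party / param=ref;
  model any_spend(event='1') = female incumbent age party;
run;

/* Part 2: among spenders only, does gender affect the amount? */
proc genmod data=campaign_2p;
  where any_spend = 1;
  class party;
  model expenses = female incumbent age party / dist=gamma link=log;
run;
```

Each part yields its own gender effect: an odds ratio for spending at all, and (through the log link) a ratio of mean amounts among spenders.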

Any help is highly appreciated! (I am using SAS 9.4 on Windows.)


Accepted Solutions
Solution
Friday
SAS Employee
Posts: 169

Re: How to model semi-continuous data?

A model using the Tweedie distribution might prove useful. As noted in the SAS list of Frequently-Asked-For Statistics, you can fit a Tweedie model with PROC HPGENSELECT or PROC GENMOD.
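A Tweedie distribution with power parameter between 1 and 2 is a compound Poisson-gamma distribution: it has a point mass at zero plus a continuous positive part, so zeros and amounts are handled in one model without adding a constant. A minimal PROC GENMOD sketch, assuming a hypothetical data set CAMPAIGN with the variables described in the question (check the GENMOD documentation for how the power parameter is estimated or fixed):

```sas
/* Hypothetical data set CAMPAIGN: expenses (>= 0), female (0/1),
   incumbent (0/1), age (numeric), party (character) */
proc genmod data=campaign;
  class party;
  /* Tweedie with a log link: zeros and positive amounts in one model */
  model expenses = female incumbent age party / dist=tweedie link=log;
run;
```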

Alternatively, you could fit a zero-inflated gamma model using PROC FMM. For example: 

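proc fmm data=a;
  class g;
  /* Component 1: gamma distribution for the positive amounts */
  model y = g / dist=gamma;
  /* Component 2: degenerate distribution (point mass at zero) */
  model + / dist=constant;
  /* Mixing probability of the two components, modeled by g */
  probmodel g;
run;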

Another possibility is to modify the gamma distribution so that it allows zeros, by specifying its variance function and deviance yourself in PROC GENMOD. For example:

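proc genmod data=a;
  class g;
  /* Gamma deviance with the log(y) term dropped; that term does not
     depend on the mean, so dropping it lets y = 0 enter the fit */
  d = _resp_/_mean_ + log(_mean_);
  variance var = _mean_**2;   /* gamma variance function */
  deviance dev = d;
  model y = g / link=log;
run;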
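Because this model uses a log link, exponentiating the gender coefficient gives the ratio of mean expenses (zeros included) for female versus male candidates, holding the other covariates fixed. A hedged sketch applying the same modified gamma to the variables from the question, where CAMPAIGN, EXPENSES, FEMALE (coded 0/1, 1 = female), INCUMBENT, AGE and PARTY are hypothetical names:

```sas
/* Hypothetical data set CAMPAIGN; female is numeric 0/1 (1 = female) */
proc genmod data=campaign;
  class party;
  /* Same modified gamma as above: zeros allowed */
  d = _resp_/_mean_ + log(_mean_);
  variance var = _mean_**2;
  deviance dev = d;
  model expenses = female incumbent age party / link=log;
  /* EXP reports exp(beta_female): ratio of mean expenses, female vs male */
  estimate 'Female vs male' female 1 / exp;
run;
```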

New Contributor
Posts: 2

Re: How to model semi-continuous data?

Thank you for your answer! PROC GENMOD with a Tweedie distribution might indeed do the trick.