pedrosa
Calcite | Level 5

Dear Community,

 

DATA: I have data on the campaign expenses of election candidates (N=10,400). The amount spent during the campaign is generally continuous and positive, but a large number of candidates did not spend any money at all. The data are thus semi-continuous or zero-inflated. They are also highly skewed to the right, as only a few candidates spend very large sums.

 

GOAL: I want to fit a model to test whether female candidates spend more/less than male candidates. Gender (as a dummy) thus is the main independent variable, next to several other variables (incumbency, age, party...). As the dependent variable is highly skewed, previous studies on this topic have generally logarithmically transformed this variable. For candidates with no expenses, a minor value (e.g. 0.0001) is added to be able to calculate the logarithm. This variable is then used in a simple OLS regression model.
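
The log-transform approach described above could be sketched as follows. This is only a minimal illustration, assuming a data set `a` with expenses `y` and a gender dummy `g`; these names, and `a_log`, `log_y_small` and `log_y_one`, are placeholders. Two different offsets are used so that the estimates and p-values for `g` can be compared side by side.

```sas
/* Hypothetical names: data set a, response y, gender dummy g */
data a_log;
  set a;
  log_y_small = log(y + 0.0001);  /* offset used in earlier studies */
  log_y_one   = log(y + 1);       /* alternative offset, for comparison */
run;

/* OLS fit of both transformed responses on gender */
proc glm data=a_log;
  class g;
  model log_y_small log_y_one = g / solution;
run;
quit;
```

Other covariates (incumbency, age, party...) would be added to the right-hand side of the MODEL statement.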

 

QUESTION: Although this log-linear approach seems quite common, I doubt whether it is fully correct from a statistical point of view, as the p-values fluctuate strongly depending on the minor value that is added for candidates with no expenses. I have read that there are some alternative approaches (such as mixed-effect mixed distribution models or two-part latent growth models), but how can I implement them in SAS to run my model and test my case? I have already tried to use Tooze's MIXCORR macro, but that doesn't seem to work (and I don't know why).

 

Any help is highly appreciated! (I am using SAS 9.4 on Windows.)

1 ACCEPTED SOLUTION

Accepted Solutions
StatDave
SAS Super FREQ

A model using the Tweedie distribution might prove useful. As noted in this list of Frequently-Asked For Statistics, you can fit a Tweedie model with PROC HPGENSELECT or PROC GENMOD.
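
As a minimal sketch of the Tweedie option in PROC GENMOD, using the same placeholder names as the other examples in this reply (data set `a`, response `y`, classification variable `g`), and assuming a SAS/STAT release in which GENMOD supports DIST=TWEEDIE:

```sas
/* Tweedie model with a log link; zeros in y are allowed */
proc genmod data=a;
  class g;
  model y = g / dist=tweedie link=log;
run;
```

With a log link, the exponentiated coefficient for `g` can be read as a ratio of mean spending between the two groups.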

 

Alternatively, you could fit a zero-inflated gamma model using PROC FMM. For example: 

 

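proc fmm data=a;
  class g;
  model y = g / dist=gamma;   /* gamma component for the positive amounts */
  model + / dist=constant;    /* degenerate component: point mass at zero */
  probmodel g;                /* mixing probability depends on g */
run;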

 

Another possibility is to alter the gamma distribution to allow zeros. For example: 

 

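proc genmod data=a;
  class g;
  /* gamma deviance term without log(y), so it stays finite when y = 0 */
  d = _resp_/_mean_ + log(_mean_);
  variance var = _mean_**2;   /* gamma variance function */
  deviance dev = d;
  model y = g / link = log;
run;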

 

 

 


2 REPLIES 2
pedrosa
Calcite | Level 5
Thank you for your answer! PROC GENMOD with a Tweedie distribution might indeed do the trick.
