
I am analyzing a measure of household wealth. Its distribution is lognormal for positive values of wealth, lognormal for negative values of wealth (after taking the absolute value) and excess zeros. About 15% of observations are zero or negative.
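Written out, the distribution I have in mind is roughly the three-part mixture below (a density on each side of zero plus a point mass at zero). The notation is introduced here only for illustration: the mixing proportions are written as pi with subscripts for the negative, zero and positive parts, and LN(x; mu, sigma) denotes the lognormal density.

```latex
f(y) = \pi_{-}\,\mathrm{LN}\!\left(|y|;\,\mu_{-},\sigma_{-}\right)\mathbf{1}[y<0]
     + \pi_{0}\,\mathbf{1}[y=0]
     + \pi_{+}\,\mathrm{LN}\!\left(y;\,\mu_{+},\sigma_{+}\right)\mathbf{1}[y>0],
\qquad \pi_{-}+\pi_{0}+\pi_{+}=1,

\mathrm{LN}(x;\,\mu,\sigma) = \frac{1}{x\,\sigma\sqrt{2\pi}}
     \exp\!\left(-\frac{(\ln x-\mu)^{2}}{2\sigma^{2}}\right), \quad x>0 .
```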



I have tried transforming the variable using the inverse hyperbolic sine and the cube root (which normalizes the distribution), but with these transformations there is no simple way to interpret the coefficients, because the effect of a covariate depends on the values of the other covariates.


If there were no negative values, the SAS macro MIXCORR would work very well, with a logit for wealth ne 0 and a lognormal for wealth gt 0. I have multiple observations per subject, so the random effects model produced sensible and interpretable estimates. However, the negative values for wealth mean that MIXCORR is not appropriate. I could declare all negative values as zero, but households with high negative wealth are much more like households with high positive wealth than households with zero or close to zero wealth. Ignoring multiple observations per subject for the time being, PROC FMM looks appropriate, but can I specify three distributions, and if so, how?




proc fmm data=xx;
model wealth_2004 = afqt high_ed ses/ dist=logn; /*for positive values*/

model wealth_2004 = afqt high_ed ses / dist=binomial;/*for zero versus other values but what about the values close to zero?*/
model wealth_2004= afqt high_ed ses  / dist=logn;/*But I can't do this for negative values of wealth?*/



What would be the most appropriate approach?


This code will read the attached data file:


data xx;
infile "H:\Articles 2018\Ability & Career Gen1\Wealth.txt";
input wealth_2004 afqt high_ed ses female black married_2004 age_2004;
run;




Your syntax is wrong, so see the PROC FMM documentation for the syntax. You only want to name the response variable one time, then use the 

MODEL + ...;

syntax to add in the zero-inflated component.


I am not an expert at PROC FMM, but my advice would be to look into the PARTIAL= option on the PROC FMM statement. The idea is to first prepare the data by introducing a new categorical variable that identifies the component:

z = abs(wealth_2004);
component = sign(wealth_2004);
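A fuller version of that preparation step might look like the sketch below. It is untested against the attached data. The data set name xx_prepped and the variable comp_id are hypothetical names introduced here. The recode to 1/2/3 rests on an assumption I have not verified: that PARTIAL= wants positive component numbers, ordered as the components appear in the MODEL statements, rather than the raw -1/0/1 values from SIGN().

```sas
/* Sketch only: xx_prepped and comp_id are hypothetical names, not from the original post */
data xx_prepped;
    set xx;
    z = abs(wealth_2004);            /* magnitude of wealth, the candidate lognormal response */
    component = sign(wealth_2004);   /* -1, 0, or 1; missing when wealth_2004 is missing */

    /* Assumed recode to positive integers, in case PARTIAL= needs component numbers */
    if component = -1 then comp_id = 1;      /* negative wealth */
    else if component = 1 then comp_id = 2;  /* positive wealth */
    else if component = 0 then comp_id = 3;  /* exactly zero wealth */
run;

/* Check the split; the question says about 15% of observations are zero or negative */
proc freq data=xx_prepped;
    tables component*comp_id / missing list;
run;
```

The PROC FREQ step is just a sanity check that every nonmissing observation lands in exactly one group before anything is passed to PROC FMM.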

Then use that component to help FMM identify each component:

proc fmm data=xx partial=component;
class component;
model z = afqt high_ed ses / dist=logn k=2;
model +     / dist=constant; /* zero inflated */
run;

The above is untested and I don't know whether it will work. But I think creating an indicator variable and modeling z=abs(wealth_2004) is worth looking into.


