05-07-2018 02:30 AM
I am analyzing a measure of household wealth. Its distribution is lognormal for positive values of wealth, lognormal for negative values of wealth (after taking the absolute value), and has excess zeros. About 15% of observations are zero or negative.
I have tried transforming the variable using the inverse hyperbolic sine and cube root (which normalizes the distribution). The problem with these transformations is that there is no simple way to interpret the coefficients, since the effect of a covariate depends on the values of the other covariates.
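To make the interpretation problem concrete, here is a short sketch for the inverse hyperbolic sine case. The symbols ($y$ for wealth, $x_j$ for a covariate, $\beta_j$ for its coefficient, $\varepsilon$ for the error term) are notation introduced here for illustration and are not taken from any SAS output.

```latex
\operatorname{asinh}(y) = \ln\!\left(y + \sqrt{y^{2} + 1}\right),
\qquad
\operatorname{asinh}(y) = \mathbf{x}'\boldsymbol{\beta} + \varepsilon
\;\Longrightarrow\;
\frac{\partial y}{\partial x_j} = \beta_j \sqrt{y^{2} + 1}.
```

For large $|y|$ the effect is roughly $\beta_j |y|$, as in a log model, while near zero it is roughly $\beta_j$, as in a linear model. The effect of one covariate therefore depends on the level of wealth implied by all the others.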
If there were no negative values, the SAS Macro MIXCORR works very well, with a logit for wealth ne 0, and lognormal for wealth gt 0. I have multiple observations per subject so the random effects model produced sensible and interpretable estimates. However, the negative values for wealth mean that MIXCORR is not appropriate. I could declare all negative values as zero but households with high negative wealth are much more like households with high positive values than households with zero or close to zero wealth. Ignoring multiple observations per subject for the time being, proc fmm looks appropriate, but can I specify three distributions and how could I do that?
proc fmm data=xx;
model wealth_2004 = afqt high_ed ses/ dist=logn; /*for positive values*/
model wealth_2004 = afqt high_ed ses / dist=binomial;/*for zero versus other values but what about the values close to zero?*/
model wealth_2004= afqt high_ed ses / dist=logn;/*But I can't do this for negative values of wealth?*/
What would be the most appropriate approach?
This code will read the attached data file:
data xx; /* xx is the data set name used in the PROC FMM call above */
    infile "H:\Articles 2018\Ability & Career Gen1\Wealth.txt";
    input wealth_2004 afqt high_ed ses female black married_2004 age_2004;
run;
05-07-2018 01:23 PM - edited 05-07-2018 01:24 PM
Your syntax is wrong, so see the PROC FMM documentation for the syntax. You only want to name the response variable one time, then use the
MODEL + ...;
syntax to add in the zero-inflated component.
I am not an expert at PROC FMM but my advice would be to look into the PARTIAL= option on the PROC FMM statement. The idea is to first prepare the data by introducing a new categorical variable that identifies the component:
z = abs(wealth_2004);
component = sign(wealth_2004);
Then use that component to help FMM identify each component:
proc fmm data=xx partial=component;
    class component;
    model z = afqt high_ed ses / dist=logn k=2;
    model + / dist=constant; /* zero inflated */
run;
The above is untested and I don't know whether it will work. But I think creating an indicator variable and modeling z=abs(wealth_2004) is worth looking into.
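For reference, the three-part model that this approach is aiming at could be written as follows. This is only a sketch in notation introduced here ($w$ for wealth, $\pi$ for the mixing probabilities, $f_{\mathrm{LN}}$ for the lognormal density). It assumes the sign of $w$ fully determines the component, and it is not a statement of what PROC FMM actually fits.

```latex
f(w) =
\begin{cases}
\pi_{-}\, f_{\mathrm{LN}}\!\left(|w| ;\, \mu_{-}, \sigma_{-}^{2}\right), & w < 0,\\[4pt]
\pi_{0}, & w = 0,\\[4pt]
\pi_{+}\, f_{\mathrm{LN}}\!\left(w ;\, \mu_{+}, \sigma_{+}^{2}\right), & w > 0,
\end{cases}
\qquad
\pi_{-} + \pi_{0} + \pi_{+} = 1 .
```

Here the covariates (afqt, high_ed, ses) would enter through the location parameters, for example $\mu_{+} = \mathbf{x}'\boldsymbol{\beta}_{+}$ and $\mu_{-} = \mathbf{x}'\boldsymbol{\beta}_{-}$, so the negative and positive sides are free to have different coefficients.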