garymarks
Calcite | Level 5

I am analyzing a measure of household wealth. Its distribution is lognormal for positive values of wealth, lognormal for negative values of wealth (after taking the absolute value), and has excess zeros. About 15% of observations are zero or negative.

 

 

I have tried transforming the variable using the inverse hyperbolic sine and the cube root (each of which normalizes the distribution). The problem with these transformations is that the coefficients have no simple interpretation, because the effect of a covariate depends on the values of the other covariates.
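For reference, a minimal sketch of the two transformations mentioned above, assuming the data set xx and the variable wealth_2004 from the DATA step later in this post. The output data set xx_t and the new variables ihs_wealth and cbrt_wealth are hypothetical names chosen for illustration.

```sas
/* Sketch only: signed transformations that accept zero and negative wealth. */
/* xx_t, ihs_wealth and cbrt_wealth are hypothetical names.                  */
data xx_t;
  set xx;
  /* Inverse hyperbolic sine, asinh(y) = log(y + sqrt(y**2 + 1)).        */
  /* The odd symmetry asinh(-y) = -asinh(y) is used so that large        */
  /* negative values do not lose precision inside the log.               */
  ihs_wealth = sign(wealth_2004) * log(abs(wealth_2004) + sqrt(wealth_2004**2 + 1));
  /* Signed cube root: a negative base raised to a fractional power      */
  /* is not defined in SAS, so the sign is reattached afterwards.        */
  cbrt_wealth = sign(wealth_2004) * abs(wealth_2004)**(1/3);
run;
```

Both transformations map zero to zero and preserve the sign of wealth, which is why they can be applied to the full sample.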

 

If there were no negative values, the SAS macro MIXCORR works very well, with a logit for wealth ne 0 and a lognormal for wealth gt 0. I have multiple observations per subject, so the random-effects model produced sensible and interpretable estimates. However, the negative values for wealth mean that MIXCORR is not appropriate. I could set all negative values to zero, but households with large negative wealth are much more like households with large positive wealth than like households with zero or near-zero wealth. Ignoring the multiple observations per subject for the time being, PROC FMM looks appropriate, but can I specify three distributions, and if so, how?

 

 

 

proc fmm data=xx;
model wealth_2004 = afqt high_ed ses/ dist=logn; /*for positive values*/

model wealth_2004 = afqt high_ed ses / dist=binomial;/*for zero versus other values but what about the values close to zero?*/
model wealth_2004= afqt high_ed ses  / dist=logn;/*But I can't do this for negative values of wealth?*/

run;                 

 

What would be the most appropriate approach?

 

This code will read the attached data file:

 

data xx;
infile "H:\Articles 2018\Ability & Career Gen1\Wealth.txt";
input wealth_2004 afqt high_ed ses female black married_2004 age_2004;
run;

 

 

1 REPLY 1
Rick_SAS
SAS Super FREQ

Your syntax is not valid; see the PROC FMM documentation for the correct form. Name the response variable only once, then use the

MODEL + ...;

syntax to add in the zero-inflated component.

 

I am not an expert at PROC FMM, but my advice would be to look into the PARTIAL= option on the PROC FMM statement. The idea is to first prepare the data by introducing a new categorical variable that identifies the component:

z = abs(wealth_2004);          /* magnitude of wealth */
component = sign(wealth_2004); /* -1, 0, or 1 */

Then use that component to help FMM identify each component:

proc fmm data=xx partial=component;
class component;
model z = afqt high_ed ses / dist=logn k=2;
model +     / dist=constant; /* zero inflated */
run;    

The above is untested and I don't know whether it will work. But I think creating an indicator variable and modeling z=abs(wealth_2004) is worth looking into.
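Before fitting anything, it may help to check how the observations split across the proposed components. The following is a minimal sketch, assuming the data set xx and the variable wealth_2004 from the question; the data set name xx_parts is hypothetical. It only prepares and summarizes the data and does not fit the mixture. The -1/0/1 coding simply follows the suggestion above; check the PROC FMM documentation for the values that PARTIAL= accepts.

```sas
/* Sketch only: build the magnitude and sign variables, then summarize them. */
/* xx_parts is a hypothetical data set name.                                 */
data xx_parts;
  set xx;
  z = abs(wealth_2004);          /* magnitude of wealth                    */
  component = sign(wealth_2004); /* -1 = negative, 0 = zero, 1 = positive  */
run;

/* Share of observations in each group */
proc freq data=xx_parts;
  tables component;
run;

/* Distribution of the magnitude within each group */
proc means data=xx_parts n mean median min max;
  class component;
  var z;
run;
```

Note that z is exactly zero in the zero group, so a lognormal component cannot describe those observations; that is the role of the constant component in the suggested model.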

 
