Zero Inflated Dependent variable

Hi,

I am performing a linear regression on an e-commerce dataset and am stuck on the following problem.

Assumption: In the dataset, I am using store_purchase_event_count as the dependent variable in an OLS linear regression.

Problem: I am trying to transform the dependent variable so that it is normally distributed, but more than half (50-60%) of its values are zeros, and I cannot figure out how to move forward.

Solutions tried:
1. Adding a constant to each value of Y and then taking the log.
2. Taking the square root of each value.

Neither of these transformations makes Y normally distributed. Please suggest how to move forward.

Re: Zero Inflated Dependent variable

Moved question to "SAS Statistical Procedures".

Re: Zero Inflated Dependent variable

I assume that your response is positive except for the zeros. If that is correct, and if the values are all integers (a count: 0, 1, 2, 3, ...), then you can fit a zero-inflated Poisson or zero-inflated negative binomial model using PROC GENMOD; see the GENMOD documentation. If the response is positive and continuous, you could instead fit a zero-inflated gamma model using PROC FMM. For example:

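```sas
/* Simulate 100 positive gamma(shape=2) values plus 10 exact zeros */
data a;
   call streaminit(2342);
   do i=1 to 100;
      y=rand("gamma",2);
      output;
   end;
   do i=1 to 10; y=0; output; end;
run;

/* histogram of data */
proc sgplot data=a;
   histogram y / showbins nbins=9;
run;

/* zero-inflated gamma model: a gamma component for the positive values */
/* plus a degenerate (constant) component at zero for the excess zeros */
proc fmm data=a plots=density(nbins=9);
   model y= / dist=gamma;
   model + / dist=constant;
run;
```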
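Because store_purchase_event_count is a count, the zero-inflated Poisson route through PROC GENMOD may be the more natural fit here. The following is a minimal sketch on simulated data. The data set name `counts`, the predictor `x`, and the simulation settings are hypothetical stand-ins for the real e-commerce variables. DIST=ZIP on the MODEL statement requests the zero-inflated Poisson, and the ZEROMODEL statement lists the effects used to model the probability of an excess zero.

```sas
/* Hypothetical example data: 200 observations with one predictor x.    */
/* About half the observations are structural zeros; the rest are drawn */
/* from a Poisson distribution whose mean depends on x.                 */
data counts;
   call streaminit(2342);
   do i = 1 to 200;
      x = rand("normal");
      if rand("bernoulli", 0.5) then y = 0;       /* excess (structural) zero */
      else y = rand("poisson", exp(0.5 + 0.3*x)); /* ordinary Poisson count   */
      output;
   end;
   drop i;
run;

/* Zero-inflated Poisson regression: DIST=ZIP fits the count part,    */
/* and ZEROMODEL models the probability of an excess zero (by default */
/* with a logit link).                                                */
proc genmod data=counts;
   model y = x / dist=zip;
   zeromodel x;
run;
```

The output reports separate estimates for the Poisson (count) part and for the zero-inflation part. If the nonzero counts are overdispersed relative to a Poisson, changing DIST=ZIP to DIST=ZINB fits the zero-inflated negative binomial instead.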