02-15-2017 05:06 PM - edited 02-15-2017 05:06 PM
I used an FMM model for my data with a continuous outcome variable (ranging from 0 to 1000, with almost 93% of the values being 0) using the following statements:
proc fmm data=training;
model y = x1 x2 x3 x4 / dist=WEIBULL k=2;
probmodel x1 x2 x3 x4;
output out=modelone residual pred;
run;
I am wondering if I can use the beta estimates produced by the above procedure to calculate predicted scores in the validation data. I have used this method to score validation data with estimates from other regression models. So the equation would be:
log_y = exp(intercept+b1*x1+b2*x2+b3*x3+b4*x4);
y = exp(log_y);
Is this a correct method to create predicted scores here? I am new to the FMM procedure, and after reading a lot of articles, it seems to be an appropriate method for zero-inflated data. However, I am not sure how to use it further to create predicted scores and then compare them against the observed/actual outcome.
Appreciate your help.
02-15-2017 05:35 PM
I would use the "missing value trick" and let PROC FMM generate the predicted values itself.
The code would look something like this (NOT TESTED):
/* 1. Concatenate the original data with the score data */
data C;
   set training validation(in=v rename=(y=OrigY));
   if v then do;
      y = .;   /* y=. for all obs in validation data */
      type = "Validation";
   end;
   else type = "Training ";
run;

/* 2. Run a regression. The model is fit to the original data. */
proc fmm data=C;
   model y = ...;
   output out=Pred residual pred;
quit;
The scored validation data set is the one WHERE type="Validation";
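To compare the predictions against the observed outcome, something like the following might work (also NOT TESTED). It assumes the OUTPUT data set from step 2 is named Pred and that the PRED keyword created a variable that is also named Pred; check the actual variable name in your output data set. The names ValidScored, ValResid, AbsErr, and SqErr are made up for this sketch.

```sas
/* Sketch only: keep the validation rows and compare with the held-back outcome. */
/* Assumes the predicted-value variable in data set Pred is named Pred.          */
data ValidScored;
   set Pred;
   where type = "Validation";
   ValResid = OrigY - Pred;   /* OrigY holds the renamed validation outcome */
   AbsErr   = abs(ValResid);
   SqErr    = ValResid**2;
run;

/* Mean absolute error and mean squared error on the validation data */
proc means data=ValidScored n mean;
   var AbsErr SqErr;
run;
```

The RESIDUAL variable written by PROC FMM is missing for these rows because y was set to missing, which is why the residual is recomputed from OrigY here.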
02-16-2017 03:39 PM
I don't have time right now, but now that you know the correct predicted values in a data set, you can try various equations until you get the same predictions.
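As a starting point for those equations: a single exp(intercept+b1*x1+...) is unlikely to reproduce the predictions of a two-component model, because the mean of a finite mixture is a probability-weighted sum of the component means. A sketch, assuming PRED is the overall mixture mean and that the PROBMODEL uses a logit link (which I believe is the default for k=2); the symbols are my own notation, not PROC FMM output names:

```latex
\[
\hat{E}[y \mid x]
  = \hat{\pi}_1(x)\,\hat{\mu}_1(x) + \bigl(1 - \hat{\pi}_1(x)\bigr)\,\hat{\mu}_2(x),
\qquad
\hat{\pi}_1(x) = \frac{1}{1 + \exp\!\bigl(-x^{\top}\hat{\gamma}\bigr)}
\]
```

Here $\hat{\gamma}$ stands for the PROBMODEL estimates and $\hat{\mu}_j(x)$ for the mean of component $j$ under the chosen distribution. For dist=WEIBULL, that mean depends on the scale parameter as well as on the linear predictor $x^{\top}\hat{\beta}_j$, so check the parameterization in the PROC FMM documentation rather than assuming $\hat{\mu}_j(x) = \exp(x^{\top}\hat{\beta}_j)$. Also check which component the mixing probability refers to in your output.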