09-20-2011 05:55 PM
I used polynomial distributed lag (PDL) models to analyze insect population data.
In a PDL model, any record with missing data is dropped, so the number of observations of the dependent variable differs slightly from model to model.
For example: MODEL (1) Y=A + B + C
MODEL (2) Y=D + E + F
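A minimal Python sketch of this listwise deletion, using a handful of made-up records. The values and the helper `complete_cases` are hypothetical and are not part of any SAS procedure. It only shows how a missing `A` shrinks the sample for model (1) but not for model (2).

```python
# Hypothetical records; None marks a missing value.
records = [
    {"Y": 12.0, "A": 1.0,  "B": 0.5, "C": 2.0, "D": 3.0, "E": 1.1, "F": 0.2},
    {"Y": 15.0, "A": None, "B": 0.7, "C": 2.2, "D": 3.4, "E": 1.0, "F": 0.3},
    {"Y": 11.0, "A": 0.8,  "B": 0.4, "C": 1.9, "D": 2.9, "E": 1.3, "F": 0.1},
    {"Y": 18.0, "A": None, "B": 0.9, "C": 2.5, "D": 3.8, "E": 0.9, "F": 0.4},
    {"Y": 14.0, "A": 1.2,  "B": 0.6, "C": 2.1, "D": 3.1, "E": 1.2, "F": 0.2},
]

def complete_cases(rows, variables):
    """Listwise deletion: keep a row only if every listed variable is present."""
    return [r for r in rows if all(r[v] is not None for v in variables)]

n1 = len(complete_cases(records, ["Y", "A", "B", "C"]))  # model (1)
n2 = len(complete_cases(records, ["Y", "D", "E", "F"]))  # model (2)
print(n1, n2)  # 3 5
```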
If there were no missing data, I could use AIC, RMSE, or Total R-Square to compare model performance.
However, the variable A in model (1) has some missing values, so model (1) uses fewer observations of Y than model (2).
In this situation, is RMSE still valid for comparing model performance?
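For reference, the usual Gaussian-regression forms of these statistics show where the sample enters each one. Here n is the number of observations used, k the number of estimated parameters, SSE the residual sum of squares and SST the sum of squared deviations of Y from its sample mean. The exact AIC constant and parameter count SAS reports may differ, so treat this as an assumption. AIC is a sum over observations and grows with n, so it is generally not comparable across models fitted to different samples; RMSE and R-square are per-observation quantities, but they are still computed on different rows, so a difference may partly reflect the sample rather than the model.

```latex
\mathrm{RMSE} = \sqrt{\frac{\mathrm{SSE}}{n - k}}, \qquad
R^2 = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad
\mathrm{AIC} = n \ln\!\left(\frac{\mathrm{SSE}}{n}\right) + 2k + n\,(1 + \ln 2\pi)
```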
Thanks in advance...
09-20-2011 07:24 PM
Why not restrict both models to the cases that are available in both, for consistency?
You can also fit model (2) with and without the observations this restriction would drop, and compare the parameter estimates and RMSE to see the effect.
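A minimal sketch of this common-sample comparison in Python using only the standard library. Assumptions: plain OLS stands in for the PDL fit (no lag polynomial is modelled), the data are synthetic, and `ols_fit`, `rmse` and `complete` are hypothetical helpers. Both models are scored on exactly the same rows, and the last line is the sensitivity check of fitting model (2) on all of its own rows.

```python
import math
import random

def ols_fit(X, y):
    """OLS via the normal equations (X'X)b = X'y, solved by Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def complete(rows, variables):
    """Listwise deletion: keep rows where every listed variable is present."""
    return [r for r in rows if all(r[v] is not None for v in variables)]

def rmse(rows, xvars, yvar="Y"):
    """Fit Y on an intercept plus xvars and return sqrt(SSE / (n - k))."""
    X = [[1.0] + [r[v] for v in xvars] for r in rows]
    y = [r[yvar] for r in rows]
    b = ols_fit(X, y)
    sse = sum((yi - sum(bj * xj for bj, xj in zip(b, xi))) ** 2
              for xi, yi in zip(X, y))
    return math.sqrt(sse / (len(rows) - len(b)))

# Hypothetical synthetic data: 60 records, A missing in every 4th record.
rng = random.Random(42)
records = []
for i in range(60):
    r = {v: rng.gauss(0, 1) for v in "ABCDEF"}
    r["Y"] = 2 + r["A"] + 0.5 * r["D"] + rng.gauss(0, 1)
    if i % 4 == 0:
        r["A"] = None
    records.append(r)

m1, m2 = ["A", "B", "C"], ["D", "E", "F"]
common = complete(records, ["Y"] + m1 + m2)  # rows usable by BOTH models

print("n common:", len(common))
print("model (1) RMSE, common rows:", round(rmse(common, m1), 3))
print("model (2) RMSE, common rows:", round(rmse(common, m2), 3))
# Sensitivity check: model (2) on all of its own complete rows.
print("model (2) RMSE, all its rows:", round(rmse(complete(records, ["Y"] + m2), m2), 3))
```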
09-21-2011 10:29 AM
If I restrict to cases with no missing data, I will lose many observations in MODEL (2).
That's why I am asking which statistic is appropriate for comparing the two models if I don't delete any observations from MODEL (2).
I have data from multiple years, so I will do cross-validation year by year.
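One way to combine year-by-year cross-validation with the missing-data issue, sketched in Python under stated assumptions: plain OLS instead of PDL, synthetic data with hypothetical years, and hypothetical helper names (`ols_fit`, `complete`, `design`, `loyo_rmse`). Each model trains on its own complete cases from the other years, so model (2) keeps all of its training observations, but both models are scored only on held-out rows that are complete for both, so the out-of-sample RMSEs are averaged over identical observations.

```python
import math
import random

def ols_fit(X, y):
    """OLS via the normal equations (X'X)b = X'y, solved by Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def complete(rows, variables):
    """Listwise deletion: keep rows where every listed variable is present."""
    return [r for r in rows if all(r[v] is not None for v in variables)]

def design(rows, xvars):
    """Design matrix: intercept column plus the model's predictors."""
    return [[1.0] + [r[v] for v in xvars] for r in rows]

def loyo_rmse(records, xvars, eval_vars):
    """Leave-one-year-out CV RMSE for Y ~ xvars, scored on rows complete for eval_vars."""
    sq_err, n = 0.0, 0
    for year in sorted({r["year"] for r in records}):
        # Train on this model's own complete cases from all other years.
        train = complete([r for r in records if r["year"] != year], ["Y"] + xvars)
        b = ols_fit(design(train, xvars), [r["Y"] for r in train])
        # Score only held-out rows that are complete for eval_vars.
        test = complete([r for r in records if r["year"] == year], eval_vars)
        for xi, r in zip(design(test, xvars), test):
            pred = sum(bj * xj for bj, xj in zip(b, xi))
            sq_err += (r["Y"] - pred) ** 2
            n += 1
    return math.sqrt(sq_err / n), n

# Hypothetical synthetic data: 5 years x 20 records, A missing in every 4th record.
rng = random.Random(1)
records = []
for i in range(100):
    r = {v: rng.gauss(0, 1) for v in "ABCDEF"}
    r["Y"] = 2 + r["A"] + 0.5 * r["D"] + rng.gauss(0, 1)
    r["year"] = 2006 + i // 20
    if i % 4 == 0:
        r["A"] = None
    records.append(r)

m1, m2 = ["A", "B", "C"], ["D", "E", "F"]
both = ["Y"] + m1 + m2  # evaluate both models on the same held-out rows
rmse1, n1 = loyo_rmse(records, m1, both)
rmse2, n2 = loyo_rmse(records, m2, both)
print(f"model (1): CV RMSE {rmse1:.3f} on {n1} rows")
print(f"model (2): CV RMSE {rmse2:.3f} on {n2} rows")
```

Scoring on the same held-out rows is what makes the two RMSEs directly comparable here; letting each model train on a different set of rows only changes its fitted coefficients, which is arguably acceptable since the question is how well each model predicts.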