05-20-2016 03:10 PM
"Time required to build a statistical model is inversely related to the number of observations"
Although it is not intuitive, my experience with datasets of all sizes (5e1 to 5e6 observations) has given me plenty of empirical evidence for this inverse relationship.
05-20-2016 04:28 PM
I would be interested in reading more on this. Do you have references, or is it based more on experience? Is the time spent primarily on finding a good type of model?
05-20-2016 05:46 PM
The main problem with small datasets in environmental modelling is the abundance of correlated variables covering only a small portion of parameter space. When adding or removing a couple of data points changes your choice of explanatory variables, you know you are in trouble. A lot of time is spent pruning and cross-validating. Most often, the resulting model is deceptively small and simple with modest predictive power.
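As a rough illustration of that instability, here is a minimal Python sketch on simulated data (not a real environmental dataset). It builds several predictors that are all noisy copies of one hidden driver, picks the single predictor most correlated with the response, and then checks how often that choice survives dropping one observation. The function names, the five-variable setup, and the noise levels are assumptions made for illustration, and selecting by highest absolute correlation is only a stand-in for whatever selection procedure you actually use.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def best_predictor(columns, y):
    """Index of the column with the largest absolute correlation with y."""
    return max(range(len(columns)), key=lambda j: abs(pearson(columns[j], y)))


def selection_stability(columns, y):
    """Fraction of leave-one-out subsets that select the same predictor
    as the full dataset (1.0 means the choice never changes)."""
    full_choice = best_predictor(columns, y)
    n = len(y)
    unchanged = 0
    for i in range(n):
        # Drop observation i from every predictor and from the response.
        sub_columns = [c[:i] + c[i + 1:] for c in columns]
        sub_y = y[:i] + y[i + 1:]
        if best_predictor(sub_columns, sub_y) == full_choice:
            unchanged += 1
    return unchanged / n


def simulate(n, n_vars=5, noise=0.1, seed=0):
    """Hypothetical data: n_vars predictors that are noisy copies of one
    latent driver, plus a response that depends on that driver."""
    rng = random.Random(seed)
    latent = [rng.gauss(0, 1) for _ in range(n)]
    columns = [[z + rng.gauss(0, noise) for z in latent] for _ in range(n_vars)]
    y = [z + rng.gauss(0, 1) for z in latent]
    return columns, y
```

Comparing `selection_stability(*simulate(20))` with the same call at a few thousand observations, over several seeds, should give a feel for how sensitive the choice is at small n; the exact numbers depend on the seed and the noise settings.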