12-18-2015 11:04 AM
I am new to simulation and would like to do what I hope is a simple case.
I have a dataset with an N of 85. The data consist of the weights of truckloads of material measured on scales, along with estimates of those weights based on machine performance. The estimate for an individual truckload is interesting and can be off by a good margin, but what most end users will care about is its accuracy over the course of an entire field... 80 truckloads. So it's the sum of the predictions that matters to me.
So I want to run a simulation where I randomly draw 5%, 10%, 15%, 20%, etc. of the loads, run a regression on those, apply it to the entire population, and see where the variability in the error between cumulative predicted mass and cumulative weighed mass becomes acceptable from a practical standpoint.
Is this something I can execute with DO loops? Or would it be possible to do it with PROC SURVEYSELECT and a DO loop?
Is there a good primer out there?
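For what it's worth, the plan described above could be sketched roughly like this. This is only a sketch, not working production code: the dataset name (loads) and variable names (scale_wt for the scale weight, machine_wt for the machine estimate), the 1000 replicates, and the 5-20% rate grid are all assumptions.

```sas
/* Sketch only: assumes dataset LOADS with variables
   scale_wt (scale weight) and machine_wt (machine estimate). */
%macro sim_rates(reps=1000);
  %do rate = 5 %to 20 %by 5;

    /* 1. Draw &reps simple random samples of &rate pct of the loads */
    proc surveyselect data=loads out=samp noprint
         method=srs samprate=%sysevalf(&rate/100) reps=&reps;
    run;

    /* 2. Regress scale weight on machine estimate within each sample */
    proc reg data=samp outest=est(keep=replicate intercept machine_wt) noprint;
      by replicate;
      model scale_wt = machine_wt;
    run;

    /* 3. Apply each replicate's fit to all 85 loads and total them */
    proc sql;
      create table totals as
      select e.replicate,
             sum(e.intercept + e.machine_wt*l.machine_wt) as pred_total,
             sum(l.scale_wt)                              as scale_total
      from est as e, loads as l
      group by e.replicate;
    quit;

    /* 4. Summarize the spread of the cumulative error at this rate */
    data totals;
      set totals;
      pct_err = 100 * (pred_total - scale_total) / scale_total;
    run;
    proc means data=totals n mean std p5 p95;
      var pct_err;
      title "Cumulative error, &rate.% calibration samples";
    run;

  %end;
%mend;
%sim_rates()
```

The key idea is that REPS= on PROC SURVEYSELECT plus BY replicate processing replaces the inner "repeat many times" loop, so only the sampling rate needs an explicit macro %DO loop.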
12-18-2015 11:15 AM
Here are two good references: one is a paid book by Rick Wicklin, and the other is Cassell's paper "Don't Be Loopy," which covers simulation fairly well. If you post your stats-related questions in the Statistical Forum, Rick usually participates there as well.
12-24-2015 07:24 AM
In a simulation, you start with a model and you simulate data from the model. It sounds like this is a bootstrap problem, not a simulation problem, because you talk about choosing 5%, 10%, etc., of real data.
If this is real data, you can use SURVEYSELECT to extract the data. Definitely read Cassell's paper or my article on sampling with replacement in SAS.
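If the resampling ends up being with replacement (a true bootstrap), the pattern looks something like this. A minimal sketch, assuming a dataset named loads with a variable scale_wt; the replicate count is arbitrary:

```sas
/* Bootstrap resampling with replacement via PROC SURVEYSELECT.
   Dataset and variable names are assumptions. */
proc surveyselect data=loads out=boot noprint
     method=urs   /* unrestricted random sampling = with replacement */
     samprate=1   /* each replicate has the same size as the data */
     outhits      /* one output record per selection, not per unique unit */
     reps=1000;   /* 1000 bootstrap replicates */
run;

/* Analyze each replicate with BY-group processing rather than a loop */
proc means data=boot noprint;
  by replicate;
  var scale_wt;
  output out=boot_sums sum=total_wt;
run;
```

For sampling without replacement at a fixed fraction, as in the original question, METHOD=SRS with SAMPRATE= is the analogous call.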