Contributor
Posts: 52

# Beginner at simulation and want to get my feet wet

I am new to simulation and would like to do what I hope is a simple case.

I have a dataset with N = 85.  The data consist of the weights of truckloads of material that were weighed on scales, along with estimates of each weight based on machine performance.  The per-truckload estimate is interesting and can be off by a good margin for an individual load, but what most end users will care about is its accuracy over the course of an entire field... 80 truckloads.  So it's the sum of the predictions that matters to me.

So I want to run a simulation where I randomly draw 5%, 10%, 15%, 20%, etc. of the loads, run a regression on those, apply it to the entire population, and see where the variability in the error between the cumulative predicted mass and the cumulative weighed mass becomes acceptable from a practical standpoint.

Is this something I can execute with do loops?  Or would it be possible to do it with proc surveyselect and a do loop?

Perhaps there are some good online examples or primers out there?

## Re: Beginner at simulation and want to get my feet wet

Here is the data

The columns are day, load, the "true" (weighed) weight of the load, and the machine's guess (estimated).

I want to simulate draws from this population, fit a regression to each draw and output the slope and intercept, and use those betas to estimate the total mass of the entire population.  Getting the "true" weight is inconvenient, but until we can perfect the way this machine guesses the weight, I'd like to figure out what reasonable subsampling rate it would take to still get a decent estimate of the sum total.

Thanks, I'm eager to learn something about looping and macros.

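| day | load | weighed | estimated |
|---|---|---|---|
| 1 | 101 | 4115 | 4787 |
| 1 | 102 | 8265 | 7828 |
| 1 | 103 | 7534 | 4085 |
| 1 | 104 | 10205 | 3898 |
| 1 | 105 | 10466 | 5312 |
| 1 | 106 | 11719 | 5546 |
| 1 | 107 | 4489 | 4751 |
| 2 | 201 | 7526 | 8356 |
| 2 | 205 | 9379 | 6531 |
| 2 | 207 | 7212 | 5247 |
| 2 | 208 | 6690 | 8670 |
| 2 | 209 | 3776 | 4710 |
| 2 | 210 | 11493 | 12668 |
| 2 | 211 | 2097 | 1360 |
| 2 | 212 | 6281 | 8667 |
| 2 | 213 | 8456 | 7896 |
| 2 | 214 | 7926 | 6321 |
| 2 | 215 | 4637 | 5511 |
| 3 | 301 | 7534 | 8933 |
| 3 | 302 | 10666 | 6754 |
| 3 | 303 | 6177 | 4246 |
| 3 | 304 | 8456 | 4855 |
| 3 | 305 | 8004 | 5491 |
| 3 | 306 | 8056 | 6590 |
| 3 | 307 | 8674 | 6923 |
| 3 | 308 | 7195 | 4783 |
| 3 | 309 | 3489 | 2589 |
| 5 | 501 | 3950 | 4397 |
| 5 | 502 | 8665 | 9732 |
| 5 | 503 | 7238 | 7979 |
| 5 | 504 | 8587 | 8999 |
| 5 | 505 | 7882 | 5760 |
| 5 | 506 | 4994 | 5035 |
| 5 | 507 | 4454 | 3463 |
| 5 | 508 | 5594 | 5235 |
| 5 | 509 | 8891 | 9772 |
| 5 | 510 | 5229 | 5880 |
| 5 | 511 | 6786 | 5622 |
| 5 | 512 | 7343 | 7967 |
| 5 | 513 | 8291 | 9275 |
| 6 | 604 | 11963 | 9988 |
| 6 | 606 | 4367 | 895 |
| 6 | 607 | 5577 | 5184 |
| 6 | 608 | 12371 | 11786 |
| 6 | 610 | 6673 | 7947 |
| 6 | 611 | 5820 | 4279 |
| 6 | 612 | 6055 | 8555 |
| 6 | 613 | 5881 | 4360 |
| 6 | 614 | 4663 | 2393 |
| 6 | 615 | 4228 | 4685 |
| 6 | 616 | 4759 | 898 |
| 6 | 617 | 3854 | 4739 |
| 6 | 618 | 8709 | 9386 |
| 6 | 619 | 8691 | 9333 |
| 6 | 620 | 8126 | 9509 |
| 6 | 621 | 8138 | 9356 |
| 6 | 622 | 11536 | 10899 |
| 7 | 701 | 3619 | 4401 |
| 7 | 702 | 3271 | 3450 |
| 7 | 703 | 6725 | 7975 |
| 7 | 704 | 7265 | 8109 |
| 7 | 705 | 5629 | 5590 |
| 7 | 706 | 5394 | 5497 |
| 7 | 707 | 18340 | 17964 |
| 7 | 708 | 1775 | 1459 |
| 7 | 709 | 5037 | 5525 |
| 7 | 710 | 15730 | 11124 |
| 7 | 711 | 6542 | 7216 |
| 7 | 712 | 6951 | 7129 |
| 7 | 713 | 8108 | 7028 |
| 7 | 714 | 2984 | 3470 |
| 7 | 715 | 1366 | 1791 |
| 8 | 801 | 4176 | 5212 |
| 8 | 802 | 6212 | 6630 |
| 8 | 803 | 5342 | 6759 |
| 8 | 804 | 5203 | 5465 |
| 8 | 805 | 6308 | 5641 |
| 8 | 806 | 6647 | 5463 |
| 8 | 807 | 5777 | 4818 |
| 8 | 808 | 6795 | 3058 |
| 8 | 809 | 2906 | 1745 |
| 8 | 810 | 6769 | 4742 |
| 8 | 811 | 5890 | 5015 |
| 8 | 812 | 6656 | 5581 |
| 8 | 813 | 3906 | 3276 |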
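
One way to get these values into SAS is a DATA step with inline data. This is only a minimal sketch: the dataset name `have` is an assumption, and just the first three loads are shown, with the remaining rows following in the same four-column layout.

```sas
/* Sketch: read the loads into a dataset named have (name is an assumption) */
data have;
    input day load weighed estimated;
    datalines;
1 101 4115 4787
1 102 8265 7828
1 103 7534 4085
;
run;

/* Quick check that the rows were read as expected */
proc print data=have;
run;
```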

## Re: Beginner at simulation and want to get my feet wet

I suspect PROC SURVEYSELECT and a loop are what I need... perhaps as a macro.

I've done little with loops in the past, so I don't know if or how to embed a procedure in one.  The syntax seems a little quirky.

## Re: Beginner at simulation and want to get my feet wet

Looping is not really needed for this type of investigation. You can manage with BY processing. Here is what you could do, assuming your data is in dataset have:

``````
/* Define simulation for a single sample rate */
%macro simul(pct);
    /* Generate 100 random samples at the given rate (in percent) */
    proc surveyselect data=have samprate=&pct. rep=100 out=sample&pct.;
    run;

    /* Perform a regression on each sample to predict true weight */
    proc reg data=sample&pct. outest=est&pct. plots=none noprint;
        by replicate;
        predict: model weighed = estimated;
    run;

    /* Predict true weight on the whole dataset, once per replicate */
    proc score score=est&pct. data=have out=score&pct. type=parms;
        by replicate;
        var estimated;
    run;

    /* Calculate total weights for each sample */
    proc sql;
        create table summ&pct. as
        select
            &pct. as sampRate label="Sample Rate",
            replicate,
            sum(predict) as totalPredicted label="Predicted Total Weight"
        from score&pct.
        group by replicate;
        /* Drop results of any earlier run at this rate;
           skipped while the weights dataset does not exist yet */
        %if %sysfunc(exist(weights)) %then %do;
            delete from weights where samprate=&pct.;
        %end;
    quit;

    /* Accumulate results in weights dataset */
    proc append base=weights data=summ&pct.; run;
%mend simul;

/* Call macro for each sample rate */
%simul(5);
%simul(10);
%simul(15);
%simul(20);
%simul(25);
%simul(30);

/* Calculate the true total weight */
proc sql noprint;
    select sum(weighed) into :trueTotalWeight
    from have;
quit;

/* Look at the distributions of total weight estimates,
   robust measures of dispersion in particular */
proc univariate data=weights location=&trueTotalWeight. robustscale;
    by samprate;
    var totalPredicted;
run;
``````
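
If a macro loop is preferred over six explicit calls, the calls could be generated with an iterative `%do`. This is a rough sketch that assumes the `%simul` macro above is already defined; the wrapper name `%runAll` and its parameters are hypothetical.

```sas
/* Hypothetical wrapper: call %simul for each rate from &from to &to in steps of &by */
%macro runAll(from=5, to=30, by=5);
    %local pct;
    %do pct = &from. %to &to. %by &by.;
        %simul(&pct.)
    %end;
%mend runAll;

/* Equivalent to the six %simul calls above */
%runAll()
```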
PG

## Re: Beginner at simulation and want to get my feet wet

Wow, a lot more than I was hoping.  Thank you!!

## Re: Beginner at simulation and want to get my feet wet

@PGStats I have been using PROC SQL a lot (mostly the statements in "So Few Workers Go Home On Time"). I know we can use the EXECUTE statement to manipulate an external RDBMS, but the statement below from your program is completely new to me.  Can we use DATA step statements in PROC SQL, beyond data set options?  Thanks!

``delete from weights where samprate=&pct.;``
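
For context, `DELETE FROM` is an SQL statement that PROC SQL supports directly, not a DATA step statement. Here is a minimal sketch of its behaviour on a small table; the table `demo` and its contents are hypothetical and used only for illustration.

```sas
/* Hypothetical table: three sample rates with three replicates each */
data demo;
    do samprate = 5, 10, 15;
        do replicate = 1 to 3;
            output;
        end;
    end;
run;

proc sql;
    /* Remove, in place, the rows that match the WHERE clause */
    delete from demo where samprate = 10;

    /* Count the rows left for each remaining sample rate */
    select samprate, count(*) as n
    from demo
    group by samprate;
quit;
```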
