12-18-2015 11:26 AM
I am new to simulation and would like to do what I hope is a simple case.
I have a dataset with an N of 85. The data consist of the weights of truckloads of material that were weighed on scales, plus an estimate of each load's weight based on machine performance. The per-load estimate is interesting but can be off by a good margin for an individual load; what most end users will care about is its accuracy over the course of an entire field, i.e., 80 truckloads. So it's the sum of the predictions that matters to me.
So I want to run a simulation where I randomly draw 5%, 10%, 15%, 20%, etc. of the loads, run a regression on those, apply it to the entire population, and see at what sampling rate the variability in the error between the cumulative predicted mass and the cumulative weighed mass becomes acceptable from a practical standpoint.
Is this something I can execute with do loops? Or would it be possible to do it with proc surveyselect and a do loop?
Perhaps there are some good online examples or primers out there?
12-21-2015 11:15 AM - edited 12-21-2015 12:11 PM
Here is the data
day, load, the "true" weight of the load, and the machine's guess.
I want to simulate draws from this population, fit a regression to each draw and output the slope and intercept, then use those betas to estimate the total mass of the entire population. Getting the "true" weight is inconvenient, but until we can perfect the way this machine estimates the weight, I'd like to figure out what subsampling rate would still give a decent estimate of the total.
Thanks, I'm eager to learn something about looping and macros.
12-21-2015 11:31 AM
I suspect PROC SURVEYSELECT and a loop are what I need... perhaps as a macro.
I've done little with loops in the past, so I don't know if/how to embed a procedure in one. The syntax seems a little quirky.
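For reference: a PROC step cannot run inside a DATA step DO loop, but a macro %DO loop can generate the same PROC step once per pass. The sketch below uses SASHELP.CLASS as a stand-in for the real data set HAVE; the macro name loopDemo, the output names demo20 to demo60, and the seed are hypothetical, chosen only for illustration.

```sas
/* Hypothetical demo: SASHELP.CLASS stands in for the real data set HAVE */
%macro loopDemo;
    /* Macro %DO loop: pct takes the values 20, 40, 60 */
    %do pct = 20 %to 60 %by 20;
        /* Each pass generates and runs a complete PROC step,
           drawing a simple random sample at a different rate */
        proc surveyselect data=sashelp.class samprate=&pct. seed=2015
                          out=demo&pct. noprint;
        run;
    %end;
%mend loopDemo;

%loopDemo
```

The macro only writes SAS code; the procedure itself runs normally each time. The reply that follows avoids this repetition within a rate by using the REP= option and BY-group processing instead.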
12-21-2015 04:32 PM
Looping is not really needed for this type of investigation. You can manage with BY processing. Here is what you could do, assuming your data is in dataset have:
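/* Replicate the full data set 100 times (one copy per simulation replicate).
   PROC SCORE applies its BY statement to the DATA= data set, so that data set
   needs a replicate variable to match each replicate's coefficients */
data haveRep;
    do replicate = 1 to 100;
        do i = 1 to nobs;
            set have point=i nobs=nobs;
            output;
        end;
    end;
    stop;
run;

/* Define simulation for a single sample rate */
%macro simul(pct);

/* Generate 100 random samples (fixed, arbitrary seed for reproducibility) */
proc surveyselect data=have samprate=&pct. rep=100 seed=2015 out=sample&pct. noprint;
run;

/* Perform a regression on each sample to predict true weight */
proc reg data=sample&pct. outest=est&pct. plots=none noprint;
    by replicate;
    predict: model weighed = estimated;
run;
quit;

/* Predict true weight on the whole data set, once per replicate;
   the score variable is named after the model label (predict) */
proc score score=est&pct. data=haveRep out=score&pct. type=parms;
    by replicate;
    var estimated;
run;

/* Calculate total predicted weight for each replicate */
proc sql;
create table summ&pct. as
select &pct. as sampRate label="Sample Rate",
       replicate,
       sum(predict) as totalPredicted label="Predicted Total Weight"
from score&pct.
group by replicate;
/* Remove results left by an earlier run at this rate
   (skipped on the first call, before WEIGHTS exists) */
%if %sysfunc(exist(weights)) %then %do;
delete from weights where samprate=&pct.;
%end;
quit;

/* Accumulate results in weights dataset */
proc append base=weights data=summ&pct.;
run;

%mend simul;

/* Call macro for each sample rate */
%simul(5);
%simul(10);
%simul(15);
%simul(20);
%simul(25);
%simul(30);

/* Calculate the true total weight (BEST32. keeps full precision in the macro variable) */
proc sql noprint;
select sum(weighed) format=best32. into :trueTotalWeight trimmed
from have;
quit;

/* BY processing below needs WEIGHTS sorted by sample rate */
proc sort data=weights;
    by sampRate;
run;

/* Look at the distributions of total weight estimates, robust measures of dispersion in particular */
proc univariate data=weights location=&trueTotalWeight. robustscale;
    by samprate;
    var totalPredicted;
run;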
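To get closer to the original question (at what rate does the error become acceptable?), it may help to express each simulated total as a percent error of the true total and summarize it per sample rate. This is a minimal sketch that assumes the program above has already run, so WEIGHTS and &trueTotalWeight exist; the names pctErr and errSumm and the choice of the mean and 95th percentile are illustrative, and where to draw the "acceptable" line remains a practical judgment.

```sas
/* Assumes the program above has run: WEIGHTS and &trueTotalWeight exist */
data pctErr;
    set weights;
    /* Signed and absolute error of each predicted field total, in percent */
    pctErr    = 100 * (totalPredicted - &trueTotalWeight.) / &trueTotalWeight.;
    absPctErr = abs(pctErr);
run;

/* Distribution of the absolute percent error at each sample rate;
   NWAY keeps only the per-rate rows in the output data set */
proc means data=pctErr nway n mean median p95 max maxdec=2;
    class sampRate;
    var absPctErr;
    output out=errSumm mean=meanAbsPctErr p95=p95AbsPctErr;
run;
```

Reading down the P95 column shows how the worst-case error (in 95 of 100 replicates) shrinks as the sample rate grows.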
12-31-2015 12:28 PM
@PGStats I have been using PROC SQL a lot (mostly SELECT statements: "So Few Workers Go Home On Time", i.e., SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY). I know we can use the EXECUTE statement to manipulate an external RDBMS, but this statement from your program above is completely new to me. Can we use DATA step statements in PROC SQL, other than dataset options? Thanks!
delete from weights where samprate=&pct.;
12-31-2015 03:29 PM
There is more to standard SQL than select statements. Check out delete, insert and update statements.
The SAS additions are dataset options (as you mentioned), macro variable creation (select ... into :macrovar), and SAS functions (including powerful date, text distance, text matching, and user-defined FCMP functions).
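A short sketch of those standard SQL statements plus one SAS addition (SELECT ... INTO). It works on a hypothetical WORK table named kids copied from SASHELP.CLASS; the table name, the inserted row, and the unit conversion are illustrative only. Note that DELETE here is an SQL statement, not a DATA step statement.

```sas
proc sql noprint;
    /* Hypothetical work table copied from SASHELP.CLASS */
    create table work.kids as
        select name, sex, age, height
        from sashelp.class;

    /* DELETE: remove rows that match a condition */
    delete from work.kids
        where age > 14;

    /* INSERT: add a row with the VALUES clause */
    insert into work.kids (name, sex, age, height)
        values ('Newkid', 'F', 12, 60);

    /* UPDATE: modify rows in place (inches to centimeters) */
    update work.kids
        set height = height * 2.54;

    /* SAS addition: store a query result in a macro variable */
    select count(*) into :nKids trimmed
        from work.kids;
quit;

%put NOTE: work.kids now has &nKids. rows.;
```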