Hi experts,
We currently use PROC SURVEYSELECT to bootstrap a data set about 1000 times, and I did some research online to see whether anything is more efficient than this procedure. I found an algorithm named OPDY that is described as much faster than PROC SURVEYSELECT. Can anyone share their experience with this algorithm? Thanks.
I am not familiar with OPDY, but you may find this usage note useful. It also includes some macros that may help in your research: 22220 - Procedures with bootstrapping, crossvalidation, or jackknifing capabilities
Do you call SURVEYSELECT in a macro loop with 1000 iterations or do you ask SURVEYSELECT for 1000 reps?
In general a hand-coded method MAY be more efficient than using PROCs. It depends on your data and the estimates you compute from the bootstrap samples.
Share more of what YOU are doing. Do you have a link to the OPDY algorithm?
I call SURVEYSELECT in a macro loop for 1000 iterations. I do unrestricted random sampling of the data, then use PROC UNIVARIATE to fit a distribution. The output of every iteration is appended to a SAS data set.
Here is the link to the PDF that describes OPDY: http://interstat.statjournals.net/YEAR/2010/articles/1010002.pdf
Thanks.
aruku wrote:
I call SURVEYSELECT in a macro loop for 1000 iterations. I do unrestricted random sampling of the data, then use PROC UNIVARIATE to fit a distribution. The output of every iteration is appended to a SAS data set.
You can make this run WAY faster by using the SURVEYSELECT option REPS=1000 and then running PROC UNIVARIATE with "BY REPLICATE;".
That is one call each to SURVEYSELECT and UNIVARIATE instead of 1000 calls to each, and the APPEND is no longer needed.
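A minimal sketch of that pattern, for illustration only. It assumes the input is WORK.ONE with a single numeric variable X. The seed, the output names TWO and ESTIMATES, and the lognormal fit are placeholders, since the thread does not show the actual code or distribution.

```sas
/* Assumptions (not from the thread): variable X, the seed, and the
   lognormal fit are placeholders for illustration. */
proc surveyselect data=one out=two seed=12345
                  method=urs    /* unrestricted random sampling (with replacement) */
                  samprate=1    /* each resample has as many draws as ONE has rows */
                  reps=1000     /* 1000 resamples, indexed by the variable Replicate */
                  outhits       /* one output row per draw */
                  noprint;
run;

ods exclude all;                          /* display nothing */
ods output ParameterEstimates=estimates;  /* keep the fitted parameters instead */
proc univariate data=two;
  by replicate;                           /* one fit per bootstrap sample */
  var x;
  histogram x / lognormal noplot;         /* NOPLOT: fit only, no graph */
run;
ods exclude none;
```

ESTIMATES then holds the parameter estimates for all 1000 replicates in one data set, identified by Replicate, so no macro loop or PROC APPEND is involved. NOPRINT is left off the PROC UNIVARIATE statement here because this sketch reads the fit from an ODS table, which NOPRINT would suppress.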
Thanks for your recommendation; it reduced processing time by around 15-18%. In the meantime, I will see whether there are other ways to make this more efficient.
David Cassell wrote a paper a number of years ago that discussed efficiency issues with bootstrapping techniques. Depending on your actual code you may find something that helps you. http://www2.sas.com/proceedings/forum2007/183-2007.pdf
I would have expected a bigger change. Why don't you show your work?
If your estimates are simple and can be done with STAT functions, you can use a data step and temporary arrays to compute the bootstrap samples. That's all that's going on with OPDY.
The nice thing about SURVEYSELECT is that you can create 1000 bootstrap samples and then run a more complex analysis.
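The data step idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the OPDY code from the linked paper: it assumes WORK.ONE has one numeric variable X with no missing values and no more than MAXN rows, and it uses the mean and standard deviation as stand-ins for whatever simple statistic is needed. The names MAXN, BOOT, BOOT_MEAN and BOOT_STD are made up for the example.

```sas
%let maxn = 100000;   /* assumed upper bound on the number of rows in ONE */

data boot(keep=replicate boot_mean boot_std);
  array v[&maxn] _temporary_;   /* original values, held in memory */
  array s[&maxn] _temporary_;   /* current resample; unused slots stay missing */
  call streaminit(12345);       /* placeholder seed */

  /* Read the data once into the temporary array. */
  do p = 1 to nobs;
    set one point=p nobs=nobs;
    v[p] = x;
  end;

  /* Draw 1000 resamples with replacement, entirely in memory. */
  do replicate = 1 to 1000;
    do i = 1 to nobs;
      s[i] = v[ceil(nobs * rand('uniform'))];   /* random index in 1..nobs */
    end;
    boot_mean = mean(of s[*]);   /* statistic functions skip the missing slots */
    boot_std  = std(of s[*]);
    output;                      /* one row per replicate */
  end;
  stop;                          /* required because SET uses POINT= */
run;
```

BOOT ends up with one row per replicate. The resamples themselves are never written to disk, which is where this kind of approach can save time, but it only works when the statistic can be computed inside the data step.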
How many extra variables are in DATA=ONE?
Why not omit OUTHITS and use NumberHits in a FREQ statement? That will reduce the size of data TWO.
Even with NOPRINT on the UNIVARIATE statement I still get a lot of output that I don't think you need. Add ODS SELECT NONE.
These changes should help a bit.
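Roughly, those changes could look like the sketch below. The variable X, the seed, the lognormal fit and the output name ESTIMATES are assumptions for illustration, since the original code is not shown in the thread.

```sas
/* Without OUTHITS, TWO has one row per distinct selected observation,
   and NumberHits records how many times it was drawn. */
proc surveyselect data=one out=two seed=12345
                  method=urs samprate=1 reps=1000 noprint;
run;

ods select none;                          /* suppress displayed output */
ods output ParameterEstimates=estimates;  /* still capture the fitted parameters */
proc univariate data=two;
  by replicate;
  var x;
  freq numberhits;                        /* weight each row by its draw count */
  histogram x / lognormal noplot;         /* placeholder distribution; no graph */
run;
ods select all;                           /* restore normal output */
```

If DATA=ONE carried extra variables, a KEEP= data set option on it would shrink TWO further.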
There is only one variable in the dataset. Yes, I don't need any of the output. I will make the above changes and rerun the code.
Thanks for your response.