Bootstraps, Permutation Tests, and Sampling With and Without Replacement Orders of Magnitude Faster Using SAS®
J.D. Opdyke,* DataMineIt
Download at http://www.datamineit.com/DMI_publications.htm
A very efficient approach to random sampling in SAS® runs orders of magnitude faster than the relevant “built-in” SAS® procedures. For sampling with replacement as applied to bootstraps, seven algorithms coded in SAS® are compared. The fastest (“OPDY”) is based on the new approach, uses no modules beyond Base SAS®, and runs over 220x faster than Proc SurveySelect. OPDY also handles datasets many times larger than those on which two hashing algorithms crash. For sampling without replacement as applied to permutation tests, six algorithms coded in SAS® are compared. The fastest (“OPDN”) is also based on the new approach and uses no modules beyond Base SAS®. It runs over 215x faster than Proc SurveySelect, over 350x faster than Proc NPAR1WAY (which crashes on datasets less than a tenth the size OPDN can handle), and over 720x faster than Proc MultTest. OPDN uses a simple draw-by-draw procedure that can repeatedly create many without-replacement permutation samples without requiring any additional storage or memory space. Based on these results, there appear to be no faster or more scalable algorithms in SAS® for bootstraps, permutation tests, or sampling with or without replacement.
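The abstract contrasts sampling with replacement (bootstraps) and draw-by-draw sampling without replacement (permutation tests) but shows no code. The following is a minimal, language-neutral sketch in Python, not the paper's SAS implementation of OPDY or OPDN. It assumes that a draw-by-draw, storage-free procedure can be realized as classic sequential selection sampling (Knuth's Algorithm S). Under that scheme, each record is kept with probability (records still needed) / (records not yet examined). The function names `bootstrap_means`, `draw_without_replacement_sum`, and `permutation_test` are hypothetical.

```python
import random

# Hypothetical illustration only; this is NOT the SAS code of OPDY or OPDN.


def bootstrap_means(values, n_boot, seed=0):
    """Bootstrap (sampling WITH replacement): each of the n draws picks an
    index uniformly at random, so a record can appear more than once."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_boot):
        total = 0.0
        for _ in range(n):
            total += values[rng.randrange(n)]
        means.append(total / n)
    return means


def draw_without_replacement_sum(values, n_pick, rng):
    """Sum of a random size-n_pick subset (sampling WITHOUT replacement),
    drawn in one sequential pass (assumed selection-sampling scheme).
    Record i is kept with probability needed / (records not yet examined);
    only a counter and a running sum are kept, so no shuffled copy is built."""
    n = len(values)
    needed = n_pick
    total = 0.0
    for i, x in enumerate(values):
        if needed == 0:
            break
        # rng.random() * (n - i) < needed  <=>  U < needed / (n - i)
        if rng.random() * (n - i) < needed:
            total += x
            needed -= 1
    return total


def permutation_test(x, y, n_perm, seed=0):
    """One-sided two-sample permutation test of mean(x) > mean(y).

    Each permutation sample assigns len(x) pooled records to group 1 without
    replacement; the p-value uses the common (count + 1) / (n_perm + 1) form."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    n1, n2 = len(x), len(y)
    grand = sum(pooled)
    observed = sum(x) / n1 - sum(y) / n2
    count = 0
    for _ in range(n_perm):
        s1 = draw_without_replacement_sum(pooled, n1, rng)
        # group 2's sum is whatever group 1 did not take
        if s1 / n1 - (grand - s1) / n2 >= observed - 1e-9:
            count += 1
    return (count + 1) / (n_perm + 1)
```

For example, `permutation_test([10, 11, 12, 13, 14], [0, 1, 2, 3, 4], n_perm=2000)` should return a small p-value, because only one of the 252 possible group assignments is as extreme as the observed one. The without-replacement draw touches each record at most once. It keeps only a counter and a running sum, which mirrors the "no additional storage" property the abstract describes. A shuffle-based alternative would instead materialize a permuted copy of the data for every sample.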
Keywords: Bootstrap, Permutation, SAS, Scalable, Replacement, Sampling
JEL Classifications: C12, C13, C14, C15, C63, C88
Mathematics Subject Classification: 62G09, 62G10, 62F40
© 2011 by John Douglas Opdyke. All rights reserved. Short sections of text, not to exceed two paragraphs, may be quoted without explicit permission provided that full credit, including © notice, is given to the source.
* J.D. Opdyke is Managing Director of Quantitative Strategies at DataMineIt, a consultancy specializing in applied statistical, econometric, and algorithmic solutions for the financial and consulting sectors. Clients include multiple Fortune 50 banks and credit card companies, Big Four and economic consulting firms, venture capital firms, and large marketing and advertising firms. J.D. has been a SAS® user for over 20 years and routinely writes SAS® code faster (often orders of magnitude faster) than SAS® Procs (including but not limited to Proc Logistic, Proc MultTest, Proc Summary, Proc Means, Proc NPAR1WAY, Proc Plan, and Proc SurveySelect). He earned his undergraduate degree from Yale University and his graduate degree from Harvard University, where he was a Kennedy Fellow. He has also completed post-graduate work as an ASP Fellow in the graduate mathematics department at MIT. His other peer-reviewed publications, spanning number theory/combinatorics, statistical finance, statistical computation, applied econometrics, and hypothesis testing for statistical quality control, can be accessed at www.DataMineIt.com.