Most efficient Boot strapping Methodology - SAS

Occasional Contributor
Posts: 15

Hi experts,

We currently use PROC SURVEYSELECT to bootstrap a set of data about 1000 times, and I did some research online to find out whether there is anything more efficient than this procedure. I found an algorithm named OPDY that is described as much faster than PROC SURVEYSELECT. Can anyone share their experience with this algorithm? Thanks.



All Replies
Trusted Advisor
Posts: 1,247

Re: Most efficient Boot strapping Methodology - SAS

I am not familiar with OPDY, but you may find this usage note useful. It also includes some macros that may help in your research: Usage Note 22220 - Procedures with bootstrapping, crossvalidation, or jackknifing capabilities

Respected Advisor
Posts: 3,777

Re: Most efficient Boot strapping Methodology - SAS

Do you call SURVEYSELECT in a macro loop with 1000 iterations, or do you ask SURVEYSELECT for 1000 reps?

In general, a hand-coded method MAY be more efficient than using PROCs. It depends on your data and the estimates you compute from the bootstrap samples.

Share more of what YOU are doing. Do you have a link to the OPDY algorithm?

Occasional Contributor
Posts: 15

Re: Most efficient Boot strapping Methodology - SAS

I call SURVEYSELECT in a macro loop for 1000 iterations. I am doing unrestricted random sampling of the data, then fitting a distribution with PROC UNIVARIATE. The output from each iteration is appended to a SAS data set.

Here is the link of PDF that specifies OPDY: http://interstat.statjournals.net/YEAR/2010/articles/1010002.pdf

Thanks.

Respected Advisor
Posts: 3,777

Re: Most efficient Boot strapping Methodology - SAS

aruku wrote:

I call SURVEYSELECT in a macro loop for 1000 iterations. I am doing unrestricted random sampling of the data, then fitting a distribution with PROC UNIVARIATE. The output from each iteration is appended to a SAS data set.

You can make this work WAY faster by using the SURVEYSELECT option REPS=1000 and then running PROC UNIVARIATE with "BY REPLICATE;".

That is one call each to SURVEYSELECT and UNIVARIATE versus 1000 calls to each, and the APPEND is no longer needed.

Occasional Contributor
Posts: 15

Re: Most efficient Boot strapping Methodology - SAS

Thanks for your recommendation; it reduced processing time by around 15-18%. In the meantime, I will see if there are any other ways to make this more efficient.

Valued Guide
Posts: 632

Re: Most efficient Boot strapping Methodology - SAS

David Cassell wrote a paper a number of years ago that discussed efficiency issues with bootstrapping techniques.  Depending on your actual code you may find something that helps you. http://www2.sas.com/proceedings/forum2007/183-2007.pdf

Respected Advisor
Posts: 3,777

Re: Most efficient Boot strapping Methodology - SAS

I would have expected a bigger change. Why don't you show your work?

If your estimates are simple and can be computed with STAT functions, you can use a DATA step and temporary arrays to compute the bootstrap samples. That's all that's going on with OPDY.

The nice thing about SURVEYSELECT is that you can create 1000 bootstrap samples and then run a more complex analysis.
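
To illustrate the DATA step idea, here is a minimal sketch, not the OPDY code itself. It assumes the statistic of interest is a plain mean, reuses SASHELP.CLASS and its WEIGHT variable as stand-in data, and requires the hypothetical macro variable N to equal the number of input observations. The names BOOT_MEANS, X, S, and the seed are made up for the example.

```sas
/* Sketch: bootstrap the mean of one variable in a single DATA step. */
%let n    = 19;     /* assumption: number of observations in the input data */
%let reps = 1000;   /* number of bootstrap replicates */

data boot_means(keep=replicate estimate);
   array x[&n] _temporary_;   /* original values, held in memory */
   array s[&n] _temporary_;   /* one bootstrap sample */

   /* Read the input variable into the temporary array. */
   do i = 1 to &n;
      set sashelp.class(keep=weight);
      x[i] = weight;
   end;

   call streaminit(12345);    /* arbitrary seed, for reproducibility only */

   do replicate = 1 to &reps;
      do i = 1 to &n;
         /* draw an index uniformly from 1..&n, i.e. sample with replacement */
         s[i] = x[ceil(&n * rand('uniform'))];
      end;
      estimate = mean(of s[*]);   /* replace with the statistic you need */
      output;
   end;
   stop;                      /* all input has been read; end the step */
run;
```

This writes one row per replicate and never materializes the resampled data on disk, which is where the speed comes from. It only pays off when the estimate can be computed inside the DATA step; fitting a Weibull distribution per replicate would still need a procedure.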

Occasional Contributor
Posts: 15

Re: Most efficient Boot strapping Methodology - SAS

/************** Start *********************/
sasfile one load;
proc surveyselect method=urs outhits data=one out=two sampsize=4041484 reps=1000 noprint;
run;
sasfile one close;

proc univariate noprint data=two;
   /* WHERE= needs the equals sign, and the comparison is case-sensitive ('Mean') */
   ods output parameterestimates=three(where=(parameter='Mean') keep=parameter estimate);
   histogram / weibull(cinitial=1) nochart;
   var metric;
   by replicate;
run;
/************** End *********************/

Above is the modified code that I tested. The "two" work data set holds around 160 million records after 1000 samples.
Solution
‎03-13-2013 01:35 PM
Respected Advisor
Posts: 3,777

Re: Most efficient Boot strapping Methodology - SAS

How many extra variables are in DATA=ONE?

Why not omit OUTHITS and use NumberHits in a FREQ statement? That will reduce the size of data set TWO.

Even with NOPRINT on the PROC UNIVARIATE statement I still get a lot of output that I don't think you need. Add ODS SELECT NONE.

These changes should help a bit.

%let var=weight;

/* OUTHITS is omitted: each selected row appears once, with its count in NumberHits */
proc surveyselect
     method=urs
     data=sashelp.class(keep=&var)
     out=two
     sampsize=19
     reps=10
     noprint;
run;

proc univariate noprint data=two;
   ods select none;
   ods output parameterestimates=three(where=(parameter='Mean') keep=parameter estimate);
   histogram / weibull(cinitial=1) nochart;
   var &var;
   by replicate;
   freq numberhits;   /* count each row as many times as it was drawn */
run;

ods select all;
Occasional Contributor
Posts: 15

Re: Most efficient Boot strapping Methodology - SAS

There is only one variable in the dataset. Yes, I don't need any of the output. I will make the above changes and rerun the code.

Thanks for your response.

☑ This topic is SOLVED.
