Most efficient Bootstrapping Methodology - SAS


03-12-2013 03:39 AM

Hi experts,

We currently use PROC SURVEYSELECT to bootstrap a set of data about 1000 times, and I did some research online to find out whether there is anything more efficient than this procedure. I found an algorithm named OPDY that is described as much faster than PROC SURVEYSELECT. Can anyone share their experience with this algorithm? Thanks.

All Replies


Posted in reply to aruku

03-12-2013 05:53 AM

I am not familiar with OPDY, but you may find this usage note useful; it also includes some macros that may help in your research: 22220 - Procedures with bootstrapping, crossvalidation, or jackknifing capabilities


Posted in reply to aruku

03-12-2013 06:45 AM

Do you call SURVEYSELECT in a macro loop with 1000 iterations, or do you ask SURVEYSELECT for 1000 reps?

In general, a hand-coded method MAY be more efficient than using PROCs; it depends on your data and on the estimates you compute from the bootstrap samples.

Share more of what YOU are doing. Do you have a link to the OPDY algorithm?


Posted in reply to data_null__

03-12-2013 01:03 PM

I call SURVEYSELECT in a macro loop for 1000 iterations. Each iteration does unrestricted random sampling of the data and then fits a distribution with PROC UNIVARIATE. The output of every iteration is appended to a SAS data set.

Here is the link of PDF that specifies OPDY: http://interstat.statjournals.net/YEAR/2010/articles/1010002.pdf

Thanks.


Posted in reply to aruku

03-12-2013 02:03 PM

aruku wrote:

I call SURVEYSELECT in a macro loop for 1000 iterations. Each iteration does unrestricted random sampling of the data and then fits a distribution with PROC UNIVARIATE. The output of every iteration is appended to a SAS data set.

You can make this work WAY faster by using the SURVEYSELECT option REPS=1000 and then running PROC UNIVARIATE with "BY REPLICATE;".

That is one call each to SURVEYSELECT and UNIVARIATE instead of 1000 calls to each, and the APPEND is no longer needed.


Posted in reply to data_null__

03-12-2013 11:15 PM

Thanks for your recommendation; it reduced the processing time by around 15-18%. In the meantime, I will see if there are any other ways to make this more efficient.


Posted in reply to aruku

03-13-2013 02:10 AM

David Cassell wrote a paper a number of years ago that discusses efficiency issues with bootstrapping techniques. Depending on your actual code, you may find something in it that helps you: http://www2.sas.com/proceedings/forum2007/183-2007.pdf


Posted in reply to aruku

03-13-2013 06:42 AM

I would have expected a bigger change. Why don't you show your work?

If your estimates are simple and can be computed with STAT functions, you can use a DATA step and a temporary array to compute the bootstrap samples. That's all that's going on with OPDY.

The nice thing about SURVEYSELECT is that you can create 1000 bootstrap samples and then run a more complex analysis.
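For illustration, the DATA step idea could look roughly like the sketch below. It is not the OPDY code from the paper, only a minimal example of resampling from a temporary array. It assumes the statistic is a plain mean (a Weibull fit would still need a PROC), that the input is sashelp.class with the variable weight, and that there are no missing values; the data set name boot, the seed, and the macro variables are made up for the example.

```sas
%let var  = weight;   /* analysis variable (assumed for this sketch) */
%let reps = 1000;     /* number of bootstrap replicates */

/* Put the number of observations into &n so the array can be sized */
data _null_;
   if 0 then set sashelp.class nobs=nobs;
   call symputx('n', nobs);
   stop;
run;

data boot(keep=replicate estimate);
   /* Temporary arrays live in memory and are not written to the output */
   array x[&n] _temporary_;

   /* Read the original sample into the array once */
   do i = 1 to &n;
      set sashelp.class(keep=&var);
      x[i] = &var;
   end;

   call streaminit(12345);   /* arbitrary seed, for repeatability */

   do replicate = 1 to &reps;
      total = 0;
      do j = 1 to &n;
         /* Pick an index from 1 to n with replacement */
         k = ceil(&n * rand('uniform'));
         total = total + x[k];
      end;
      estimate = total / &n;  /* bootstrap mean for this replicate */
      output;
   end;
   stop;
run;

/* Summarize the bootstrap distribution of the mean */
proc means data=boot n mean std p5 p95;
   var estimate;
run;
```

Nothing is written to disk except one row per replicate, which is where the speed-up comes from when the resampled data set would otherwise be very large.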


Posted in reply to data_null__

03-13-2013 12:58 PM

/************** Start *********************/

sasfile one load;

proc surveyselect method=urs outhits data=one out=two sampsize=4041484 reps=1000 noprint;

run;

sasfile one close;

proc univariate noprint data=two;

ods output parameterestimates=three(where=(parameter='Mean') keep=parameter estimate);

histogram / weibull(cinitial=1) nochart;

var metric;

by replicate;

run;

/************** End *********************/

Above is the modified code that I tested. The "two" work data set holds around 160 million records after 1000 samples.

Solution


Posted in reply to aruku

03-13-2013 01:35 PM

How many extra variables are in DATA=ONE?

Why not omit OUTHITS and use NumberHits in a FREQ statement? That will reduce the size of data set TWO.

Even with NOPRINT on the PROC UNIVARIATE statement I still get a lot of output that I don't think you need. Add ODS SELECT NONE.

These changes should help a bit.

%let var=weight;

proc surveyselect

method=urs

/* OUTHITS omitted: one row per selected unit, with its count in NumberHits */

data=sashelp.class(keep=&var)

out=two

sampsize=19

reps=10

noprint;

run;

proc univariate noprint data=two;

ods select none;

ods output parameterestimates=three(where=(parameter='Mean') keep=parameter estimate);

histogram / weibull(cinitial=1) nochart;

var &var;

by replicate;

freq numberhits;

run;

ods select all;



Posted in reply to data_null__

03-13-2013 02:30 PM

There is only one variable in the data set. Yes, I don't need any of the output. I will make the above changes and rerun the code.

Thanks for your response.