aruku
Calcite | Level 5

Hi experts,

We currently use PROC SURVEYSELECT to bootstrap a set of data about 1000 times, and I did some research online to find out whether there is anything more efficient than this procedure. I found an algorithm named OPDY that is described as much faster than PROC SURVEYSELECT. Can anyone share their experience with this algorithm? Thanks.

MichelleHomes
Meteorite | Level 14

I am not familiar with OPDY but thought you may find this usage note useful which also includes some macros that may help in your research. 22220 - Procedures with bootstrapping, crossvalidation, or jackknifing capabilities

data_null__
Jade | Level 19

Do you call SURVEYSELECT in a macro loop with 1000 iterations or do you ask SURVEYSELECT for 1000 reps?

In general a hand coded method MAY be more efficient than using PROCs.  Depends on your data and the estimates you compute from the bootstrap samples.

Share more of what YOU are doing.  Do you have a link to the OPDY algorithm?

aruku
Calcite | Level 5

I call SURVEYSELECT in a macro loop for 1000 iterations. I am doing unrestricted random sampling of the data, then fitting a distribution with PROC UNIVARIATE. The output of each iteration is appended to a SAS dataset.

Here is the link of PDF that specifies OPDY: http://interstat.statjournals.net/YEAR/2010/articles/1010002.pdf

Thanks.

data_null__
Jade | Level 19

aruku wrote:

I call SURVEYSELECT in a macro loop for 1000 iterations, I am doing unrestricted random sampling of data, then using proc univariate fitting a distribution. The output is appended to the sas dataset for all the iterations.


You can make this work WAY faster by using the SURVEYSELECT option REPS=1000 and then running PROC UNIVARIATE with "BY REPLICATE;".

That is one call each to SURVEYSELECT and UNIVARIATE versus 1000 calls to each, and the APPEND is no longer needed.

aruku
Calcite | Level 5

Thanks for your recommendation. This reduced processing time by around 15-18%. In the meantime, I will see if there are any other ways to make this more efficient.

ArtC
Rhodochrosite | Level 12

David Cassell wrote a paper a number of years ago that discussed efficiency issues with bootstrapping techniques.  Depending on your actual code you may find something that helps you. http://www2.sas.com/proceedings/forum2007/183-2007.pdf

data_null__
Jade | Level 19

I would have expected a bigger change.  Why don't you show your work?

If your estimates are simple and can be computed with STAT functions, you can use a data step and temporary arrays to compute the bootstrap samples.  That's all that's going on with OPDY.
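
A minimal sketch of that data step idea, not the OPDY code from the paper. It assumes SASHELP.CLASS and WEIGHT as stand-ins for the real data, an array size (19) equal to the number of observations, an arbitrary seed, and the mean as the estimate. BOOT_MEANS, X and S are made-up names.

%let n    = 19;     /* number of observations in the input (assumed known) */
%let reps = 1000;   /* number of bootstrap replicates */

data boot_means(keep=replicate estimate);
   array x[&n] _temporary_;   /* original values, held in memory */
   array s[&n] _temporary_;   /* one bootstrap sample */

   /* single pass over the input to fill the array */
   do i = 1 to &n;
      set sashelp.class(keep=weight);
      x[i] = weight;
   end;

   call streaminit(12345);    /* arbitrary seed, for repeatability */
   do replicate = 1 to &reps;
      do i = 1 to &n;
         /* random index in 1..n, i.e. sampling with replacement */
         s[i] = x[ceil(&n * rand('uniform'))];
      end;
      estimate = mean(of s[*]);   /* swap in another STAT function as needed */
      output;
   end;
   stop;
run;

This only pays off when the estimate is something a function can compute from the array. Fitting a Weibull distribution per replicate would still need a PROC.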

The nice thing about SURVEYSELECT is that you can create 1000 bootstrap samples and then run a more complex analysis.

aruku
Calcite | Level 5

/************** Start *********************/
sasfile one load;
proc surveyselect method=urs outhits data=one out=two sampsize=4041484 reps=1000 noprint;
run;
sasfile one close;
proc univariate noprint data=two;
   ods output parameterestimates=three(where=(parameter='Mean') keep=parameter estimate);
   histogram / weibull(cinitial=1) nochart;
   var metric;
   by replicate;
run;
/************** End *********************/

Above is the modified code I tested. The "two" work dataset is around 160 million records after 1000 samples.

data_null__
Jade | Level 19

How many extra variables are in DATA=ONE?

Why not omit OUTHITS and use NumberHits in a FREQ statement?  That will reduce the size of dataset TWO.

Even with NOPRINT on the PROC UNIVARIATE statement I still get a lot of output that I don't think you need.  Add ODS SELECT NONE.

These changes should help a bit.

%let var=weight;

proc surveyselect
   method=urs
   data=sashelp.class(keep=&var)
   out=two
   sampsize=19
   reps=10
   noprint;
run;

proc univariate noprint data=two;
   ods select none;
   ods output parameterestimates=three(where=(parameter='Mean') keep=parameter estimate);
   histogram / weibull(cinitial=1) nochart;
   var &var;
   by replicate;
   freq numberhits;
run;

ods select all;

aruku
Calcite | Level 5

There is only one variable in the dataset. Yes, I don't need any of the output.  I will make the above changes and rerun the code.

Thanks for your response.
