pamplemousse822
Obsidian | Level 7

I would like to run a Cox model on bootstrapped data with 1,000 replications. Currently, it takes 1 hour to run the model on the original data, and ultimately I have to repeat this for several models.

 

The original data (dat) has 2 million observations, includes only the variables used in the model, and has no missing values. I have access to 2 computers.

 

Any suggestions to make this more efficient? 

 

Here is the code I am working with: 

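/* Draw 1000 bootstrap samples (with replacement, same size as dat).  */
/* OUTHITS is omitted: each selected row then appears once and        */
/* NumberHits carries its multiplicity, which FREQ uses below.        */
/* Combining OUTHITS with FREQ NumberHits would count repeats twice.  */
proc surveyselect data=dat out=dat_boot
   seed=3446
   method=urs
   samprate=1
   rep=1000;
run;

/* Fit the Cox model once per bootstrap replicate */
proc phreg data=dat_boot outest=output covout noprint;
   by replicate;
   freq numberhits;
   class zip a b c;
   model time*y(0)=x e x*e a b c;
   random zip;
run;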

Thanks!

7 REPLIES
Rick_SAS
SAS Super FREQ

Which estimates are you wanting to bootstrap? Are you trying to get CIs that are not provided? With that many observations, I would think the normal approximation (by using the CL option) should be sufficient for the parameters.

 

One option in traditional SAS is to use those two computers in parallel. On one, submit the PROC with 

WHERE replicate <= 500;

and the other with 

WHERE replicate > 500;
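
A sketch of how that split might look, reusing the model from the question. The output data set names est_1_500, est_501_1000, and est_all are made up for illustration, and both machines are assumed to have a copy of dat_boot (for example, by running the same PROC SURVEYSELECT step with the same seed on each).

```sas
/* Computer 1: fit replicates 1-500 only.                          */
/* FREQ NumberHits assumes dat_boot has one row per selected unit  */
/* (that is, it was created without the OUTHITS option).           */
proc phreg data=dat_boot outest=est_1_500 covout noprint;
   where replicate <= 500;
   by replicate;
   freq numberhits;
   class zip a b c;
   model time*y(0)=x e x*e a b c;
   random zip;
run;

/* Computer 2: same step with                                      */
/*    where replicate > 500;                                       */
/* and outest=est_501_1000.                                        */

/* Afterwards, with both result sets copied to one machine: */
data est_all;
   set est_1_500 est_501_1000;
run;
```

Because each replicate is fit independently, the stacked est_all should match what a single 1000-replicate run would have produced.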

 

There is a phreg.cox action in SAS Viya, if your company uses Viya. I think it supports the groupby= parameter for BY-group processing across multiple threads.

 

pamplemousse822
Obsidian | Level 7
Hi,
I am trying to get CIs for an estimate that is computed by a formula from the parameter estimates of the Cox models. This final estimate has no CIs, and bootstrapping would provide better coverage than calculating the standard error using the delta method.
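
For concreteness, here is roughly how the bootstrap CI would be computed once the replicate fits exist. This is only a sketch: the formula exp(x + e) is a placeholder, not my real formula, and the names boot_theta, boot_ci, and theta are made up. It assumes the OUTEST= data set from the code above (work.output), where COVOUT adds covariance rows that have to be filtered out.

```sas
/* Keep one row of coefficients per replicate and apply the formula */
data boot_theta;
   set work.output;
   where upcase(_type_) = 'PARMS';   /* drop the COVOUT covariance rows */
   theta = exp(x + e);               /* PLACEHOLDER formula, illustration only */
   keep replicate theta;
run;

/* Percentile bootstrap interval: 2.5th and 97.5th percentiles of theta */
proc univariate data=boot_theta noprint;
   var theta;
   output out=boot_ci pctlpts=2.5 97.5 pctlpre=theta_p;
run;
```

The limits end up in boot_ci as theta_p2_5 and theta_p97_5.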

I'll try using the computers in parallel and specifying replicates. I don't have access to Viya. Does opening multiple SAS sessions on the same computer work, or would that result in the same run time?

Thanks.
Rick_SAS
SAS Super FREQ

The ESTIMATE statement can provide estimates and CIs for linear combinations of the effect parameters.
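
A minimal sketch of what that could look like for the interaction in your model. The label and the value e = 2 are arbitrary choices for illustration. I have left out the RANDOM statement to keep the sketch small; whether ESTIMATE can be combined with a frailty term in your release is something to confirm in the PROC PHREG documentation.

```sas
proc phreg data=dat;
   class a b c;
   model time*y(0)=x e x*e a b c;
   /* Log hazard ratio for a one-unit increase in x when e = 2:  */
   /* beta_x + 2*beta_(x*e). CL requests confidence limits and   */
   /* EXP also reports the result on the hazard-ratio scale.     */
   estimate 'x effect at e=2' x 1 x*e 2 / cl exp;
run;
```

This only covers linear combinations of the coefficients (and their exponentials), so it helps only if your formula can be written that way.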

 

To answer your question: if you run multiple copies of SAS on the same PC, you are probably going to compete with yourself for resources. So use multiple computers if you pursue the bootstrap idea.

 

I wonder whether a Bayesian analysis (using the BAYES statement in PROC PHREG) will give you the distribution of the estimates that you need. Anyway, I am not an expert on survival analysis, so I will let others offer their opinions. Good luck.
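
In case it helps, a rough sketch of the BAYES idea. The iteration counts and the data set name post are arbitrary, and the frailty term is omitted because I am not certain how BAYES and RANDOM interact in every release. With 2 million observations the sampler may well be slow, so treat this as something to try on a subsample first.

```sas
proc phreg data=dat;
   class a b c;
   model time*y(0)=x e x*e a b c;
   /* NBI = burn-in iterations, NMC = posterior draws kept,        */
   /* OUTPOST = data set that stores the posterior draws           */
   bayes seed=3446 nbi=2000 nmc=10000 outpost=post;
run;
```

Applying your formula to each row of post would give draws of the derived quantity, whose percentiles form a credible interval.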

pamplemousse822
Obsidian | Level 7
I didn't realize you could use the ESTIMATE statement this way; it sounds very useful. I'll check this and the BAYES statement out. Thank you.
FreelanceReinh
Jade | Level 19

@pamplemousse822 wrote:
Does opening multiple SAS sessions on the same computer work, or would that result in the same run time?

Hi @pamplemousse822,

 

Look at the processor load (e.g., in Windows task manager) while one SAS session is running your PROC PHREG step. If CPU usage is well below 100%, chances are that you can run two (or more) sessions in parallel without doubling (multiplying) run time. I remember a SAS program (not PROC PHREG, though) running at about 12-13% CPU usage. It was using essentially one of the eight available threads of my workstation's quad-core processor. With six SAS sessions in parallel (working on disjoint subsets of the data) CPU usage went up to about 75% (six threads) and thus I got my results almost six times faster.
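
If SAS/CONNECT is licensed at your site, the parallel sessions can also be started from a single program instead of opening them by hand. The following is only a sketch under that assumption: the libref boot, its path, the task names, and the output data set names are all hypothetical, and dat_boot is assumed to be stored in that library.

```sas
/* Assumes SAS/CONNECT; each RSUBMIT starts a separate SAS session */
options autosignon sascmd="!sascmd";
libname boot "/path/to/shared/folder";   /* hypothetical location */

rsubmit task1 wait=no inheritlib=(boot);
   proc phreg data=boot.dat_boot outest=boot.est_1 covout noprint;
      where replicate <= 500;
      by replicate;
      freq numberhits;   /* assumes dat_boot was created without OUTHITS */
      class zip a b c;
      model time*y(0)=x e x*e a b c;
      random zip;
   run;
endrsubmit;

rsubmit task2 wait=no inheritlib=(boot);
   proc phreg data=boot.dat_boot outest=boot.est_2 covout noprint;
      where replicate > 500;
      by replicate;
      freq numberhits;
      class zip a b c;
      model time*y(0)=x e x*e a b c;
      random zip;
   run;
endrsubmit;

/* Block until both sessions have finished, then close them */
waitfor _all_ task1 task2;
signoff _all_;
```

Whether two sessions actually halve the run time still depends on the CPU and I/O headroom described above.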

 

PGStats
Opal | Level 21

Is your data clustered by ZIP code? If so, you should Google "bootstrapping clustered data"; you'll find information about the pitfalls of ignoring the original sample structure when resampling.
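
One way to respect the clustering is to resample whole ZIP codes rather than individual rows. A minimal sketch, assuming zip identifies the clusters; the output name dat_boot_clus is made up:

```sas
/* Cluster bootstrap: draw ZIP codes with replacement and keep     */
/* every observation that belongs to a selected ZIP code           */
proc surveyselect data=dat out=dat_boot_clus
   seed=3446
   method=urs
   samprate=1
   rep=1000;
   samplingunit zip;   /* the sampling unit is the ZIP cluster, not the row */
run;
```

Here NumberHits counts how many times each ZIP code was drawn. Note that a ZIP code drawn twice would still look like a single cluster to a RANDOM zip term, so how to treat repeated clusters in the frailty model is one of the pitfalls worth reading about.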

PG
pamplemousse822
Obsidian | Level 7
That is a very good point. I will do this, thank you.
