What is a faster way to run proc phreg on bootstrapped data?

pamplemousse822 — Fri, 24 Apr 2020 16:54:59 GMT

I would like to run a Cox model on bootstrapped data with 1000 replications. Currently, it takes 1 hour to run the model on the original data. Ultimately I have to repeat this for several models.

The original data (dat) has 2 million observations, includes only the variables used in the model, and has no missing values. I have access to 2 computers.

Any suggestions to make this more efficient?

Here is the code I am working with:

proc surveyselect data=dat out=dat_boot
   seed=3446
   method=urs
   samprate=1
   rep=1000;   /* no OUTHITS: one row per selected unit, NumberHits feeds FREQ below */
run;

proc phreg data=dat_boot outest=output covout noprint; 
   by replicate; 
   freq numberhits; 
   class zip a b c; 
   model time*y(0)=x e x*e a b c; 
   random zip; 
run;

Thanks!

Re: What is a faster way to run proc phreg on bootstrapped data?

Rick_SAS — Fri, 24 Apr 2020 21:24:06 GMT

Which estimates are you wanting to bootstrap? Are you trying to get CIs that are not provided? With that many observations, I would think the normal approximation (by using the CL option) should be sufficient for the parameters.

One option in traditional SAS is to use those two computers in parallel. On one, submit the PROC with

WHERE replicate <= 500;

and the other with

WHERE replicate > 500;
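
A minimal sketch of that split, reusing the statements from the original post unchanged. The output data set names est_part1, est_part2, and est_all are assumptions for illustration, and each machine is assumed to have its own copy of dat_boot (or access to a shared library).

```sas
/* Machine 1: fit replicates 1-500 */
proc phreg data=dat_boot outest=est_part1 covout noprint;
   where replicate <= 500;
   by replicate;
   freq numberhits;
   class zip a b c;
   model time*y(0)=x e x*e a b c;
   random zip;
run;

/* Machine 2: the same step with
      where replicate > 500;
   and outest=est_part2 */

/* Afterwards, on one machine: stack the two sets of estimates */
data est_all;
   set est_part1 est_part2;
run;
```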

There is a phreg.cox action in SAS Viya, if your company uses Viya. I think it supports the groupby= parameter for BY-group processing across multiple threads.

Re: What is a faster way to run proc phreg on bootstrapped data?

pamplemousse822 — Fri, 24 Apr 2020 21:38:35 GMT

Hi,
I am trying to get CIs for an estimate that is computed by a formula from the parameter estimates of the Cox models. This final estimate has no CIs, and bootstrapping would provide better coverage than calculating the standard error using the delta method.
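
For illustration, a percentile bootstrap interval for a derived quantity of this kind could be computed from the OUTEST= data set roughly as follows. The formula exp(x + e) is only a placeholder for the real one, and boot_est, derived, and boot_ci are made-up names.

```sas
/* Keep the one row of parameter estimates per replicate
   (COVOUT also writes covariance rows with _TYPE_='COV') */
data boot_est;
   set work.output;
   where _type_ = 'PARMS';
   derived = exp(x + e);   /* placeholder formula: substitute the real one */
   keep replicate derived;
run;

/* 2.5th and 97.5th percentiles of the bootstrap distribution */
proc univariate data=boot_est noprint;
   var derived;
   output out=boot_ci pctlpts=2.5 97.5 pctlpre=ci_;
run;
```

The limits end up in the variables ci_2_5 and ci_97_5 of boot_ci.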

I'll try using the computers in parallel and specifying replicates. I don't have access to Viya. Does opening multiple SAS sessions on the same computer work? Or would that result in the same run time?

Thanks.

Re: What is a faster way to run proc phreg on bootstrapped data?

Rick_SAS — Sat, 25 Apr 2020 11:42:27 GMT

The ESTIMATE statement can provide estimates and CIs for linear combinations of the effect parameters.
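
As a rough sketch, assuming the quantity of interest were the linear combination of the coefficients of x and e (the actual formula is not given in this thread), it could look like the step below. This simplified version leaves out the RANDOM statement and the zip variable, because whether ESTIMATE can be combined with RANDOM has not been checked here.

```sas
proc phreg data=dat;
   class a b c;
   model time*y(0)=x e x*e a b c;
   /* beta_x + beta_e with confidence limits;
      EXP also reports the exponentiated estimate */
   estimate 'beta_x + beta_e' x 1 e 1 / cl exp;
run;
```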

To answer your question: if you run multiple copies of SAS on the same PC, you are probably going to compete with yourself for resources. So use multiple computers if you pursue the bootstrap idea.

I wonder whether a Bayesian analysis (using the BAYES statement in PROC PHREG) will give you the distribution of the estimates that you need. Anyway, I am not an expert on survival analysis, so I will let others offer their opinions. Good luck.

Re: What is a faster way to run proc phreg on bootstrapped data?

PGStats — Sat, 25 Apr 2020 19:28:16 GMT

Is your data clustered by zip code? If so, search for "bootstrapping clustered data"; you'll find information about the pitfalls of ignoring the original sample structure when resampling.

Re: What is a faster way to run proc phreg on bootstrapped data?

pamplemousse822 — Sat, 25 Apr 2020 21:06:13 GMT

That is a very good point. I will do this, thank you.

Re: What is a faster way to run proc phreg on bootstrapped data?

pamplemousse822 — Sat, 25 Apr 2020 21:18:33 GMT

I didn't realize you could use the ESTIMATE statement this way; it sounds very useful. I'll check this and the BAYES statement out. Thank you.

Re: What is a faster way to run proc phreg on bootstrapped data?

FreelanceReinh — Sat, 25 Apr 2020 21:36:20 GMT

@pamplemousse822 wrote:
Does opening multiple SAS sessions on the same computer work? Or would that result in the same run time?

Hi @pamplemousse822,

Look at the processor load (e.g., in Windows task manager) while one SAS session is running your PROC PHREG step. If CPU usage is well below 100%, chances are that you can run two (or more) sessions in parallel without doubling (multiplying) run time. I remember a SAS program (not PROC PHREG, though) running at about 12-13% CPU usage. It was using essentially one of the eight available threads of my workstation's quad-core processor. With six SAS sessions in parallel (working on disjoint subsets of the data) CPU usage went up to about 75% (six threads) and thus I got my results almost six times faster.
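
If you try this, one way to organize it is a small macro that each session calls with its own range of replicates. This is only a sketch: the macro name fit_chunk, the libref boot, and its path are hypothetical. Note that every SAS session has its own WORK library, so the bootstrap sample and the results would have to live in a permanent library that all sessions can read.

```sas
/* Hypothetical permanent library visible to every session */
libname boot '/path/to/shared/folder';

/* Fit one contiguous block of replicates */
%macro fit_chunk(first=, last=, out=);
   proc phreg data=boot.dat_boot outest=&out covout noprint;
      where replicate between &first and &last;
      by replicate;
      freq numberhits;
      class zip a b c;
      model time*y(0)=x e x*e a b c;
      random zip;
   run;
%mend fit_chunk;

/* Session 1; session 2 would use first=251, last=500, and so on */
%fit_chunk(first=1, last=250, out=boot.est_1_250)
```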