chrisd9970
Fluorite | Level 6

I am fitting a Cox model with multiple time-varying covariates on 1.5 million records. The run used up too much disk space and the model could not be fitted.

 

I learned about the MULTIPASS option and was wondering whether it works for time-dependent variables defined with programming statements. I know it works with counting-process data, but does MULTIPASS also work with programming statements? My program runs, but it has been running for days and is still not finished. I just want to know whether the option has any effect here.

 

I'm confused by the last sentence of the online documentation:

 

MULTIPASS

requests that, for each Newton-Raphson iteration, PROC PHREG recompile the risk sets corresponding to the event times for the (start,stop) style of response and recompute the values of the time-dependent variables defined by the programming statements for each observation in the risk sets. If the MULTIPASS option is not specified, PROC PHREG computes all risk sets and all the variable values and saves them in a utility file. The MULTIPASS option decreases required disk space at the expense of increased execution time; however, for very large data, it might actually save time since it is time-consuming to write and read large utility files. This option has an effect only when the (start,stop) style of response is used or when there are time-dependent explanatory variables.

 

Your input is very much appreciated. Thank you.

 

Rick_SAS
SAS Super FREQ

The last sentence is telling you that the MULTIPASS option is only relevant when your MODEL statement looks like

model (TStart,TStop)*Status(0)=Trt Age ... ;

 

 

chrisd9970
Fluorite | Level 6

Thanks Rick,

 

Does that mean MULTIPASS will not work with programming statements? Only counting-process data sets have start and stop time variables, right? With the programming-statement method there is usually a single time variable in the MODEL statement. Below is a sample of my code with only one time-dependent covariate.

 

   proc phreg data=mydata;
      model Time*Status(0)=RiskFactor;
      RiskFactor = (0 <= RiskFactorStart < Time);
   run;

 

Rick_SAS
SAS Super FREQ

I didn't mean to imply that. I was trying to translate the portion of the sentence that you boldfaced. The last portion of the same sentence says "or when there are time-dependent explanatory variables."
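
For reference, a minimal sketch of how the option might be combined with the programming-statement code posted above. MULTIPASS is an option of the PROC PHREG statement; the data set and variable names are simply the ones from the earlier post.

```sas
/* Sketch only: same model as posted above, with MULTIPASS added so that
   the risk sets and time-dependent values are recomputed at each
   iteration instead of being saved in a utility file. */
proc phreg data=mydata multipass;
   model Time*Status(0)=RiskFactor;
   RiskFactor = (0 <= RiskFactorStart < Time);
run;
```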

chrisd9970
Fluorite | Level 6

Oh.. pardon me. 

So my code should run. But it's taking too long. 

 

Thank you for your reply.

JacobSimonsen
Barite | Level 11

The problem is that PROC PHREG does not, by default, aggregate over each risk set when time-dependent variables are present. Aggregation is also not always possible. Instead, for each event it checks every individual to see whether that individual is at risk at that event time. That makes the running time quadratic in the cohort size. If the MULTIPASS option is used, it repeats this exercise in every iteration.
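
To make the quadratic cost concrete, here is the log partial likelihood in its Breslow form, written as a sketch in generic notation (the symbols are illustrative and not taken from the PHREG documentation): $t_i$ is the follow-up time of subject $i$, $\delta_i$ the event indicator, $x_j(t)$ the covariate vector of subject $j$ at time $t$, and $R(t)$ the set of subjects still at risk at time $t$.

```latex
\ell(\beta) \;=\; \sum_{i:\,\delta_i = 1}
  \Bigl[\, x_i(t_i)^{\top}\beta
  \;-\; \log \sum_{j \in R(t_i)} \exp\bigl( x_j(t_i)^{\top}\beta \bigr) \Bigr]
```

When $x_j(t_i)$ must be recomputed from programming statements at every event time, the inner sum cannot simply be carried over from one event time to the next. With $n$ subjects and up to $n$ distinct event times, that is on the order of $n^2$ evaluations per pass through the likelihood.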

 

If your time-dependent covariate is piecewise constant, you can split your data into intervals so that each record has both an entry-time and an exit-time variable. Then specify the FAST option to tell PROC PHREG to use the aggregation technique when it calculates the likelihood function. The computation time is then only linear in the cohort size, and you will probably be surprised how fast it goes.
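
As a rough sketch of that splitting for the single covariate in the code posted earlier: the DATA step below cuts each subject's follow-up at RiskFactorStart. The names mydata_cp, TStart, TStop and Event are made up for this illustration, and the step assumes that follow-up starts at time 0 and that Time is positive.

```sas
/* Sketch only: split each subject at RiskFactorStart so that RiskFactor
   is constant within every (TStart, TStop] interval. */
data mydata_cp;
   set mydata;
   if 0 <= RiskFactorStart < Time then do;
      /* Unexposed piece: it ends before the subject's own event or
         censoring time, so it is always censored. */
      if RiskFactorStart > 0 then do;
         TStart = 0; TStop = RiskFactorStart;
         RiskFactor = 0; Event = 0;
         output;
      end;
      /* Exposed piece: carries the subject's original status. */
      TStart = RiskFactorStart; TStop = Time;
      RiskFactor = 1; Event = Status;
      output;
   end;
   else do;
      /* Exposure never starts during follow-up: one interval. */
      TStart = 0; TStop = Time;
      RiskFactor = 0; Event = Status;
      output;
   end;
run;

/* FAST goes in the PROC PHREG statement. No programming statements are
   needed because RiskFactor is now an ordinary data set variable. */
proc phreg data=mydata_cp fast;
   model (TStart, TStop)*Event(0) = RiskFactor;
run;
```

At any event time t, a subject contributes the interval with TStart < t <= TStop, so RiskFactor takes the same value as (0 <= RiskFactorStart < t) in the programming-statement version. With several time-dependent covariates, each subject would need a cut at every change point.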

 

Good luck.

chrisd9970
Fluorite | Level 6

Thanks so much, Jacob. My PHREG works now. Before, I requested profile-likelihood risk limits with rl=pl. I changed it to just rl and my model runs faster. I still have to use MULTIPASS, or else I run out of disk space again.
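
For anyone reading along, a sketch of the change described here, using the variable names from the code posted earlier. RISKLIMITS (alias RL) without a keyword gives Wald limits, whereas RL=PL asks for profile-likelihood limits, which need extra iterative computation for each parameter. That is presumably why the run was slower.

```sas
/* Sketch only: Wald confidence limits for the hazard ratios, which is
   what RISKLIMITS gives when no keyword is specified. */
proc phreg data=mydata multipass;
   model Time*Status(0)=RiskFactor / risklimits;
   RiskFactor = (0 <= RiskFactorStart < Time);
run;
```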

 

The FAST option doesn't work for me. I think it has to do with the SAS environment I have at work. Below is my error message.

 

---------------------------------------------------

ERROR:  An exception has been encountered.

Please contact technical support and provide them with the following traceback information:

 

The SAS task name is [PHREG   ]

Segmentation Violation

---------------------------------------------------

 

I also saw your other post about the  %coxaggregate macro. I will try it. 

 

Thanks again for your reply to my question!

JacobSimonsen
Barite | Level 11

Yes, the %coxaggregate macro can be used instead of the FAST option. It does the aggregation over each risk set. The WEIGHT statement in PROC PHREG should then be used to specify how many persons are at risk. Examples of how to do this are given below the macro.

It produces a much smaller utility file, so you shouldn't have problems with disk space.

 

I don't see any reason to experiment with using MULTIPASS or not: if the utility file can be created once, it can almost surely be recreated in each iteration. Using it, or not using it, may save a bit of time, but it will not make the major difference. The major difference comes from aggregating.
