mayankdce1
Fluorite | Level 6

While running PROC PHREG on data in counting-process format, I am using the BASELINE statement as shown below:

 

proc phreg data = mod plots(overlay cl) = (survival cumhaz) outest = est namelen = 50;
model (start, stop) * depvar (0) = &model_varlist. / itprint rl = both ties = breslow selection = stepwise sle = 0.2 sls = 0.05;
baseline out = baseline survival = survival xbeta = xbeta covariates = val cumhaz = cumhaz / method = pl;
run;

 

However, my covariates data set has close to 3 million records. The BASELINE statement creates 20 observations for each of these 3 million records, because my modeling data experienced events at 20 distinct time points. The resulting 60 million observations exhaust my disk space and memory, and the step fails with an error.

 

I am not interested in all 20 survival probabilities. Is there a way to stop the BASELINE statement from creating all 20? Only 1 or 2 survival probabilities, at t = 30 and t = 60 days, will suffice. That way the output data set would have at most 6 million observations.

 

Any immediate help is highly appreciated.

5 REPLIES 5
Reeza
Super User

Have you tried the TIMELIST option?

 

https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_phreg_syntax03.htm&docsetVersion=...

 

TIMEPOINT=list

specifies a list of time points at which the requested prediction statistics are output to the OUT= and the OUTDIFF= data sets. The prediction statistics in the OUT= data set include the survivor function estimates, the direct adjusted survivor function estimates, the cumulative hazard function estimates, and the cumulative incidence function estimates. If the list contains time points that are greater than the largest observed time, the requested prediction statistics might not be defined for these time points and are not output.

 

baseline out = baseline covariates = val timelist = 30 60 survival = survival xbeta = xbeta cumhaz = cumhaz / method = pl;


mayankdce1
Fluorite | Level 6

The TIMELIST= option works only for Bayesian analysis. Here's the note I am getting:

 

NOTE: The TIMELIST= option is ignored for a non-Bayesian analysis.

Reeza
Super User
Can you post some demo code so we can test please?
Reeza
Super User

I tried it with SAS/STAT 14.3 and 15.1 and it works fine; at least, I get no note about a Bayesian analysis being required. Perhaps you have an older version of SAS/STAT?

 

proc format;
   value DiseaseGroup 1='ALL'
                      2='AML-Low Risk'
                      3='AML-High Risk';

data Bmt;
   input Disease T Status @@;
   label T='Disease-Free Survival in Days';
   format Disease DiseaseGroup.;
   datalines;
1   2081   0   1   1602   0   1   1496   0   1   1462   0   1   1433   0
1   1377   0   1   1330   0   1    996   0   1    226   0   1   1199   0
1   1111   0   1    530   0   1   1182   0   1   1167   0   1    418   2
1    383   1   1    276   2   1    104   1   1    609   1   1    172   2
1    487   2   1    662   1   1    194   2   1    230   1   1    526   2
1    122   2   1    129   1   1     74   1   1    122   1   1     86   2
1    466   2   1    192   1   1    109   1   1     55   1   1      1   2
1    107   2   1    110   1   1    332   2   2   2569   0   2   2506   0
2   2409   0   2   2218   0   2   1857   0   2   1829   0   2   1562   0
2   1470   0   2   1363   0   2   1030   0   2    860   0   2   1258   0
2   2246   0   2   1870   0   2   1799   0   2   1709   0   2   1674   0
2   1568   0   2   1527   0   2   1324   0   2    957   0   2    932   0
2    847   0   2    848   0   2   1850   0   2   1843   0   2   1535   0
2   1447   0   2   1384   0   2    414   2   2   2204   2   2   1063   2
2    481   2   2    105   2   2    641   2   2    390   2   2    288   2
2    421   1   2     79   2   2    748   1   2    486   1   2     48   2
2    272   1   2   1074   2   2    381   1   2     10   2   2     53   2
2     80   2   2     35   2   2    248   1   2    704   2   2    211   1
2    219   1   2    606   1   3   2640   0   3   2430   0   3   2252   0
3   2140   0   3   2133   0   3   1238   0   3   1631   0   3   2024   0
3   1345   0   3   1136   0   3    845   0   3    422   1   3    162   2
3     84   1   3    100   1   3      2   2   3     47   1   3    242   1
3    456   1   3    268   1   3    318   2   3     32   1   3    467   1
3     47   1   3    390   1   3    183   2   3    105   2   3    115   1
3    164   2   3     93   1   3    120   1   3     80   2   3    677   2
3     64   1   3    168   2   3     74   2   3     16   2   3    157   1
3    625   1   3     48   1   3    273   1   3     63   2   3     76   1
3    113   1   3    363   2
;

data Risk;
   Disease=1; output;
   Disease=2; output;
   Disease=3; output;
   format Disease DiseaseGroup.;
   run;

ods graphics on;
proc phreg data=Bmt plots(overlay=stratum)=cif;
   class Disease (order=internal ref=first);
   model T*Status(0)=Disease / eventcode=1;
   Hazardratio 'Pairwise' Disease / diff=pairwise;
   baseline covariates=Risk out=out1(where=(t in (100, 1000))) cif=_all_ timelist=100 1000/ seed=191 ;
run;

 

You can check your version of SAS/STAT if needed:

 

proc product_status;run;

It's possible some other option in your analysis is causing this issue.

Reeza
Super User

You can filter the output with a WHERE= data set option on the OUT= data set, but I'm not sure that will help with the memory issues:

 

   baseline covariates=Risk out=out1(where=(t in (100, 1000))) cif=_all_ / seed=191 ;
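One caveat with an exact-match filter such as t in (100, 1000): as far as I know, without TIMELIST= the OUT= data set has rows only at the observed event times (plus time 0), so a target time with no event on exactly that day returns nothing. Below is a rough, untested sketch of one way around that: keep every row up to the largest target time, then carry forward the last estimate at or before each target. The names are assumptions, not taken from the thread: the OUT= data set is called baseline, its time variable is stop, it carries a hypothetical variable id copied from the COVARIATES= data set, and the rows for each id are contiguous and in increasing time order.

```sas
/* Sketch only, under the assumptions stated above.                     */
/* Keeps one row per covariate pattern with survival at t=30 and t=60.  */
data surv_30_60;
   set baseline(keep=id stop survival where=(stop <= 60));
   by id notsorted;          /* groups are contiguous, not necessarily sorted */
   retain s30 s60;
   if first.id then do;
      s30 = 1;               /* survival is 1 before the first event time */
      s60 = 1;
   end;
   if stop <= 30 then s30 = survival;   /* last estimate at or before day 30 */
   s60 = survival;                      /* last estimate at or before day 60 */
   if last.id then output;
   keep id s30 s60;
run;
```

To limit what is written to disk in the first place, the same cutoff can go on the BASELINE statement itself, for example out=baseline(where=(stop <= 60)). Whether that also lowers the memory PROC PHREG needs internally is something I have not verified.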

