Hi @Rick_SAS and SAS experts,
I'm conducting a simulation study to test how the extent of missingness in a time variable affects the survival estimates, over the range of 0-100% of the data. The SAS program itself was worked out in the post below.
The whole SAS program posted below takes about 5 minutes on the mock data (N=143).
However, my actual data sets have 25K and 150K rows. I ran the code on the 25K data set as a test, and it had been running for 36 hours when I had to terminate it.
If that is the case for the smallest data set I have, it would be almost impractical for the 150K-row data.
Is there any trick to cut the run time of this simulation program, or an alternative approach that achieves the same result more quickly? I'm especially concerned about the largest data set (150K rows).
Please let me know if anything is unclear. I greatly appreciate your time and help.
DATA HAVE1;
INPUT ID status date_of_death_mmm date_of_death_dd date_of_death_yyyy date_of_diagnosis_mm date_of_diagnosis_dd date_of_diagnosis_yyyy agegrp dx exit stage DEATH date_of_diagnosis_mm1 date_of_diagnosis_dd1 DX_MID_YR duration_no_missing duration_missing;
cards;
1 0 8 19 2005 3 22 2005 3 16517 16667 3 1 7 1 16618 150 49
2 0 6 19 2010 7 20 2005 2 16637 18432 4 1 7 1 16618 1795 1814
3 0 1 7 2006 8 29 2005 4 16677 16808 4 1 7 1 16618 131 190
4 0 5 24 2006 8 30 2005 3 16678 16945 4 1 7 1 16618 267 327
5 0 6 11 2007 11 7 2005 3 16747 17328 4 1 7 1 16618 581 710
6 0 12 27 2005 11 9 2005 4 16749 16797 4 1 7 1 16618 48 179
7 0 3 12 2006 11 25 2005 2 16765 16872 4 1 7 1 16618 107 254
8 0 11 3 2005 7 31 2005 3 16648 16743 4 1 7 1 16618 95 125
9 0 7 20 2005 3 9 2005 4 16504 16637 99 1 7 1 16618 133 19
10 0 12 21 2006 1 9 2006 3 16810 17156 4 1 7 1 16983 346 173
11 0 5 14 2006 2 25 2005 3 16492 16935 99 1 7 1 16618 443 317
12 0 9 8 2005 4 5 2005 4 16531 16687 99 1 7 1 16618 156 69
13 0 12 8 2005 9 12 2005 4 16691 16778 99 1 7 1 16618 87 160
14 0 4 14 2007 1 18 2006 2 16819 17270 4 1 7 1 16983 451 287
15 0 10 30 2009 2 8 2006 2 16840 18200 2 1 7 1 16983 1360 1217
16 0 3 2 2007 6 27 2006 1 16979 17227 3 1 7 1 16983 248 244
17 0 1 13 2007 10 23 2006 4 17097 17179 4 1 7 1 16983 82 196
18 0 12 18 2006 9 28 2006 3 17072 17153 4 1 7 1 16983 81 170
19 0 7 15 2007 12 15 2006 3 17150 17362 4 1 7 1 16983 212 379
20 0 7 11 2006 5 11 2006 4 16932 16993 4 1 7 1 16983 61 10
21 0 2 12 2008 7 24 2006 4 17006 17574 3 0 7 1 16983 568 591
22 0 1 18 2007 11 3 2006 4 17108 17184 4 0 7 1 16983 76 201
23 0 7 20 2010 6 5 2006 2 16957 18463 2 1 7 1 16983 1506 1480
24 0 11 12 2006 9 21 2006 1 17065 17117 4 1 7 1 16983 52 134
25 0 7 2 2007 1 22 2007 2 17188 17349 3 1 7 1 17348 161 1
26 0 3 16 2008 2 2 2007 3 17199 17607 4 1 7 1 17348 408 259
27 0 5 13 2009 3 12 2007 1 17237 18030 4 1 7 1 17348 793 682
28 0 8 28 2007 3 16 2007 4 17241 17406 3 1 7 1 17348 165 58
29 0 2 16 2007 9 26 2006 4 17070 17213 4 1 7 1 16983 143 230
30 0 10 13 2008 5 14 2007 4 17300 17818 3 1 7 1 17348 518 470
31 0 12 22 2007 8 13 2007 3 17391 17522 4 1 7 1 17348 131 174
32 0 4 13 2008 9 7 2007 3 17416 17635 4 1 7 1 17348 219 287
33 0 12 30 2007 7 9 2007 3 17356 17530 3 0 7 1 17348 174 182
34 0 12 1 2006 7 14 2006 4 16996 17136 99 1 7 1 16983 140 153
35 0 2 7 2010 9 15 2007 3 17424 18300 3 1 7 1 17348 876 952
36 0 4 29 2008 6 29 2007 3 17346 17651 3 1 7 1 17348 305 303
37 0 12 4 2007 12 1 2007 3 17501 17504 4 1 7 1 17348 3 156
38 0 9 18 2007 7 16 2007 3 17363 17427 4 1 7 1 17348 64 79
39 0 12 2 2007 10 18 2007 3 17457 17502 3 1 7 1 17348 45 154
40 0 6 23 2008 12 3 2007 4 17503 17706 3 1 7 1 17348 203 358
41 0 8 23 2008 2 4 2008 2 17566 17767 4 1 7 1 17714 201 53
42 0 6 6 2009 4 14 2008 3 17636 18054 4 1 7 1 17714 418 340
43 0 8 21 2011 5 16 2008 3 17668 18860 3 1 7 1 17714 1192 1146
44 0 1 14 2009 9 24 2008 4 17799 17911 4 1 7 1 17714 112 197
45 0 7 9 2009 4 22 2009 4 18009 18087 3 1 7 1 18079 78 8
46 0 9 28 2010 3 17 2009 2 17973 18533 2 0 7 1 18079 560 454
47 0 10 6 2009 4 30 2009 3 18017 18176 4 0 7 1 18079 159 97
48 0 7 10 2009 5 29 2009 4 18046 18088 3 0 7 1 18079 42 9
49 0 8 9 2010 6 4 2009 3 18052 18483 3 1 7 1 18079 431 404
50 0 11 25 2009 7 10 2009 3 18088 18226 4 1 7 1 18079 138 147
51 0 9 16 2009 5 22 2009 4 18039 18156 3 0 7 1 18079 117 77
52 0 4 28 2010 8 21 2009 3 18130 18380 4 1 7 1 18079 250 301
53 0 1 30 2010 9 15 2009 3 18155 18292 2 1 7 1 18079 137 213
54 0 1 21 2010 9 2 2009 3 18142 18283 3 1 7 1 18079 141 204
55 0 10 18 2010 10 9 2009 3 18179 18553 3 0 7 1 18079 374 474
56 0 2 7 2010 10 24 2009 3 18194 18300 4 1 7 1 18079 106 221
57 0 1 26 2010 10 6 2009 3 18176 18288 4 1 7 1 18079 112 209
58 0 2 24 2010 11 19 2009 2 18220 18317 4 1 7 1 18079 97 238
59 0 12 28 2012 11 17 2009 3 18218 19355 4 1 7 1 18079 1137 1276
60 0 6 14 2010 12 2 2009 3 18233 18427 4 1 7 1 18079 194 348
61 0 3 16 2012 11 16 2009 4 18217 19068 3 1 7 1 18079 851 989
62 0 3 20 2010 7 30 2009 3 18108 18341 3 1 7 1 18079 233 262
63 0 5 10 2011 2 23 2010 2 18316 18757 4 1 7 1 18444 441 313
64 0 6 8 2012 3 12 2010 2 18333 19152 4 1 7 1 18444 819 708
65 0 7 10 2010 5 6 2010 4 18388 18453 3 1 7 1 18444 65 9
66 0 7 11 2010 5 25 2010 3 18407 18454 3 1 7 1 18444 47 10
67 0 3 7 2011 7 30 2010 3 18473 18693 4 1 7 1 18444 220 249
68 0 2 7 2012 7 18 2010 2 18461 19030 4 1 7 1 18444 569 586
69 0 12 18 2010 9 3 2010 3 18508 18614 2 0 7 1 18444 106 170
70 0 11 24 2013 11 9 2010 4 18575 19686 3 1 7 1 18444 1111 1242
71 0 4 30 2012 11 12 2010 3 18578 19113 3 1 7 1 18444 535 669
72 0 2 21 2012 11 12 2010 3 18578 19044 3 1 7 1 18444 466 600
73 0 6 12 2011 12 14 2010 3 18610 18790 4 1 7 1 18444 180 346
74 0 7 17 2011 12 20 2010 4 18616 18825 3 1 7 1 18444 209 381
75 0 7 4 2013 11 3 2010 3 18569 19543 3 1 7 1 18444 974 1099
76 0 12 22 2012 10 22 2010 3 18557 19349 3 1 7 1 18444 792 905
77 0 11 28 2013 1 25 2011 3 18652 19690 4 1 7 1 18809 1038 881
78 0 12 27 2011 6 3 2011 3 18781 18988 3 0 7 1 18809 207 179
79 0 12 5 2011 5 2 2011 4 18749 18966 3 1 7 1 18809 217 157
80 0 7 21 2011 5 23 2011 3 18770 18829 4 1 7 1 18809 59 20
81 0 2 27 2014 7 13 2011 3 18821 19781 3 0 7 1 18809 960 972
82 0 12 22 2011 10 12 2011 4 18912 18983 4 1 7 1 18809 71 174
83 0 2 1 2012 9 10 2011 4 18880 19024 4 1 7 1 18809 144 215
84 0 7 14 2012 9 6 2011 4 18876 19188 3 1 7 1 18809 312 379
85 0 12 2 2012 12 9 2011 4 18970 19329 3 1 7 1 18809 359 520
86 0 10 29 2012 12 5 2011 2 18966 19295 4 1 7 1 18809 329 486
87 0 7 10 2015 11 17 2011 1 18948 20279 4 1 7 1 18809 1331 1470
88 0 3 19 2012 12 28 2011 2 18989 19071 4 1 7 1 18809 82 262
89 0 11 16 2012 1 26 2012 3 19018 19313 2 0 7 1 19175 295 138
90 0 2 8 2014 1 5 2012 4 18997 19762 3 1 7 1 19175 765 587
91 0 8 27 2013 2 21 2012 3 19044 19597 4 1 7 1 19175 553 422
92 0 5 2 2013 2 20 2012 3 19043 19480 3 1 7 1 19175 437 305
93 0 2 12 2013 4 13 2012 4 19096 19401 3 1 7 1 19175 305 226
94 0 7 16 2013 4 16 2012 3 19099 19555 2 1 7 1 19175 456 380
95 0 3 4 2013 5 10 2012 3 19123 19421 4 1 7 1 19175 298 246
96 0 7 23 2012 5 14 2012 3 19127 19197 3 1 7 1 19175 70 22
97 0 7 6 2012 5 31 2012 3 19144 19180 4 1 7 1 19175 36 5
98 0 12 22 2012 6 9 2012 2 19153 19349 3 1 7 1 19175 196 174
99 0 6 19 2013 6 28 2012 3 19172 19528 3 1 7 1 19175 356 353
100 0 6 5 2013 8 14 2012 4 19219 19514 3 1 7 1 19175 295 339
101 0 11 5 2012 10 24 2012 4 19290 19302 3 1 7 1 19175 12 127
102 0 8 1 2013 11 9 2012 2 19306 19571 4 1 7 1 19175 265 396
103 0 8 14 2013 11 30 2012 4 19327 19584 4 1 7 1 19175 257 409
104 0 8 3 2012 4 17 2012 3 19100 19208 4 1 7 1 19175 108 33
105 0 3 19 2013 11 5 2012 4 19302 19436 4 1 7 1 19175 134 261
106 0 7 31 2014 11 26 2012 4 19323 19935 4 1 7 1 19175 612 760
107 0 8 18 2014 12 21 2012 3 19348 19953 4 1 7 1 19175 605 778
108 0 11 22 2013 1 10 2013 3 19368 19684 4 1 7 1 19540 316 144
109 0 11 3 2014 1 17 2013 4 19375 20030 3 1 7 1 19540 655 490
110 0 9 14 2013 1 23 2013 4 19381 19615 3 1 7 1 19540 234 75
111 0 12 6 2013 1 30 2013 2 19388 19698 3 1 7 1 19540 310 158
112 0 2 17 2014 3 8 2013 1 19425 19771 4 1 7 1 19540 346 231
113 0 3 4 2014 7 18 2013 2 19557 19786 4 1 7 1 19540 229 246
114 0 8 2 2015 6 11 2013 3 19520 20302 3 1 7 1 19540 782 762
115 0 3 22 2014 10 27 2009 3 18197 19804 2 1 7 1 18079 1607 1725
116 0 11 15 2013 9 26 2013 4 19627 19677 99 1 7 1 19540 50 137
117 0 5 29 2014 11 13 2013 2 19675 19872 4 1 7 1 19540 197 332
118 0 11 6 2014 9 5 2013 3 19606 20033 3 1 7 1 19540 427 493
119 0 7 11 2012 5 10 2012 4 19123 19185 4 1 7 1 19175 62 10
120 0 9 28 2016 12 4 2012 4 19331 20725 3 1 7 1 19175 1394 1550
121 0 8 31 2014 12 21 2013 3 19713 19966 4 1 7 1 19540 253 426
122 0 5 5 2016 1 17 2014 2 19740 20579 4 1 7 1 19905 839 674
123 0 8 1 2015 3 3 2014 2 19785 20301 3 1 7 1 19905 516 396
124 0 11 29 2014 5 16 2014 4 19859 20056 3 1 7 1 19905 197 151
125 0 6 11 2015 7 2 2014 2 19906 20250 3 1 7 1 19905 344 345
126 0 7 18 2015 9 24 2014 3 19990 20287 4 1 7 1 19905 297 382
127 0 7 24 2016 10 30 2014 3 20026 20659 4 1 7 1 19905 633 754
128 0 7 5 2015 12 9 2014 3 20066 20274 4 1 7 1 19905 208 369
129 0 11 19 2015 2 3 2015 1 20122 20411 3 1 7 1 20270 289 141
130 0 10 31 2015 1 8 2015 2 20096 20392 4 1 7 1 20270 296 122
131 0 3 15 2016 3 26 2015 2 20173 20528 4 1 7 1 20270 355 258
132 0 7 4 2015 3 10 2015 3 20157 20273 3 1 7 1 20270 116 3
133 0 8 9 2015 3 17 2015 2 20164 20309 4 1 7 1 20270 145 39
134 0 12 8 2016 5 22 2015 3 20230 20796 2 1 7 1 20270 566 526
135 0 8 15 2016 5 27 2015 4 20235 20681 4 1 7 1 20270 446 411
136 0 7 11 2015 5 8 2015 3 20216 20280 4 1 7 1 20270 64 10
137 0 10 19 2016 7 1 2015 4 20270 20746 3 0 7 1 20270 476 476
138 0 10 28 2015 9 1 2015 3 20332 20389 4 1 7 1 20270 57 119
139 0 2 14 2016 8 6 2015 3 20306 20498 4 1 7 1 20270 192 228
140 0 12 21 2016 11 17 2015 3 20409 20809 4 1 7 1 20270 400 539
141 0 9 12 2016 10 16 2015 3 20377 20709 3 1 7 1 20270 332 439
142 0 4 26 2016 3 2 2015 4 20149 20570 99 0 7 1 20270 421 300
143 0 7 28 2016 8 25 2015 2 20325 20663 3 1 7 1 20270 338 393
;
RUN;
*sort for demo;
proc sort data=have1;
by stage;
run;
%macro surveyLoop(sample=);
*calculate sample rate;
%let sampRate = %sysevalf(&sample./100);
*create sample;
ods select none;
proc surveyselect data=have1
samprate=&sampRate.
reps=100
out=_sample
seed=995
outall;
strata stage;*ensures stage is present for all values - not needed in prod;
run;
*set duration to values if selected;
data _sample;
set _sample;
*assign values as needed;
if selected = 1 then duration = duration_missing;
else duration=duration_no_missing;
*sort for modeling procedures next;
proc sort data=_sample;
by replicate;
run;
/*SURVIVAL ANALYSIS ON THE ACTUAL DATA WITH NO MISSING*/
ods output HazardRatios=PE_FULL;
PROC PHREG DATA =_sample;
by replicate;
CLASS STAGE(REF='99') AGEGRP(REF='1')/PARAM=REF;
MODEL duration_no_missing*DEATH(0) = AGEGRP STAGE/RL EVENTCODE=1;
hazardratio STAGE/diff=REF at (AGEGRP=all);
RUN;
ods output HazardRatios=PE_MISSING;
/*simulate duration_missing for the 0-100% of the data*/
PROC PHREG DATA =_sample;
by replicate;
CLASS STAGE(REF='99') AGEGRP(REF='1')/PARAM=REF;
MODEL duration*DEATH(0) = AGEGRP STAGE/RL EVENTCODE=1;
hazardratio STAGE/diff=REF at (AGEGRP=all);
RUN;
*combines results and labels estimates;
DATA PE_COMBO; SET PE_MISSING (in=t1) PE_FULL (in=t2);
length estType $15.;
PCT_MISS = &sample/100;
if t1 then estType = 'Imputed Data';
else estType = 'Full Data';
run;
*omit if no replicates;
*if replicates calculates average of HR + STDERR for bootstrap approach;
proc means data=PE_COMBO NWAY N MEAN STD STDERR;
class estType Description pct_miss;
var hazardratio;
ods output summary = pe_summary;
run;
ods select all;
*append to main data sets;
proc append base=results_combined data=pe_combo force;
run;
proc append base=results_summary data=pe_summary force;
run;
*clean up after;
proc datasets lib=work nodetails nolist;
delete _sample;
run;quit;
%let sampRate =;
%mend;
OPTIONS NOLABEL;
*call macro for different sizes;
%surveyLoop(sample=1);
%surveyLoop(sample=2);
%surveyLoop(sample=3);
%surveyLoop(sample=4);
%surveyLoop(sample=5);
%surveyLoop(sample=6);
%surveyLoop(sample=7);
%surveyLoop(sample=8);
%surveyLoop(sample=9);
%surveyLoop(sample=10);
%surveyLoop(sample=11);
%surveyLoop(sample=12);
%surveyLoop(sample=13);
%surveyLoop(sample=14);
%surveyLoop(sample=15);
%surveyLoop(sample=16);
%surveyLoop(sample=17);
%surveyLoop(sample=18);
%surveyLoop(sample=19);
%surveyLoop(sample=20);
%surveyLoop(sample=21);
%surveyLoop(sample=22);
%surveyLoop(sample=23);
%surveyLoop(sample=24);
%surveyLoop(sample=25);
%surveyLoop(sample=26);
%surveyLoop(sample=27);
%surveyLoop(sample=28);
%surveyLoop(sample=29);
%surveyLoop(sample=30);
%surveyLoop(sample=31);
%surveyLoop(sample=32);
%surveyLoop(sample=33);
%surveyLoop(sample=34);
%surveyLoop(sample=35);
%surveyLoop(sample=36);
%surveyLoop(sample=37);
%surveyLoop(sample=38);
%surveyLoop(sample=39);
%surveyLoop(sample=40);
%surveyLoop(sample=41);
%surveyLoop(sample=42);
%surveyLoop(sample=43);
%surveyLoop(sample=44);
%surveyLoop(sample=45);
%surveyLoop(sample=46);
%surveyLoop(sample=47);
%surveyLoop(sample=48);
%surveyLoop(sample=49);
%surveyLoop(sample=50);
%surveyLoop(sample=51);
%surveyLoop(sample=52);
%surveyLoop(sample=53);
%surveyLoop(sample=54);
%surveyLoop(sample=55);
%surveyLoop(sample=56);
%surveyLoop(sample=57);
%surveyLoop(sample=58);
%surveyLoop(sample=59);
%surveyLoop(sample=60);
%surveyLoop(sample=61);
%surveyLoop(sample=62);
%surveyLoop(sample=63);
%surveyLoop(sample=64);
%surveyLoop(sample=65);
%surveyLoop(sample=66);
%surveyLoop(sample=67);
%surveyLoop(sample=68);
%surveyLoop(sample=69);
%surveyLoop(sample=70);
%surveyLoop(sample=71);
%surveyLoop(sample=72);
%surveyLoop(sample=73);
%surveyLoop(sample=74);
%surveyLoop(sample=75);
%surveyLoop(sample=76);
%surveyLoop(sample=77);
%surveyLoop(sample=78);
%surveyLoop(sample=79);
%surveyLoop(sample=80);
%surveyLoop(sample=81);
%surveyLoop(sample=82);
%surveyLoop(sample=83);
%surveyLoop(sample=84);
%surveyLoop(sample=85);
%surveyLoop(sample=86);
%surveyLoop(sample=87);
%surveyLoop(sample=88);
%surveyLoop(sample=89);
%surveyLoop(sample=90);
%surveyLoop(sample=91);
%surveyLoop(sample=92);
%surveyLoop(sample=93);
%surveyLoop(sample=94);
%surveyLoop(sample=95);
%surveyLoop(sample=96);
%surveyLoop(sample=97);
%surveyLoop(sample=98);
%surveyLoop(sample=99);
%surveyLoop(sample=100);
DATA FULL(RENAME=(HazardRatio_Mean=HR_FULL estType=FULL)) IMPUTED(RENAME=(HazardRatio_Mean=HR_IMP estType=IMPUTED));
SET Results_summary(KEEP=HazardRatio_Mean PCT_MISS Description estType);
IF estType IN ('Full Data') THEN OUTPUT FULL;
IF estType IN ('Imputed Data') THEN OUTPUT IMPUTED;
run;
PROC SORT DATA=FULL; /*N=1200*/
BY PCT_MISS Description;
PROC SORT DATA=IMPUTED; /*N=1200*/
BY PCT_MISS Description;
DATA MERGED_HAZARD; /*N=1200*/
MERGE FULL
IMPUTED;
BY PCT_MISS Description;
RUN;
DATA MERGED_HAZARD1; /*N=1200*/ SET MERGED_HAZARD;
ASD=HR_FULL-HR_IMP;
RSD=(HR_FULL-HR_IMP)/HR_FULL;
RUN;
PROC FORMAT;
VALUE SORT
1='Localized vs Ref among 0-44yr'
2='Localized vs Ref among 45-59yr'
3='Localized vs Ref among 60-74yr'
4='Localized vs Ref among 75+yr'
5='Regional vs Ref among 0-44yr'
6='Regional vs Ref among 45-59yr'
7='Regional vs Ref among 60-74yr'
8='Regional vs Ref among 75+yr'
9='Distant vs Ref among 0-44yr'
10='Distant vs Ref among 45-59yr'
11='Distant vs Ref among 60-74yr'
12='Distant vs Ref among 75+yr'
;
RUN;
DATA FIG3_SIM; /*N=1200*/ SET MERGED_HAZARD1;
length group $ 30;
PCT_MISS100=PCT_MISS*100;
if Description='stage 2 vs 99 At agegrp=1' then SORT=1; ELSE
if Description='stage 2 vs 99 At agegrp=2' then SORT=2; ELSE
if Description='stage 2 vs 99 At agegrp=3' then SORT=3; ELSE
if Description='stage 2 vs 99 At agegrp=4' then SORT=4; ELSE
if Description='stage 3 vs 99 At agegrp=1' then SORT=5; ELSE
if Description='stage 3 vs 99 At agegrp=2' then SORT=6; ELSE
if Description='stage 3 vs 99 At agegrp=3' then SORT=7; ELSE
if Description='stage 3 vs 99 At agegrp=4' then SORT=8; ELSE
if Description='stage 4 vs 99 At agegrp=1' then SORT=9; ELSE
if Description='stage 4 vs 99 At agegrp=2' then SORT=10; ELSE
if Description='stage 4 vs 99 At agegrp=3' then SORT=11; ELSE
if Description='stage 4 vs 99 At agegrp=4' then SORT=12;
RUN;
*sort only after FIG3_SIM has been created;
proc sort data=FIG3_SIM;
by sort;
run;
ods graphics / width=12in height=6in;
proc sgpanel data=FIG3_SIM noautolegend;
label PCT_MISS100='Percent of Incomplete Data' RSD='RELATIVE SURVIVAL DIFFERENCE' ASD='ABS SURVIVAL DIFFERENCE';
panelby SORT /novarname columns=4 rows=3 ;
styleattrs DATACONTRASTCOLORS=(black red);
series x=PCT_MISS100 y=RSD /lineattrs=(pattern=solid);
series x=PCT_MISS100 y=ASD /lineattrs=(pattern=solid);
keylegend/ title="SURVIVAL DIFFERENCE" position=bottom;
colaxis label='percent of missing in data' grid;
rowaxis label='RELATIVE AND ABSOLUTE SURVIVAL DIFFERENCE' grid;
title "Maximum Effect of Missing Imputation on the Survival Analysis";
FORMAT SORT SORT.;
run;
Proc Phreg took the longest.
NOTE: The data set WORK._SAMPLE has 14300 observations and 22 variables.
NOTE: PROCEDURE SURVEYSELECT used (Total process time):
real time 0.28 seconds
cpu time 0.01 seconds
The data set WORK.PE_FULL has 1200 observations and 5 variables.
NOTE: PROCEDURE PHREG used (Total process time):
real time 2.33 seconds
Read "8 tips to make your simulation run faster."
I think tips 4 through 8 will all benefit you.
Also,
9. DROP variables that you aren't using so that they don't need to be read and written millions of times.
10. Your samples are not independent because you are using the same SEED= option for all calls to PROC SURVEYSELECT. This will affect the inferences, so you need to fix that.
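Tip 9 could look roughly like the sketch below. It assumes the two PHREG models and the sampling step need only the five variables listed, which is what the posted program appears to use; the data set name have_small is hypothetical.

```sas
/* Build a trimmed copy once, before the macro calls.              */
/* have_small is a hypothetical name; the KEEP= list assumes only  */
/* these variables are used by SURVEYSELECT and PROC PHREG.        */
data have_small;
   set have1(keep=stage agegrp death
                  duration_no_missing duration_missing);
run;

/* STRATA STAGE expects the input to be sorted by STAGE. */
proc sort data=have_small;
   by stage;
run;
```

The macro would then read data=have_small instead of data=have1, so each of the 100 replicates per sampling rate carries 5 variables rather than 18.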
A minor point: you don't need to do this:
%let sampRate = %sysevalf(&sample./100);
because, according to the PROC SURVEYSELECT documentation for the SAMPRATE= option:
The sampling rate value must be a positive number. The stratum sampling rate values and the stratum sampling rates that you provide in the SAS-data-set must be nonnegative numbers. You can specify a sampling rate as a number between 0 and 1. Or you can specify a rate in percentage form as a number between 1 and 100, which PROC SURVEYSELECT converts to a proportion.
I would be tempted to add ODS SELECT so that only the tables of interest are output. I would not be surprised if part of the issue is the time spent formatting output for the results window, which can be considerable. If you can get the desired output in an OUTEST data set instead of ODS OUTPUT, try using NOPRINT with PROC PHREG as well.
I was going to use ODS SELECT NONE; instead of NOPRINT, based on:
@ballardw, do you know how to make seed=995 dynamic so that the seed is different for each iteration, as Rick commented? I've been looking for a solution but haven't found one yet.
How intensive is it for SAS to produce a log? Does the following help efficiency at all? PROC PHREG writes 100 lines of convergence-criterion status to the log for each iteration.
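One possible way to vary the seed is sketched here, assuming it is acceptable to derive it from the macro's sample parameter. The base value 995 comes from the original program; the formula 995 + &sample. and the driver macro runAll are illustrative choices, not something prescribed in this thread.

```sas
/* Inside %surveyLoop: derive a distinct seed from the sample percentage. */
/* The formula 995 + &sample. is an illustrative assumption.              */
proc surveyselect data=have1
   samprate=&sampRate.
   reps=100
   out=_sample
   seed=%eval(995 + &sample.)
   outall;
   strata stage;
run;

/* Hypothetical driver macro that replaces the 100 explicit calls. */
%macro runAll;
   %do pct = 1 %to 100;
      %surveyLoop(sample=&pct.);
   %end;
%mend runAll;
%runAll;
```

With this change each sampling rate draws from a different random-number stream, and the %do loop keeps the list of calls in one place.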
proc printto log='Z:\log.txt'; run;
It's not intensive, but that's not the best way to monitor convergence. Read "Monitor convergence during simulation studies in SAS" and use the ConvergenceStatus table as shown in the second example.
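A minimal sketch of that idea applied to the first PHREG call follows. It is not the code from the article. The data set name CS_FULL is hypothetical, and the sketch assumes the ConvergenceStatus table contains a Status variable in which 0 indicates convergence, which should be checked against the actual output.

```sas
options nonotes;   /* keep per-BY-group NOTEs out of the log */

/* Capture the convergence table alongside the hazard ratios. */
ods output HazardRatios=PE_FULL ConvergenceStatus=CS_FULL;
proc phreg data=_sample;
   by replicate;
   class stage(ref='99') agegrp(ref='1') / param=ref;
   model duration_no_missing*death(0) = agegrp stage / rl eventcode=1;
   hazardratio stage / diff=ref at (agegrp=all);
run;

options notes;     /* restore normal logging */

/* Run this once ODS output is switched back on.                */
/* Assumption: Status=0 marks a replicate that converged.       */
proc freq data=CS_FULL;
   tables Status;
run;
```

Replicates with a nonzero status could then be excluded before the hazard ratios are averaged, instead of scanning the log for convergence messages.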
Regarding suppressing ODS output, please read the article "Turn off ODS when running simulations in SAS."
It defines little macros called %ODSOff and %ODSOn that you can use to make sure SAS isn't generating output needlessly. I also like to use PLOTS=NONE to prevent procedures from generating graphs needlessly, although I don't think PHREG produces any graphs unless you explicitly ask for them.
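For readers without the article at hand, macros along these lines can be sketched as follows. This is a reconstruction under the assumption that they simply toggle ODS graphics, the ODS exclusion list, and results tracking; check the article for the exact definitions.

```sas
/* Call before the BY-group analysis: stop ODS from rendering output. */
%macro ODSOff();
   ods graphics off;
   ods exclude all;
   ods noresults;
%mend;

/* Call afterwards: restore normal ODS behavior. */
%macro ODSOn();
   ods graphics on;
   ods exclude none;
   ods results;
%mend;

%ODSOff
ods output HazardRatios=PE_FULL;   /* still captured while output is excluded */
proc phreg data=_sample;
   by replicate;
   class stage(ref='99') agegrp(ref='1') / param=ref;
   model duration_no_missing*death(0) = agegrp stage / rl eventcode=1;
   hazardratio stage / diff=ref at (agegrp=all);
run;
%ODSOn
```

Unlike the NOPRINT option, which typically prevents ODS tables from being created at all, this approach leaves the ODS OUTPUT data sets available.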