Thanks stat_sas and Rick_SAS for your suggestions. To try them, I downsized the data to a 5 GB dataset with 250k obs and still 2500 vars. Running the previous PROC REG code without the OUTPUT statement now takes 18 seconds, while it takes approximately 2 minutes and 5 seconds with the OUTPUT statement (requesting both rstudent and h), regardless of whether the out= dataset has the same name as the input data= dataset. So even at this data size the OUTPUT statement adds significantly to the run time.

To further explore whether this increase is caused by the computation of the studentized residuals and leverages or by the I/O, I tried different OUTPUT statements:

1) output out=training1 h=h; Took approximately 2 minutes
2) output out=training1 rstudent=rstud; Also took about 2 minutes
3) output out=training1 (keep=rstud h) rstudent=rstud h=h; Took 1 minute and 50 seconds
4) output out=training1; Took 45 seconds

Adding the INFLUENCE option to the code without the OUTPUT statement made no difference, since the NOPRINT option was on. I don't think removing NOPRINT would give a fair comparison, though, as the run would then spend a long time writing influence statistics to the results window (250k lines of 2500 dfbetas plus the other statistics...).

Anyway, the comparison of OUTPUT statements 1-4 above seems to point to the computation of the leverages and residuals taking most of the time: relative to the code without the OUTPUT statement, it took just 25-30 seconds more to produce the whole out= dataset (with the 2500 vars), but approximately a minute and a half more to compute h and rstudent and write them to the out= dataset (even when keeping just these 2 columns). It looks like the computation of the diagonal of the H matrix (which is not needed in the stepwise selection process but is necessary for studentized residuals and leverages) is what makes the difference in elapsed time.
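
For reference, these are the standard textbook definitions of the two statistics. The notation is mine and is not taken from the PROC REG output: X is the n-by-p design matrix of the fitted model (including the intercept column), x_i is its i-th row written as a column vector, e_i is the ordinary residual, and s^2 is the mean squared error.

```latex
h_i = x_i^{\top} (X^{\top} X)^{-1} x_i ,
\qquad
s_{(i)}^2 = \frac{(n-p)\, s^2 - e_i^2 / (1 - h_i)}{n - p - 1} ,
\qquad
\mathrm{RSTUDENT}_i = \frac{e_i}{s_{(i)} \sqrt{1 - h_i}}
```

Only the n diagonal entries h_i are needed, so the full n-by-n H matrix never has to be formed; each h_i is a quadratic form in the p regressors of that observation.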
I would appreciate any comments on these findings, as well as further ideas for obtaining studentized residuals in a shorter time (perhaps PROC IML...?). Thanks again!
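
To make the PROC IML idea concrete, here is a minimal sketch of what I have in mind. It assumes the regressors of the final model are already known and that everything fits in memory. The dataset name training, the response y, the regressors x1-x3 and the output dataset diag are placeholders for illustration only. It computes the leverages row by row from inv(X'X) and then the externally studentized residuals from the standard formulas, without forming the full hat matrix.

```sas
proc iml;
   /* Placeholder names: dataset TRAINING, response y, regressors x1-x3 */
   use training;
   read all var {x1 x2 x3} into X;
   read all var {y} into y;
   close training;

   n = nrow(X);
   X = j(n, 1, 1) || X;               /* add the intercept column        */
   p = ncol(X);

   XpXi = inv(t(X) * X);              /* p x p inverse crossproducts     */
   b    = XpXi * t(X) * y;            /* OLS estimates                   */
   e    = y - X * b;                  /* ordinary residuals              */

   /* Leverage h_i = x_i` * inv(X`X) * x_i, computed as row sums so the  */
   /* n x n hat matrix is never formed                                   */
   h = ((X * XpXi) # X)[, +];

   s2    = ssq(e) / (n - p);                                /* MSE       */
   s2i   = ((n - p) * s2 - e##2 / (1 - h)) / (n - p - 1);   /* MSE w/o i */
   rstud = e / sqrt(s2i # (1 - h));   /* studentized residuals           */

   create diag var {h rstud};         /* write the two columns out       */
   append;
   close diag;
quit;
```

Whether this would actually beat the OUTPUT statement at this data size is exactly what I do not know; it would at least separate the cost of the diagonal computation from the rest of PROC REG.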