I need help improving the execution speed of a numerical optimization program (see below) I've written in Base SAS. My program works fine - it successfully calculates parameters for a penalized logistic regression using a coordinate descent routine. The program is based on the algorithm used in the glmnet package in R, which is written in Fortran. Running glmnet in R is lightning fast, however my code takes significantly longer. With the help of others I've streamlined my program considerably - it reads my data into a two-dimensional temporary array and processes the array in a single data step before reading out the final coefficient estimates to an output dataset. So I/O processing ought to be minimized. And my calculations should be conducted in memory, which ought to minimize execution time. For trial runs I'm running the program on the publically available diabetes dataset (http://www4.stat.ncsu.edu/~boos/var.select/diabetes.tab.txt) after standardizing the 10 predictor variables x1-x10, adding an intercept constant x0=1, and creating a binary response variable y_gt140=1 if y>140, else y_gt140=0. Running the code below takes ~5 minutes in SAS, compared with < 1 second to run glmnet in R. What else can I do to improve the execution time of my program? Thanks in advance! ********************************************************************** * Logistic regression coordinate descent code for elastic net * **********************************************************************; %let nobs=442; %let numvars=10; %let numvars2=11; %let numiter=100000; %let lambda_list=.1; %let alpha_list=.5; %macro parmlist; %do i = 0 %to &numvars; &&p&i %end; %mend parmlist; %macro meanlist; %do i = 0 %to &numvars; &&mean&i %end; %mend meanlist; %macro stdlist; %do i = 0 %to &numvars; &&std&i %end; %mend stdlist; data coord_descent_output (keep=alpha lambda parm_unstnd_0-parm_unstnd_&numvars2.); array xx[0:&numvars.] x0-x&numvars.; array x_[&nobs.,0:&numvars.] _temporary_; array y_[&nobs] _temporary_; array mean_[0:&numvars.] (%meanlist); array std_[0:&numvars.] (%stdlist); array parm_unstnd_[0:&numvars2.] (&numvars2.*0); *** Load data into two dimensional array ***; do _n_ = 1 to &nobs.; set work.diabetes_stnd_array2 nobs=nobs; do j=0 to &numvars.; x_[_n_,j] = xx ; y_[_n_] = y_gt140 ; end; end; do alpha=&alpha_list; do lambda=&lambda_list; gamma=alpha*lambda; array p_[0:&numvars.] (&numvars2.*1); do i=1 to &numiter; do j=0 to &numvars; z = 0; sum_wtx_sq = 0; do _nn_ = 1 to &nobs; yhat = p_[0]; do k=1 to &numvars; yhat = sum(yhat, p_ *x_[_nn_,k]); end; proby = 1/(1 + exp(-yhat)); if proby <= .00001 then do; proby = 0; weight = .00001; end; else if proby >= .99999 then do; proby = 1; weight = .00001; end; else weight = proby*(1 - proby); z = sum(z, (x_[_nn_,j]*(y_[_nn_] - proby) + weight*p_ *(x_[_nn_,j])** 2)); sum_wtx_sq = sum(sum_wtx_sq, weight*(x_[_nn_,j])**2); end; if j=0 then do; p_ = z/sum_wtx_sq; end; else if j>0 then do; if (z/&nobs > 0 and gamma < abs(z/&nobs)) then p_ = (z/&nobs - gamma)/(sum_wtx_sq/&nobs + lambda - gamma); else if (z/&nobs < 0 and gamma < abs(z/&nobs)) then p_ = (z/&nobs + gamma)/(sum_wtx_sq/&nobs + lambda - gamma); else if gamma >= abs(z/&nobs) then p_ = 0; end; end; * Inner loop ; end; * Outer loop ; *** Calculate "destandardized" regression coeff.'s from standardized predictor variables (Mean(x)= 0, Var(x)=1). ******; parm_unstnd_[0] = p_[0]; do l=1 to &numvars.; parm_unstnd_[0] = parm_unstnd_[0] - p_ *mean_ /std_ ; parm_unstnd_ = p_ /std_ ; end; output coord_descent_output; put "Final parm_unstnd_ =" parm_unstnd_ ; end; end; run;
... View more