Solved: Re: How to improve execution speed of program in Base SAS?

RobF · Posted 04-23-2015 11:23 AM

I need help improving the execution speed of a numerical optimization program (see below) I've written in Base SAS.

My program works fine - it successfully calculates parameters for a penalized logistic regression using a coordinate descent routine. The program is based on the algorithm used in the glmnet package in R, which is written in Fortran. Running glmnet in R is lightning fast, however my code takes significantly longer.

With the help of others I've streamlined my program considerably - it reads my data into a two-dimensional temporary array and processes the array in a single data step before reading out the final coefficient estimates to an output dataset. So I/O processing ought to be minimized. And my calculations should be conducted in memory, which ought to minimize execution time.

For trial runs I'm running the program on the publically available diabetes dataset (http://www4.stat.ncsu.edu/~boos/var.select/diabetes.tab.txt) after standardizing the 10 predictor variables x1-x10, adding an intercept constant x0=1, and creating a binary response variable y_gt140=1 if y>140, else y_gt140=0.

Running the code below takes ~5 minutes in SAS, compared with < 1 second to run glmnet in R.

What else can I do to improve the execution time of my program?

Thanks in advance!

**********************************************************************

* Logistic regression coordinate descent code for elastic net *

**********************************************************************;

%let nobs=442;

%let numvars=10;

%let numvars2=11;

%let numiter=100000;

%let lambda_list=.1;

%let alpha_list=.5;

%macro parmlist; %do i = 0 %to &numvars; &&p&i %end; %mend parmlist;

%macro meanlist; %do i = 0 %to &numvars; &&mean&i %end; %mend meanlist;

%macro stdlist; %do i = 0 %to &numvars; &&std&i %end; %mend stdlist;

data coord_descent_output (keep=alpha lambda parm_unstnd_0-parm_unstnd_&numvars2.);

array xx[0:&numvars.] x0-x&numvars.;

array x_[&nobs.,0:&numvars.] _temporary_;

array y_[&nobs] _temporary_;

array mean_[0:&numvars.] (%meanlist);

array std_[0:&numvars.] (%stdlist);

array parm_unstnd_[0:&numvars2.] (&numvars2.*0);

*** Load data into two dimensional array ***;

do _n_ = 1 to &nobs.;

set work.diabetes_stnd_array2 nobs=nobs;

do j=0 to &numvars.;

x_[_n_,j] = xx;

y_[_n_] = y_gt140 ;

end;

do alpha=&alpha_list;

do lambda=&lambda_list;

gamma=alpha*lambda;

array p_[0:&numvars.] (&numvars2.*1);

do i=1 to &numiter;

do j=0 to &numvars;

z = 0;

sum_wtx_sq = 0;

do _nn_ = 1 to &nobs;

yhat = p_[0];

do k=1 to &numvars;

yhat = sum(yhat, p_*x_[_nn_,k]);

end;

proby = 1/(1 + exp(-yhat));

if proby <= .00001 then do;

proby = 0;

weight = .00001;

end;

else if proby >= .99999 then do;

proby = 1;

weight = .00001;

end;

else weight = proby*(1 - proby);

z = sum(z, (x_[_nn_,j]*(y_[_nn_] - proby) + weight*p_*(x_[_nn_,j])**2));

sum_wtx_sq = sum(sum_wtx_sq, weight*(x_[_nn_,j])**2);

end;

if j=0 then do;

p_ = z/sum_wtx_sq;

end;

else if j>0 then do;

if (z/&nobs > 0 and gamma < abs(z/&nobs)) then p_ = (z/&nobs - gamma)/(sum_wtx_sq/&nobs + lambda - gamma);

else if (z/&nobs < 0 and gamma < abs(z/&nobs)) then p_ = (z/&nobs + gamma)/(sum_wtx_sq/&nobs + lambda - gamma);

else if gamma >= abs(z/&nobs) then p_ = 0;

end;

end; * Inner loop ;

end; * Outer loop ;

*** Calculate "destandardized" regression coeff.'s from standardized predictor variables (Mean(x)= 0, Var(x)=1). ******;

parm_unstnd_[0] = p_[0];

do l=1 to &numvars.;

parm_unstnd_[0] = parm_unstnd_[0] - p_*mean_/std_;

parm_unstnd_ = p_/std_;

end;

output coord_descent_output;

put "Final parm_unstnd_

="

parm_unstnd_

;

end;

run;

FriedEgg · Posted 04-24-2015 12:20 PM

Yes, but this is the main factor (I think) behind why your code is running considerably longer. Since R will most likely stop far before it reaches the maximum iterations.

I am not going to speak to the validity of the following as a stopping criteria, but checking the value of sum_wtx_sq against it's lag for no change is valid for at least this test.

I made a few minor edits to you code, below, but nothing that makes any real change to the algorithm (beyond adding the stopping criteria I mention above).

data coord_descent;

array x_[&nobs.,0:&numvars.] _temporary_;

array y_[&nobs.] _temporary_;

/* cast data set to arrays x[i,j], y*/

retain x0-x&numvars.;

_adr1=addrlong(x0);

do _n_=1 to &nobs.;

set work.diabetes_stnd_array2;

y_[_n_]=y_gt140;

call pokelong( peekclong(_adr1), addrlong(x_[_n_,0]), %eval(8*%eval(&numvars.+1)) );

end;

array p_[0:&numvars.];

_adr2=addrlong(p_[0]);

_ones=repeat(put(1, rb8.), &numvars.);

array parm_unstnd_[0:&numvars.];

_adr3=addrlong(parm_unstnd_[0]);

_zero=repeat(put(0, rb8.), &numvars.);

array mean_[0:&numvars.] _temporary_ (%meanlist);

array std_[0:&numvars.] _temporary_ (%stdlist);

do alpha=&alpha_list.;

do lambda=&lambda_list.;

gamma = alpha*lambda;

call pokelong( _ones, _adr2, %eval(8*%eval(&numvars.+1)) );

call pokelong( _zero, _adr3, %eval(8*%eval(&numvars.+1)) );

do i=1 to &numiter. until (lag(sum_wtx_sq) eq sum_wtx_sq);

do j=0 to &numvars.;

z = 0;

sum_wtx_sq = 0;

do _n_=1 to &nobs.;

yhat = p_[0];

do k=1 to &numvars.;

yhat = sum(yhat, p_*x_[_n_, k]);

end;

proby = 1/(1 + exp(-yhat));

if 0.00001 <= proby >= 0.99999 then do;

proby = round(proby, 1);

weight = 0.00001;

end;

else weight = proby*(1 - proby);

z = sum(z, (x_[_n_,j]*(y_[_n_] - proby) + weight*p_*(x_[_n_,j])**2));

sum_wtx_sq = sum(sum_wtx_sq, weight*(x_[_n_,j])**2);

end; *endLoop: _n_= [nobs];

if j>0 then do;

if gamma<abs(z/&nobs.) then do;

if z>0 then p_ = (z/&nobs. - gamma);

else p_ = (z/&nobs. + gamma);

p_ = p_/(sum_wtx_sq/&nobs. + lambda - gamma);

end;

else p_=0;

end;

else p_ = z/sum_wtx_sq;

end; *endLoop: j= [numvars];

end; *endLoop: i= [numiter];

parm_unstnd_[0] = p_[0];

do _n_=1 to &numvars.;

parm_unstnd_[0] = parm_unstnd_[0] - ((p_[_n_]*mean_[_n_])/std_[_n_]);

parm_unstnd_[_n_] = p_[_n_]/std_[_n_];

end;

output;

put "Final parm_unstnd_

=" parm_unstnd_

;

end; *endLoop: lambda= [lambda_list];

end; *endLoop: alpha= [alpha_list];

keep alpha lambda parm_unstnd_:;

stop;

run;

Final parm_unstnd_

=-6.968638061 0 0.0910300569 0.0147581702 0 0 -0.013146089 0 0.8256638121 0 0

NOTE: There were 442 observations read from the data set WORK.DIABETES_STND_ARRAY2.

NOTE: The data set WORK.COORD_DESCENT has 1 observations and 13 variables.

NOTE: DATA statement used (Total process time):

real time 0.06 seconds

cpu time 0.05 seconds

R:

library("glmnet")

set.seed(20)

foo <- read.table("http://www4.stat.ncsu.edu/~boos/var.select/diabetes.tab.txt", header=TRUE, sep="\t", strip.white=TRUE)

x <- as.matrix(foo[1:10])

y <- ifelse(foo$Y > 140, 1, 0)

system.time(bar <- glmnet(x, y, family = "binomial", alpha = 0.5) )

user system elapsed

0.03 0.00 0.03

print(coef(bar, s=0.1))

11 x 1 sparse Matrix of class "dgCMatrix"

1

(Intercept) -6.97872002

AGE .

SEX .

BMI 0.09119253

BP 0.01478218

S1 .

S2 .

S3 -0.01316737

S4 .

S5 0.82665323

S6 .

View solution in original post

ballardw · Posted 04-23-2015 11:29 AM

Just seeing a two dimensional array makes me ask if you have access to SAS/IML. It is designed for such things in general.

RobF · Posted 04-23-2015 11:36 AM

No - my company only has SAS EG loaded on the server. I'm hoping a fast & efficient program can be wriiten in base SAS without bringing in other modules.

ballardw · Posted 04-23-2015 12:56 PM

I'm not quite sure but I think this line:

do i=1 to &numiter;

may be the issue. It appears that you are iterating the exact same code 100000 times in the second loop controlled by j.

Do you get different results for different values of &numiter with all else the same?

Reeza · Posted 04-23-2015 01:02 PM

Echoing @ballardw suggestion, usually there is some option to check if parameters have stabilized so you don't need to keep repeating. It usually ends up being a do while loop of some kind.

RobF · Posted 04-23-2015 01:17 PM

Right, I am converging to a good solution after about 1,000 iterations which takes only about 3 seconds. I chose 100,000 iterations since that's the baseline # of iterations in the R glmnet routine.

However, for much larger datasets which I foresee I'll be running in the future, the runtime could conceivably take hours. I don't know, maybe this is the best I can do in Base SAS.

I just saw this article by Rick Wicklin which may help out even if I'm not using proc iml:

http://blogs.sas.com/content/iml/2013/05/15/vectorize-computations.html

Kurt_Bremser · Posted 04-24-2015 04:29 AM

Be aware that the data step compiler will not be on par with something like gcc or a FORTRAN compiler when it comes to optimizing code. There's too much runtime checking built in for what is basically interpreted code, so my guess is that you will never reach the performance of R for such operations.

I'm very fond of SAS, but if ever one of my "customers" wants to try solving a problem with R, he/she will have my full support in getting the SAS data warehouse data into R.

"How often have I told you blokes to use the right tool?" - LtCmdr Montgomery Scott

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

RobF · Posted 04-24-2015 09:52 AM

Thanks Kurt - is there a way to curtail the runtime checking in SAS to increase performance?

Kurt_Bremser · Posted 04-25-2015 08:20 AM

No. The data step compiler does not have any optimization options, as data steps are usually heavily I/O bound, and SAS provides prefabricated and optimized procedures for most statistic tasks which tend to be CPU bound.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

FriedEgg · Posted 04-24-2015 09:54 AM

You are missing any stopping criteria for the coordinate decent algorithm to stop iterating when it has converged for a given lambda.

RobF · Posted 04-24-2015 10:20 AM

Hi FriedEgg -

No, with the fast approaching deadlines for the global forum I didn't have time to include stopping criteria, or include wrapper code for using 5 or 10-fold cross validation to find optimal values of the alpha or lambda penalty values.

Instead I concentrated on just getting the algorithm to converge to a solution (:-P) by comparing output with glmnet in R using the diabetes test dataset for logistic and Poisson regressions. The code matches glmnet output reasonably well for ~1,000 iterations (but not perfect). I'm hoping the code is useful as a niche market solution for SAS users without access to Revolution Analytics who want to run a lasso on a dataset too big for R to handle.

Robert

FriedEgg · Posted 04-24-2015 12:20 PM

Yes, but this is the main factor (I think) behind why your code is running considerably longer. Since R will most likely stop far before it reaches the maximum iterations.

I am not going to speak to the validity of the following as a stopping criteria, but checking the value of sum_wtx_sq against it's lag for no change is valid for at least this test.

I made a few minor edits to you code, below, but nothing that makes any real change to the algorithm (beyond adding the stopping criteria I mention above).

data coord_descent;

array x_[&nobs.,0:&numvars.] _temporary_;

array y_[&nobs.] _temporary_;

/* cast data set to arrays x[i,j], y*/

retain x0-x&numvars.;

_adr1=addrlong(x0);

do _n_=1 to &nobs.;

set work.diabetes_stnd_array2;

y_[_n_]=y_gt140;

call pokelong( peekclong(_adr1), addrlong(x_[_n_,0]), %eval(8*%eval(&numvars.+1)) );

end;

array p_[0:&numvars.];

_adr2=addrlong(p_[0]);

_ones=repeat(put(1, rb8.), &numvars.);

array parm_unstnd_[0:&numvars.];

_adr3=addrlong(parm_unstnd_[0]);

_zero=repeat(put(0, rb8.), &numvars.);

array mean_[0:&numvars.] _temporary_ (%meanlist);

array std_[0:&numvars.] _temporary_ (%stdlist);

do alpha=&alpha_list.;

do lambda=&lambda_list.;

gamma = alpha*lambda;

call pokelong( _ones, _adr2, %eval(8*%eval(&numvars.+1)) );

call pokelong( _zero, _adr3, %eval(8*%eval(&numvars.+1)) );

do i=1 to &numiter. until (lag(sum_wtx_sq) eq sum_wtx_sq);

do j=0 to &numvars.;

z = 0;

sum_wtx_sq = 0;

do _n_=1 to &nobs.;

yhat = p_[0];

do k=1 to &numvars.;

yhat = sum(yhat, p_*x_[_n_, k]);

end;

proby = 1/(1 + exp(-yhat));

if 0.00001 <= proby >= 0.99999 then do;

proby = round(proby, 1);

weight = 0.00001;

end;

else weight = proby*(1 - proby);

z = sum(z, (x_[_n_,j]*(y_[_n_] - proby) + weight*p_*(x_[_n_,j])**2));

sum_wtx_sq = sum(sum_wtx_sq, weight*(x_[_n_,j])**2);

end; *endLoop: _n_= [nobs];

if j>0 then do;

if gamma<abs(z/&nobs.) then do;

if z>0 then p_ = (z/&nobs. - gamma);

else p_ = (z/&nobs. + gamma);

p_ = p_/(sum_wtx_sq/&nobs. + lambda - gamma);

end;

else p_=0;

end;

else p_ = z/sum_wtx_sq;

end; *endLoop: j= [numvars];

end; *endLoop: i= [numiter];

parm_unstnd_[0] = p_[0];

do _n_=1 to &numvars.;

parm_unstnd_[0] = parm_unstnd_[0] - ((p_[_n_]*mean_[_n_])/std_[_n_]);

parm_unstnd_[_n_] = p_[_n_]/std_[_n_];

end;

output;

put "Final parm_unstnd_

=" parm_unstnd_

;

end; *endLoop: lambda= [lambda_list];

end; *endLoop: alpha= [alpha_list];

keep alpha lambda parm_unstnd_:;

stop;

run;

Final parm_unstnd_

=-6.968638061 0 0.0910300569 0.0147581702 0 0 -0.013146089 0 0.8256638121 0 0

NOTE: There were 442 observations read from the data set WORK.DIABETES_STND_ARRAY2.

NOTE: The data set WORK.COORD_DESCENT has 1 observations and 13 variables.

NOTE: DATA statement used (Total process time):

real time 0.06 seconds

cpu time 0.05 seconds

R:

library("glmnet")

set.seed(20)

foo <- read.table("http://www4.stat.ncsu.edu/~boos/var.select/diabetes.tab.txt", header=TRUE, sep="\t", strip.white=TRUE)

x <- as.matrix(foo[1:10])

y <- ifelse(foo$Y > 140, 1, 0)

system.time(bar <- glmnet(x, y, family = "binomial", alpha = 0.5) )

user system elapsed

0.03 0.00 0.03

print(coef(bar, s=0.1))

11 x 1 sparse Matrix of class "dgCMatrix"

1

(Intercept) -6.97872002

AGE .

SEX .

BMI 0.09119253

BP 0.01478218

S1 .

S2 .

S3 -0.01316737

S4 .

S5 0.82665323

S6 .

RobF · Posted 04-24-2015 01:22 PM

Cool, thanks FriedEgg for the code edits & everyone else for the helpful comments.

Ah, I missed the fact that glmnet is exiting the loop after the convergence criteria (1E-07) is reached, Ok it makes sense now why I'm seeing the runtime discrepancies.

There are still a few more improvements to be made to the code. I don't need to continue updating parameters once their estimate is zero.

Also the inner loop:

do k=1 to &numvars;

yhat = sum(yhat, p_*x_[_nn_,k]);

end;

is unnecessarily repeating the same product calculation for values that aren't changing during each iteration of the loop do j=0 to &numvars;

jakarman · Posted 04-24-2015 11:45 AM

although you do not have IML you cuould have used SAS(R) 9.4 DS2 Language Reference, Fourth Edition with the matrxc package it is part of base.
No idea it would dot the job. The difference is it should be better dot he job instead of the datamanipulation goal (base SAS). Your are too late for that anyway.

---->-- ja karman --<-----

jakarman · Posted 04-25-2015 08:49 AM

It helps as shown to eliminate unnecessary loops. Wel done "friedegg"

Still the datastep and ds2 are build for data-manipulation. When you can use the "SAS procs" they are build for analyzing/reporting, you are better with those. A better name for "SAS procs"are packages as they are pre_build solutions. However as packages are also used to indicate other programming styles that can be confusing. The matrix package and SQL package within DS2 are examples of those.
Optimizing code is optimizing resource usage (minimizing) and using the best code-parts for those.

---->-- ja karman --<-----

Registration is open

SAS Training: Just a Click Away