🔒 This topic is solved and locked.
zdc
Fluorite | Level 6

Hi,

 

I'm using gradient boosting as the algorithm for my model in SAS code, and I'm trying to find the best parameters using autotune. But every time I run the program it returns different best parameters. I can't understand the logic behind this: isn't there only one set of best parameters? Why does it return different best parameters each time, unlike Python? I would be very happy if anyone has come across this problem and knows the cause.

 

Thank you. 

1 ACCEPTED SOLUTION

Accepted Solutions
sbxkoenk
SAS Super FREQ

Hello @zdc ,

 

Gradient boosting is based on trees, and trees have a very angular response surface (not smooth!).
The difference between turning 30 yesterday and turning 30 today can be big if there is a split at 30.
Training the model (the learning process) can also be very sensitive to small subtleties in the data.

What happens if you increase the number of trees grown for the gradient boosting model?

This is the NTREES=number option on the PROC GRADBOOST statement.

And what happens if you set the NTHREADS option to 1?

NTHREADS=number-of-threads : specifies the number of threads to use in the computation. The default value is the number of CPUs available on the machine.
To find out how many CPUs you have available right now, submit this:

%LET cpucount_now=%sysfunc(getoption(cpucount));
%PUT &=cpucount_now;
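Put together, a minimal sketch of how those two options could be combined on the PROC GRADBOOST statement (the dataset and variable names are placeholders taken from the code posted later in this thread; verify the availability of NTREES= and NTHREADS= in your release):

```sas
/* Sketch: grow more trees and force single-threaded computation */
/* to reduce run-to-run variability. Option names as documented  */
/* for PROC GRADBOOST on SAS Viya; check against your release.   */
proc gradboost data=casuser.train_data ntrees=500 nthreads=1;
   target Number / level=interval;
   input A B C D E F G H J K / level=interval;
run;
```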

Koen

 


8 REPLIES
sbxkoenk
SAS Super FREQ

Hello,

 

This behavior could have many reasons.

If nobody has answered by tomorrow I will elaborate.

But already this note: some SAS Visual Data Mining and Machine Learning models are created with a nondeterministic process.
I also think 'autotune' is done with local search algorithms (something like genetic algorithms, GAs). I think sensitivity to local optima is higher with GAs than with calculus-based techniques, but I'm unsure about this.


Can somebody with super-powers like @BeverlyBrown move this topic to Analytics > SAS Data Mining and Machine Learning?

 

Kind regards,
Koen

zdc
Fluorite | Level 6

Hi,


Thank you for your reply. Since I use the same data, even if the parameters change, I expect them and the resulting MAPE value not to change much. But I'm getting quite different results, and I can't understand why.


zdc
Fluorite | Level 6

Hi @sbxkoenk,

 

Thank you for your reply. I was expecting some differences, but not that large. When I increased the number of trees and set the NTHREADS option to 1, the differences became more acceptable. That provided some insight, thank you!

ballardw
Super User

Why are you rerunning the code?

Are you running the EXACT same code every time or changing options?

Changed data? I would think it's pretty obvious that different data can create different output.

Different sort order? Some algorithms are affected by data order (I don't know whether that is the case here, but it may be).

 

I ran into a different software package where some of the models would return different results depending on the order of the variables in the MODEL statement. Our process with that procedure was to always rerun the analysis after moving the most significant variable (or two) to the end of the variable list, to see whether they stayed significant.

 

You might also consider posting the code you are using.

 

BrettWujek
SAS Employee

This has nothing to do with the algorithm or the autotune methodology per se. Both of these (the modeling algorithm and the autotune strategy) accept a seed to initialize random number generation. If the data has not changed (including its order), you will get the same results.

As pointed out in a previous reply, the real issue here is data ordering. In a distributed computing environment where the data is spread among various worker nodes, that distribution is typically not the same from run to run, because data is moved asynchronously to those nodes. Autotuning takes advantage of the compute resources to train multiple models in parallel by creating multiple sessions that train candidate models on different nodes, which causes this redistribution of data. If you are running in an SMP environment (i.e., no worker nodes; everything runs on one machine), or in an MPP environment (a controller node with worker nodes) but with only one worker, everything should still be repeatable as long as you set the seed in the AUTOTUNE statement options.

Python does not magically solve this: if you run in a distributed environment like Spark and the tuning process dynamically takes advantage of training models in parallel, you will have the same problem.
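As a concrete sketch of where those seeds would go, here is a minimal example with SEED= on both the PROC statement and the AUTOTUNE statement (option names as documented for SAS Viya, so verify against your release; the dataset and variable names are placeholders from this thread):

```sas
/* Sketch: fix the seed for model training (PROC statement) and    */
/* for the tuner (AUTOTUNE statement). With unchanged, identically */
/* ordered data on a single machine or a single worker, the tuning */
/* results should then be repeatable.                              */
proc gradboost data=casuser.train_data seed=12345 nthreads=1;
   target Number / level=interval;
   input A B C D E F G H J K / level=interval;
   autotune tuningparameters=(ntrees learningrate lasso ridge)
            objective=mae kfold=5 seed=12345;
run;
```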

 

Hope this provides some insight.



zdc
Fluorite | Level 6

Hi,

Thank you for your reply. I pay special attention to making sure the data does not change, and I use exactly the same data. I rerun the model with the best parameters and calculate the MAPE value. Because the best parameters differ every time, it returns quite different MAPE values. Even when I rerun it over and over with the same best parameters, it returns different results. When I run it in Python with the same data, the best parameter values do not change, and as a result the MAPE value does not change either.
I also expect to get the same results every time, but I don't understand why it changes that much.

zdc
Fluorite | Level 6

Hi,

 

Thank you for your reply. I had to rerun the code, and I run the exact same code each time with exactly the same data (including sort order).

 

I used the default autotune values.

 

ods noproctitle;

proc gradboost data=CASUSER.TRAIN_DATA;
	target Number / level=interval;
	input A B C D E F G H J K/ level=interval;
	autotune tuningparameters=(ntrees samplingrate vars_to_try(init=100) learningrate lasso ridge) objective=mae kfold=5;
	ods output FitStatistics=Work._Gradboost_FitStats_ 
		VariableImportance=Work._Gradboost_VarImp_;
	score out=casuser.scored_train copyvars=(_all_);
	savestate rstore=casuser.model_train;
	id _all_;
run;



Discussion stats
  • 8 replies
  • 1143 views
  • 5 likes
  • 4 in conversation