WouterG
Fluorite | Level 6

Is there a way to reduce the run time of the SAS GBM node in EM? We run HP Forest and R (inside the Open Source Integration node) with comparable run times on the same data.

 

As soon as we run the SAS GBM node with the same parameter sets as the R-based GBM in the Open Source Integration node, the EM GBM node runs 5-10 times slower.

 

Are there any system or node parameters that we can look at to speed up the processing?

3 REPLIES
JasonXin
SAS Employee
Hi,

How many variables and observations are you running through the GBM node? Is it possible to run variable selection before the GBM node?

EM's HPFOREST node runs at least multi-threaded (SMP) on the CPUs; if you are configured to run MPP (massively parallel processing) across, say, 32 or 48 computers, speed and overall performance are expected to be better than with SMP. The GBM node does not support SMP or MPP. It runs as a traditional single-threaded node; you can tell because the node has no HP prefix. Its speed, therefore, should not be expected to match HPFOREST, quite apart from the differences between the algorithms themselves. This is in no way to suggest that random forest runs faster than GBM in general, or vice versa.

The latest SAS Viya has a genuine in-memory GBM procedure and actions that scale to wide or tall tables, that is, truly big data. SAS is not expected to upgrade the existing EM GBM node to something like "HPGBM".

Hope this helps. Thank you for using SAS.

Best Regards
Jason Xin
WouterG
Fluorite | Level 6

Hi Jason,

 

Thanks for that. I am aware that GBM is still a serial procedure. I am also aware that HPFOREST runs in parallel even in SMP mode.

Is there a way to speed up the existing GBM node? We use GBM in R as our 'go-to' algorithm for most of our predictive models, and we want to replicate that in SAS.

 

However, the SAS GBM is orders of magnitude slower than the R equivalent on exactly the same data.

 

SAS Viya is not on our horizon any time soon, so we'll have to make do with SAS or R.

JasonXin
SAS Employee
Hi,

I have seen cases where the EM GBM node runs at a speed comparable to R integrated into the same flow, everything else held roughly equal. I have also seen cases where GBM is slower than R, and vice versa, so there is little one can conclude in general. To be really useful to you, I would eventually have to sit down in front of your data set and operations to help speed things up, as I have done several times in the past.

Generally speaking, EM spends a lot of resources on GUI operations, writing and rewriting code in the background, which running R through the integration node does not entail. Often, when an EM node runs this slowly, it indicates that the work space for the flow is running out of room: it is writing while it is swapping. That is ultimately a SAS Management Console matter, where one can try to relocate and optimize the space allocation.

If the GUI does not matter that much to you, you can try the underlying procedure, TreeBOOST. If you go to Google.com and search for "Jason Xin, treeboost", you should quickly find the full-fledged sample code I published years ago. Once you finish modeling with the procedure code, you can bring the predicted values back into EM with the Model Import Node, so that model comparison lines up with the other models you are building in the EM GUI.

Hope this helps. Thanks.

Jason Xin
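
For the work-space point in the reply above, one quick sanity check is to see how much free room is left on the volume that holds the flow's working files. The following is only a minimal sketch in Python, not anything SAS provides: `free_space_report` is a hypothetical helper, and `work_dir` is a placeholder (the system temp directory) that you would replace with wherever your SAS WORK library or EM project actually lives.

```python
import shutil
import tempfile


def free_space_report(path):
    """Return (free_gib, free_fraction) for the volume that holds `path`."""
    usage = shutil.disk_usage(path)
    free_gib = usage.free / 2**30
    # Guard against exotic filesystems that report a total size of zero.
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return free_gib, free_fraction


# Assumption: placeholder path. Substitute the directory used by your
# SAS WORK library or EM project workspace.
work_dir = tempfile.gettempdir()

free_gib, free_fraction = free_space_report(work_dir)
print(f"{work_dir}: {free_gib:.1f} GiB free ({free_fraction:.0%} of volume)")
```

Running this before and during a GBM node run would show whether free space drops sharply, which is the symptom described above; it does not by itself say anything about how SAS allocates that space.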
