Exploiting GPUs for Efficient Gradient Boosting Decision Tree Training

In this paper, we present a novel parallel implementation for training Gradient Boosting Decision Trees (GBDTs) on Graphics Processing Units (GPUs). Thanks to their excellent results on classification and regression and to open-source libraries such as XGBoost, GBDTs have become very popular in recent years and have won many awards in machine learning and data mining competitions. Although GPUs have been successful in accelerating many machine learning applications, it is challenging to develop an efficient GPU-based GBDT algorithm. The key challenges include irregular memory accesses, many sorting operations on small inputs, and varying data-parallel granularities in tree construction. To tackle these challenges on GPUs, we propose various novel techniques, including (i) Run-length Encoding compression and dynamic allocation of thread/block workloads, (ii) data partitioning based on stable sort, together with fast and memory-efficient attribute ID lookup in node splitting, (iii) finding approximate split points using two-stage histogram building, (iv) sparsity-aware histogram building and histogram subtraction to reduce the histogram building workload, (v) reusing intermediate training results for efficient gradient computation, and (vi) exploiting multiple GPUs to handle larger data sets efficiently. Our experimental results show that our algorithm, named ThunderGBM, can be 10 times faster than the state-of-the-art libraries (i.e., XGBoost, LightGBM and CatBoost) running on a relatively high-end workstation with 20 CPU cores. Compared with the GPU versions of these libraries, ThunderGBM can handle higher-dimensional problems on which those libraries become extremely slow or simply fail. For the data sets that the existing GPU libraries can handle, ThunderGBM achieves up to a 10 times speedup on the same hardware, which demonstrates the significance of our GPU optimizations. Moreover, the models trained by ThunderGBM are identical to those trained by XGBoost and have similar quality to those trained by LightGBM and CatBoost.
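
The abstract names histogram subtraction in technique (iv) without describing it. The following is a minimal CPU-side sketch of the general idea, not ThunderGBM's GPU implementation: assuming each histogram stores per-bin gradient sums, the histogram of one child node can be obtained by subtracting its sibling's histogram from the parent's, so only one child has to be built from the data. The function names (`build_histogram`, `subtract_histogram`) and the toy data are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def build_histogram(bin_ids: Sequence[int],
                    gradients: Sequence[float],
                    n_bins: int) -> List[float]:
    """Sum the gradients of the instances that fall into each feature bin."""
    hist = [0.0] * n_bins
    for b, g in zip(bin_ids, gradients):
        hist[b] += g
    return hist


def subtract_histogram(parent: Sequence[float],
                       child: Sequence[float]) -> List[float]:
    """Derive the sibling's histogram as (parent - child), bin by bin."""
    return [p - c for p, c in zip(parent, child)]


# Toy data (hypothetical): six instances, one feature pre-binned into 4 bins.
bin_ids = [0, 1, 1, 2, 3, 3]
gradients = [1.0, -2.0, 0.5, 3.0, -1.0, 4.0]
in_left = [True, False, True, True, False, False]  # outcome of a node split

# The parent histogram is assumed to be available from the previous tree level.
parent_hist = build_histogram(bin_ids, gradients, 4)

# Build the histogram of one child (here: the left one) from its instances.
left_hist = build_histogram(
    [b for b, left in zip(bin_ids, in_left) if left],
    [g for g, left in zip(gradients, in_left) if left],
    4,
)

# The right child costs O(n_bins) instead of another pass over its instances.
right_hist = subtract_histogram(parent_hist, left_hist)

# For comparison only: the right child built directly from its instances.
right_direct = build_histogram(
    [b for b, left in zip(bin_ids, in_left) if not left],
    [g for g, left in zip(gradients, in_left) if not left],
    4,
)
```

In practice one would typically build the histogram of the child with fewer instances and derive the larger one by subtraction, since the subtraction cost depends only on the number of bins. With real-valued gradients the subtracted and directly built histograms may differ by floating-point rounding.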


