【DM论文阅读杂记】LightGBM

时间：2020-09-18 15:31:33 阅读：52 评论：0 收藏：0 [点我收藏+]

Paper Title

LightGBM: A Highly Efficient Gradient Boosting Decision Tree

It introduces a novel GBDT algorithm to reduct features and data, including :

Gradient-based One-Side Sampling (GOSS) : Retainde some small gradient samples randomly while preserving large gradient samples.
Exclusive Feature Bundling (EFB) : merge sparse features with little conflict to reduct feature deminision.

GOSS
- Sort the traning data descendly based on the gradient of the data, and retain the top A th set.
- Ramdomly sample in the U\A set.
EFB
- Construct a weighted undirected graph with vertices as features and edges as weights.Sort the nodes by the degree. The greater the degree, the greater the conflict with other features.
- Improve the first step: sort the features by the number of non-zero values.
- Calculate the conflict ratio between different features
- Iterate through each feature and merge the features to minimize the conflict ratio. (need to add constant bia)

Conventional GBDT models need to scan all the data instances for every feature to estimate the information gain of all the possible split points, which is in high complexity.
Pre-sort, one of the method to find the best split points, is inefficient in both training speed and memory consumption.
Histogram-based algorithm, another main method, does not have efficient sparse optimization solutions

Proposed a novel GBDT algorithm called LightGBM, containing Gradient-based One-Side Sampling and Exclusive Feature Bundling, which bring a significant speed-up for large GBDT training process.
LightGBM can significantly outperform XGBoost and SGB in terms of computational speed and memory consumption

Relations to what I had read.
Shortcomings I assume: the LightGBM may be sensitive to noisy points since the EFB is based on bias.
Potential change I assume: explore the process of find the optimal selection of a and b in Gradient-based One-Side Sampling; generate the model to non-sparse data; paralle the LightGBM.

原文：https://www.cnblogs.com/chensy86/p/13623704.html

踩

(0)

评论一句话评论（0）

分享档案

更多>