
[DM Paper Reading Notes] LightGBM

Posted: 2020-09-18 15:31:33

Paper Title

LightGBM: A Highly Efficient Gradient Boosting Decision Tree

Basic algorithm and main steps

Basic ideas

The paper introduces a novel GBDT algorithm with two techniques to reduce the number of data instances and features:

  • Gradient-based One-Side Sampling (GOSS): keep all samples with large gradients and randomly sample from those with small gradients.

  • Exclusive Feature Bundling (EFB): merge sparse features that rarely conflict, to reduce the feature dimensionality.

Main Algorithm

  1. GOSS

    • Sort the training instances in descending order of the absolute values of their gradients, and retain the top a×100% as the set A.
    • Randomly sample b×100% of the instances from the remaining set U\A, and amplify their contribution by a factor of (1−a)/b when computing the information gain, to compensate for the sampling.
  2. EFB

    • Construct a weighted undirected graph whose vertices are the features and whose edge weights count the conflicts (samples where both features are non-zero). Sort the vertices by degree: the greater the degree, the greater the conflict with other features.

    • Improve the first step: instead of building the graph, simply sort the features by their number of non-zero values.

    • Calculate the conflict ratio between different features.

    • Iterate through the features and greedily merge each one into a bundle so that the conflict ratio stays small. (Merged features need a constant offset added to their values so they remain distinguishable within a bundle.)
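The GOSS step above can be sketched in NumPy. This is a minimal illustration, not the paper's implementation; the function name `goss_sample` and the parameter names `a` and `b` (the top-gradient and random-sampling ratios from the paper) are my own choices here.

```python
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=None):
    """Gradient-based One-Side Sampling (sketch).

    Keeps the top a*100% of instances by absolute gradient, randomly
    samples b*100% from the rest, and re-weights the sampled small-gradient
    instances by (1 - a) / b so the gradient statistics stay roughly unbiased.
    Returns (selected indices, per-instance weights).
    """
    rng = np.random.default_rng(rng)
    n = len(gradients)
    top_n = int(a * n)
    rand_n = int(b * n)
    # Sort indices by |gradient| in descending order.
    order = np.argsort(np.abs(gradients))[::-1]
    top_idx = order[:top_n]                      # the set A
    rand_idx = rng.choice(order[top_n:], size=rand_n, replace=False)  # sample from U\A
    idx = np.concatenate([top_idx, rand_idx])
    weights = np.concatenate([
        np.ones(top_n),
        np.full(rand_n, (1 - a) / b),  # amplify small-gradient samples
    ])
    return idx, weights
```

With the defaults a=0.2, b=0.1, only 30% of the data is used per iteration, and each sampled small-gradient instance is weighted by (1 − 0.2)/0.1 = 8.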
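The greedy EFB bundling (with the simplified non-zero-count ordering) can be sketched as follows. This is an assumed illustration, not LightGBM's actual code; `bundle_features` and the representation of each feature as the set of its non-zero row indices are my own simplifications.

```python
def bundle_features(features, max_conflict=0):
    """Greedy Exclusive Feature Bundling (sketch).

    `features` is a list of sets, each holding the row indices where that
    feature is non-zero. Features whose non-zero rows overlap on at most
    `max_conflict` rows are merged into one bundle. Returns a list of
    bundles, each a list of feature indices.
    """
    # Simplified ordering: sort features by non-zero count, descending.
    order = sorted(range(len(features)),
                   key=lambda i: len(features[i]), reverse=True)
    bundles = []      # list of lists of feature indices
    bundle_rows = []  # union of non-zero rows per bundle
    for i in order:
        placed = False
        for b, rows in enumerate(bundle_rows):
            # Conflict = rows where this feature and the bundle are both non-zero.
            if len(rows & features[i]) <= max_conflict:
                bundles[b].append(i)
                bundle_rows[b] = rows | features[i]
                placed = True
                break
        if not placed:
            bundles.append([i])
            bundle_rows.append(set(features[i]))
    return bundles
```

To actually merge a bundle into one feature, each member's values would additionally get a constant offset so the ranges do not overlap, as the last step in the list above notes.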

?

Motivation

  • Conventional GBDT implementations need to scan all the data instances for every feature to estimate the information gain of all possible split points, which is computationally expensive.
  • Pre-sorting, one of the main methods for finding the best split points, is inefficient in both training speed and memory consumption.
  • The histogram-based algorithm, the other main method, lacks efficient optimizations for sparse data.
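For context, the histogram-based split finding mentioned above can be sketched like this. It is a simplified illustration (equal-width bins rather than the quantile-based bins real implementations use); the name `best_histogram_split` and the regularizer `lam` are my own notation for the standard second-order GBDT gain.

```python
import numpy as np

def best_histogram_split(x, grad, hess, n_bins=16, lam=1.0):
    """Histogram-based split finding for one feature (sketch).

    Buckets feature values into n_bins bins, accumulates per-bin gradient
    and hessian sums, then scans bin boundaries for the split maximizing
    the usual gain  G_L^2/(H_L+lam) + G_R^2/(H_R+lam) - G^2/(H+lam).
    Returns (best_gain, best_bin).
    """
    # Equal-width binning (a simplification of quantile-based boundaries).
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    bins = np.clip(np.digitize(x, edges[1:-1]), 0, n_bins - 1)
    g_hist = np.bincount(bins, weights=grad, minlength=n_bins)
    h_hist = np.bincount(bins, weights=hess, minlength=n_bins)
    G, H = g_hist.sum(), h_hist.sum()
    best_gain, best_bin = 0.0, -1
    gl = hl = 0.0
    for b in range(n_bins - 1):
        gl += g_hist[b]
        hl += h_hist[b]
        gr, hr = G - gl, H - hl
        gain = gl**2 / (hl + lam) + gr**2 / (hr + lam) - G**2 / (H + lam)
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_gain, best_bin
```

The scan costs O(n_bins) per feature instead of O(n) as in pre-sorting, which is why the histogram approach is the faster baseline that GOSS and EFB then build on.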

Contribution

  • Proposed a novel GBDT algorithm called LightGBM, combining Gradient-based One-Side Sampling and Exclusive Feature Bundling, which brings a significant speed-up to training GBDTs on large datasets.
  • LightGBM can significantly outperform XGBoost and SGB in terms of computational speed and memory consumption.

My own idea

  • Relations to what I have read.
  • Shortcoming I assume: LightGBM may be sensitive to noisy points, since EFB relies on constant offsets (bias) when merging features.
  • Potential changes I assume: explore how to find the optimal a and b in Gradient-based One-Side Sampling; generalize the model to non-sparse data; parallelize LightGBM.


Original post: https://www.cnblogs.com/chensy86/p/13623704.html
