What are gradient boosted trees?

Gradient boosting is a machine learning technique for regression and classification problems. It produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. The main differences between gradient boosting and random forests are:

  1. Random forests build each tree independently, while gradient boosting builds one tree at a time.
  2. Random forests combine results at the end of the process (by averaging or "majority rules"), while gradient boosting combines results along the way (see the sketch below).
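
To make the second point concrete, here is a minimal sketch of gradient boosting for squared-error regression. This is a toy implementation for illustration, not scikit-learn's internals, and the synthetic sine-wave data is an assumption. Each new tree is fit to the residuals of the current ensemble, so predictions accumulate tree by tree rather than being averaged at the end:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    # Toy data for illustration: a noisy sine wave.
    rng = np.random.RandomState(0)
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

    learning_rate = 0.1
    n_estimators = 100

    # Start from a constant prediction (the mean of the targets).
    prediction = np.full_like(y, y.mean())
    trees = []
    for _ in range(n_estimators):
        # For squared error, the residuals are the negative gradient.
        residuals = y - prediction
        tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
        # Each tree's shrunken output is added to the running total,
        # so results are combined along the way rather than at the end.
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)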

What are the main parameters in the gradient boosting model?

There are many parameters; a few key ones, with their scikit-learn defaults, are listed below.

  • learning_rate=0.1 (shrinkage applied to each tree's contribution).
  • n_estimators=100 (number of boosting stages, i.e. trees).
  • max_depth=3 (maximum depth of each individual tree).
  • min_samples_split=2 (minimum number of samples required to split an internal node).
  • min_samples_leaf=1 (minimum number of samples required at a leaf node).
  • subsample=1.0 (fraction of samples used to fit each tree; values below 1.0 give stochastic gradient boosting).

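These parameters map directly onto scikit-learn's GradientBoostingClassifier; the snippet below simply writes the defaults out explicitly:

    from sklearn.ensemble import GradientBoostingClassifier

    # The defaults listed above, made explicit.
    model = GradientBoostingClassifier(
        learning_rate=0.1,
        n_estimators=100,
        max_depth=3,
        min_samples_split=2,
        min_samples_leaf=1,
        subsample=1.0,
    )
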
Most implementations of gradient boosting are configured by default with a relatively small number of trees (scikit-learn uses 100), but it is common to tune this up into the hundreds or thousands. Using scikit-learn we can perform a grid search over the n_estimators parameter, as sketched below.

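A minimal sketch of that grid search, assuming a synthetic dataset from make_classification; the candidate values for n_estimators are illustrative, not a recommendation:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import GridSearchCV

    # Synthetic dataset purely for illustration.
    X, y = make_classification(n_samples=500, random_state=0)

    # Candidate tree counts to search over (illustrative values).
    param_grid = {"n_estimators": [50, 100, 200, 500]}
    search = GridSearchCV(
        GradientBoostingClassifier(random_state=0),
        param_grid,
        scoring="accuracy",
        cv=5,
    )
    search.fit(X, y)
    print(search.best_params_, search.best_score_)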