What are the main parameters of the random forest model?

max_depth: The maximum depth of each tree, i.e. the longest allowed path between the root node and a leaf
min_samples_split: The minimum number of observations needed to split a given node
max_leaf_nodes: The maximum number of leaf nodes per tree, which restricts splitting and hence limits the growth of the trees
min_samples_leaf: The minimum number of samples required in a leaf node
n_estimators: The number of trees in the forest
max_samples: The fraction (or number) of samples from the original dataset drawn to train each individual tree
max_features: The maximum number of features considered when looking for the best split at a node
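
As a rough illustration, the parameters listed above can be passed to scikit-learn's RandomForestClassifier as shown below. This is only a sketch: the synthetic dataset and every parameter value are assumptions chosen for demonstration, not recommended settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only for illustration (an assumption, not from the article).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

model = RandomForestClassifier(
    n_estimators=100,       # number of trees
    max_depth=5,            # longest allowed root-to-leaf path
    min_samples_split=10,   # a node needs at least 10 samples to be split
    min_samples_leaf=5,     # every leaf must keep at least 5 samples
    max_leaf_nodes=20,      # at most 20 leaves per tree
    max_samples=0.8,        # each tree is trained on 80% of the rows (bootstrap)
    max_features="sqrt",    # features considered at each split
    random_state=0,
)
model.fit(X, y)

# Check that the fitted trees respect the limits set above.
deepest = max(tree.get_depth() for tree in model.estimators_)
most_leaves = max(tree.get_n_leaves() for tree in model.estimators_)
print(len(model.estimators_), deepest, most_leaves)
```

Note that max_samples only takes effect when bootstrap sampling is enabled, which is the default for this estimator.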

Selecting the depth of the trees in random forest

Deeper trees extract more information from the data, but only up to a point. Although a random forest is fairly robust to overfitting, very deep trees can still learn the noise present in the data and overfit as a result. There is no hard rule of thumb for choosing the depth, but the literature suggests a few ways to limit tree growth and prevent overfitting:

limit the maximum depth of a tree
limit the number of test nodes
limit the minimum number of objects at a node required to split
do not split a node when, at least, one of the resulting subsample sizes is below a given threshold
stop developing a node if it does not sufficiently improve the fit.
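
The tips above map fairly directly onto scikit-learn parameters: max_depth, max_leaf_nodes (a binary tree with L leaves has L - 1 test nodes), min_samples_split, min_samples_leaf and min_impurity_decrease. The sketch below compares a few candidate depths by cross-validation; the dataset, the candidate depths and the other settings are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic, slightly noisy data (flip_y adds label noise); illustration only.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=4, flip_y=0.1, random_state=0
)

depth_scores = {}
for depth in [2, 4, 8, None]:  # None lets the trees grow until other limits stop them
    forest = RandomForestClassifier(
        n_estimators=50,
        max_depth=depth,             # limit the maximum depth of a tree
        min_samples_split=10,        # minimum objects at a node required to split
        min_samples_leaf=5,          # no split may leave a child below this size
        min_impurity_decrease=1e-3,  # skip splits that barely improve the fit
        random_state=0,
    )
    # Mean accuracy over 5 cross-validation folds.
    depth_scores[depth] = cross_val_score(forest, X, y, cv=5).mean()

best_depth = max(depth_scores, key=depth_scores.get)
print(depth_scores, best_depth)
```

If the score stops improving (or drops) as the depth grows, the extra depth is probably fitting noise, and the shallower setting is the safer choice.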

How many trees do we need in a random forest?

The number of trees in a random forest is set by n_estimators. Adding more trees averages out the variance of the individual trees, so a larger forest generally reduces overfitting, at the cost of longer training time. There is no fixed rule of thumb for the number of trees; it is tuned on the data, typically starting from the square of the number of features (n) present in the data and then adjusting until the results stop improving.
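
One possible way to carry out this tuning is to compare the out-of-bag (OOB) score for several forest sizes, starting from the square of the number of features as described above. This is a minimal sketch: the synthetic dataset and the candidate sizes around the starting point are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data with n = 10 features; illustration only.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

n_features = X.shape[1]
start = n_features ** 2  # starting point from the text: square of the feature count
candidates = [start // 4, start // 2, start, start * 2]

oob_by_trees = {}
for n_trees in candidates:
    forest = RandomForestClassifier(
        n_estimators=n_trees,
        oob_score=True,   # score each sample using only trees that did not train on it
        random_state=0,
    )
    forest.fit(X, y)
    oob_by_trees[n_trees] = forest.oob_score_

best_n_trees = max(oob_by_trees, key=oob_by_trees.get)
print(oob_by_trees, best_n_trees)
```

The OOB score usually levels off beyond some forest size; past that point extra trees mostly add training time rather than accuracy.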
