What is random forest? Why do we need randomization in random forest?

Random Forest is a machine learning method for regression and classification which is composed of many decision trees. Random Forest belongs to a larger class of ML algorithms called ensemble methods (in other words, it involves the combination of several models to solve a single prediction problem).

Random forest in an extention of the bagging algorithm which takes random data samples from the training dataset (with replacement), trains several models and averages predictions. In addition to that, each time a split in a tree is considered, random forest takes a random sample of m features from full set of n features (with replacement) and uses this subset of features as candidates for the split (for example, m = sqrt(n)).

Training decision trees on random data samples from the training dataset reduces variance. Sampling features for each split in a decision tree decorrelates trees.

Speak Your Mind