What’s the difference between L2 and L1 regularization?
Background
There are mainly two types of regularization:
- L1 Regularization (Lasso regularization) - Adds the sum of the absolute values of the coefficients to the cost function.
- L2 Regularization (Ridge regularization) - Adds the sum of the squares of the coefficients to the cost function.
L1 regularization adds a penalty term to the cost function equal to the sum of the absolute values of the model's coefficients, multiplied by a lambda hyperparameter. For example, a cost function with L1 regularization looks like this:
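$$J(w) = \mathrm{Loss}(w) + \lambda \sum_{i} |w_i|$$

where $\mathrm{Loss}(w)$ is the unregularized loss (for example, mean squared error), $w_i$ are the model's coefficients, and $\lambda \ge 0$ controls the penalty strength (this notation is illustrative; the original answer does not fix specific symbols).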
L2 regularization adds a penalty term to the cost function equal to the sum of the squares of the model's coefficients, multiplied by a lambda hyperparameter. This technique shrinks the coefficients toward zero (without setting them exactly to zero) and is widely used when there are many features that may correlate with each other.
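Using the same notation as above, the L2-regularized cost function is:

$$J(w) = \mathrm{Loss}(w) + \lambda \sum_{i} w_i^2$$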
Difference between L2 and L1 regularization
- Penalty terms: L1 regularization penalizes the sum of the absolute values of the weights, while L2 regularization penalizes the sum of the squared weights.
- Feature selection: L1 performs feature selection by driving the coefficients of some predictors exactly to 0, while L2 only shrinks them (see the sketch after this list).
- Computational efficiency: L2-regularized linear regression (ridge) has a closed-form analytical solution, while L1 does not and must be solved iteratively (e.g., by coordinate descent).
- Multicollinearity: L2 addresses multicollinearity by constraining the coefficient norm and spreading weight across correlated features, whereas L1 tends to keep one of a group of correlated features and zero out the rest.
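To make the feature-selection and multicollinearity contrast concrete, here is a minimal sketch using scikit-learn's Lasso (L1) and Ridge (L2). The toy data, the alpha value, and all variable names are illustrative assumptions, not part of the original answer:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Toy data: 5 features, two of which are nearly identical
# (correlated), and only features 0 and 3 actually drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 4] = X[:, 3] + 0.01 * rng.normal(size=200)  # near-duplicate column
y = 3.0 * X[:, 0] + 2.0 * X[:, 3] + rng.normal(size=200)

# L1 (Lasso): can drive some coefficients exactly to zero,
# typically keeping just one of the correlated pair.
lasso = Lasso(alpha=0.5).fit(X, y)

# L2 (Ridge): shrinks all coefficients but keeps them nonzero,
# splitting weight across the correlated pair.
ridge = Ridge(alpha=0.5).fit(X, y)

print("Lasso coefficients:", lasso.coef_)  # expect exact zeros for irrelevant features
print("Ridge coefficients:", ridge.coef_)  # small but nonzero everywhere
```

Printing the two coefficient vectors shows the difference directly: the Lasso fit contains exact zeros, while the Ridge fit keeps every coefficient nonzero and divides weight between the two correlated columns.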