What’s the difference between L2 and L1 regularization?
Background
There are mainly two types of regularization:
- L1 Regularization (Lasso regularization) - Adds the sum of the absolute values of the coefficients to the cost function.
- L2 Regularization (Ridge regularization) - Adds the sum of the squares of the coefficients to the cost function.
L1 regularization adds a penalty term to the cost function equal to the sum of the absolute values of the model's coefficients, multiplied by a lambda hyperparameter. For example, a cost function with L1 regularization looks like this:
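$$J(w) = \mathrm{Loss}(w) + \lambda \sum_{i} |w_i|$$

where $\mathrm{Loss}(w)$ is the unregularized loss (for example, mean squared error), $w_i$ are the model's coefficients, and $\lambda \ge 0$ controls the penalty strength (this notation is illustrative; the original answer does not fix specific symbols).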
L2 regularization adds a penalty term to the cost function equal to the sum of the squares of the model's coefficients, multiplied by a lambda hyperparameter. This technique shrinks the coefficients toward zero (without setting them exactly to zero) and is widely used when there are many features that may correlate with each other.
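Using the same notation as above, the L2-regularized cost function is:

$$J(w) = \mathrm{Loss}(w) + \lambda \sum_{i} w_i^2$$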
Difference between L2 and L1 regularization
- Penalty terms: L1 regularization penalizes the sum of the absolute values of the weights, while L2 regularization penalizes the sum of the squared weights.
- Feature selection: L1 performs feature selection by driving the coefficients of some predictors exactly to 0, while L2 only shrinks them (see the sketch after this list).
- Computational efficiency: L2-regularized linear regression (ridge) has a closed-form analytical solution, while L1 does not and must be solved iteratively (e.g., by coordinate descent).
- Multicollinearity: L2 addresses multicollinearity by constraining the coefficient norm and spreading weight across correlated features, whereas L1 tends to keep one of a group of correlated features and zero out the rest.
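To make the feature-selection and multicollinearity contrast concrete, here is a minimal sketch using scikit-learn's Lasso (L1) and Ridge (L2). The toy data, the alpha value, and all variable names are illustrative assumptions, not part of the original answer:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Toy data: 5 features, two of which are nearly identical
# (correlated), and only features 0 and 3 actually drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 4] = X[:, 3] + 0.01 * rng.normal(size=200)  # near-duplicate column
y = 3.0 * X[:, 0] + 2.0 * X[:, 3] + rng.normal(size=200)

# L1 (Lasso): can drive some coefficients exactly to zero,
# typically keeping just one of the correlated pair.
lasso = Lasso(alpha=0.5).fit(X, y)

# L2 (Ridge): shrinks all coefficients but keeps them nonzero,
# splitting weight across the correlated pair.
ridge = Ridge(alpha=0.5).fit(X, y)

print("Lasso coefficients:", lasso.coef_)  # expect exact zeros for irrelevant features
print("Ridge coefficients:", ridge.coef_)  # small but nonzero everywhere
```

Printing the two coefficient vectors shows the difference directly: the Lasso fit contains exact zeros, while the Ridge fit keeps every coefficient nonzero and divides weight between the two correlated columns.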