What is regression and linear regression?

What is regression? Which models can you use to solve a regression problem?

Regression is a part of supervised ML. Regression models investigate the relationship between a dependent (target) and independent variable (s) (predictor). Here are some common regression models

  • Linear Regression establishes a linear relationship between target and predictor (s). It predicts a numeric value and has a shape of a straight line.
  • Polynomial Regression has a regression equation with the power of independent variable more than 1. It is a curve that fits into the data points.
  • Ridge Regression helps when predictors are highly correlated (multicollinearity problem). It penalizes the squares of regression coefficients but doesn’t allow the coefficients to reach zeros (uses L2 regularization).
  • Lasso Regression penalizes the absolute values of regression coefficients and allows some of the coefficients to reach absolute zero (thereby allowing feature selection).

What is linear regression? When do we use it?

Linear regression is a model that assumes a linear relationship between the input variables (X) and the single output variable (y).

With a simple equation:

y = B0 + B1*x1 + ... + Bn * xN

B is regression coefficients, x values are the independent (explanatory) variables and y is dependent variable.

The case of one explanatory variable is called simple linear regression. For more than one explanatory variable, the process is called multiple linear regression.

Simple linear regression:

y = B0 + B1*x1

Multiple linear regression:

y = B0 + B1*x1 + ... + Bn * xN

Methods for solving linear regression do you know?

To solve linear regression, you need to find the coefficients which minimize the sum of squared errors.

Matrix Algebra method: Let’s say you have X, a matrix of features, and y, a vector with the values you want to predict. After going through the matrix algebra and minimization problem, you get this solution: .

But solving this requires you to find an inverse, which can be time-consuming, if not impossible. Luckily, there are methods like Singular Value Decomposition (SVD) or QR Decomposition that can reliably calculate this part (called the pseudo-inverse) without actually needing to find an inverse. The popular python ML library sklearn uses SVD to solve least squares.

Alternative method: Gradient Descent.

Speak Your Mind