How does DBScan work?

DBSCAN works by:

  • Two input parameters epsilon (neighborhood radius) and minPts (minimum number of points in an epsilon-neighborhood)
  • Cluster defined as maximum set of density-connected points.
  • Points p_j and p_i are density-connected w.r.t. epsilon and minPts if there is a point o such that both, i and j are density-reachable from o w.r.t. epsilon and minPts.
  • p_j is density-reachable from p_i w.r.t. epsilon, minPts if there is a chain of points p_i -> p_i+1 -> p_i+x = p_j such that p_i+x is directly density-reachable from p_i+x-1.
  • p_j is a directly density-reachable point of the neighborhood of p_i if dist(p_i,p_j) <= epsilon.

When would you choose K-means and when DBScan?

  • DBScan is more robust to noise.
  • DBScan is better when the amount of clusters is difficult to guess.
  • K-means has a lower complexity, i.e. it will be much faster, especially with a larger amount of points.

Speak Your Mind