How does DBScan work?
DBSCAN works by:
- Two input parameters epsilon (neighborhood radius) and minPts (minimum number of points in an epsilon-neighborhood)
- Cluster defined as maximum set of density-connected points.
- Points p_j and p_i are density-connected w.r.t. epsilon and minPts if there is a point o such that both, i and j are density-reachable from o w.r.t. epsilon and minPts.
- p_j is density-reachable from p_i w.r.t. epsilon, minPts if there is a chain of points p_i -> p_i+1 -> p_i+x = p_j such that p_i+x is directly density-reachable from p_i+x-1.
- p_j is a directly density-reachable point of the neighborhood of p_i if dist(p_i,p_j) <= epsilon.
When would you choose K-means and when DBScan?
- DBScan is more robust to noise.
- DBScan is better when the amount of clusters is difficult to guess.
- K-means has a lower complexity, i.e. it will be much faster, especially with a larger amount of points.