How does K-means work?

K-means works by:

Partition points into k subsets.
Compute the seed points as the new centroids of the clusters of the current partitioning.
Assign each point to the cluster with the nearest seed point.
Go back to step 2 or stop when the assignment does not change.

Selecting K for K-means

Domain knowledge, i.e. an expert knows the value of k
Elbow method: compute the clusters for different values of k, for each k, calculate the total within-cluster sum of square, plot the sum according to the number of clusters and use the band as the number of clusters.
Average silhouette method: compute the clusters for different values of k, for each k, calculate the average silhouette of observations, plot the silhouette according to the number of clusters and select the maximum as the number of clusters.

k-medoids: Takes the most central point instead of the mean value as the center of the cluster. This makes it more robust to noise.
Agglomerative Hierarchical Clustering (AHC): hierarchical clusters combining the nearest clusters starting with each point as its own cluster.
DIvisive ANAlysis Clustering (DIANA): hierarchical clustering starting with one cluster containing all points and splitting the clusters until each point describes its own cluster.
Density-Based Spatial Clustering of Applications with Noise (DBSCAN): Cluster defined as maximum set of density-connected points.