Clustering analysis is the process of grouping data so that each item is more similar to the other items in its own cluster than to items in other clusters, and then analyzing those clusters to better understand the data set as a whole. This type of data analysis is most often performed with machine learning (ML), and many different clustering algorithms are available, each with its own way of forming clusters. Because the choice of method affects the results, different methods suit different questions, and it is not uncommon to run several clustering analyses on the same data set. Three of the most common methods are briefly described below.
Centroid-based clustering represents each cluster by a single central point, the centroid, which is not necessarily a member of the data set, and assigns each data point to its nearest centroid. Distance is generally measured as Euclidean distance, and the centroids are refined until the assignments stop changing. The best-known algorithm of this type is k-means, which requires the number of clusters to be chosen in advance; a suitable number is often found by comparing the results for several values. This type of analysis is often used for game analytics.
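As a rough illustration of the centroid-based approach, the following is a minimal k-means sketch using only the Python standard library. The function name `k_means`, the sample points, and the choice to seed the centroids with evenly spaced points from the input are assumptions made for this example so that it runs reproducibly; real implementations typically use random or k-means++ initialization.

```python
import math

def k_means(points, k, iterations=100):
    """Minimal k-means sketch: alternate assignment and centroid update."""
    # Assumption for reproducibility: seed centroids with evenly spaced input points.
    centroids = [points[i * len(points) // k] for i in range(k)]
    assignments = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # assignments stopped changing, so the centroids are final
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignments

# Hypothetical sample data: two visibly separate groups of 2D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = k_means(points, k=2)
```

Note that the resulting centroids are averages of their members, so they generally do not coincide with any point in the data set.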
Hierarchical clustering, also called connectivity-based clustering, is based on the idea that data points are more closely related to nearby points than to points farther away. The method depends on the distance between data points, so the results depend on the choice of distance function and on how the distance between two clusters is measured. Rather than producing a single partition, hierarchical clustering builds a hierarchy of nested clusters, and the user still has to choose the level at which to cut that hierarchy to obtain the final clusters. These types of cluster results are often used for phylogenetic trees.
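The bottom-up form of hierarchical clustering can be sketched as follows. This is a simplified example under stated assumptions: it uses single linkage (the distance between two clusters is the distance between their closest members) and Euclidean distance, which are only one possible combination, and the names `single_linkage` and `num_clusters` are hypothetical. The `num_clusters` argument stands in for the user's decision about where to cut the hierarchy.

```python
import math

def single_linkage(points, num_clusters):
    """Agglomerative sketch: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(points))]  # start with one cluster per point
    merges = []  # merge history, which records the hierarchy as it is built

    def linkage(a, b):
        # Single linkage: distance between the closest pair across two clusters.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > num_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        first, second = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        distance = linkage(clusters[first], clusters[second])
        merges.append((clusters[first], clusters[second], distance))
        merged = clusters[first] + clusters[second]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (first, second)]
        clusters.append(merged)
    return clusters, merges

# Hypothetical sample data: two close pairs and one isolated point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = single_linkage(points, num_clusters=3)
```

Each entry in `merges` records which two clusters were joined and at what distance, which is the information a tree diagram of the hierarchy would display. This brute-force version recomputes every distance on each pass, so it is only suitable for small data sets.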
Density-based clustering defines clusters as regions where data points are denser than in the rest of the data, with density usually measured as the number of points found within a specified distance of each point. Points in sparse regions are typically treated as noise rather than forced into a cluster. This clustering method can help track the spread of disease by looking at origination points, or track trends in successful and unsuccessful shots in basketball.
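One widely used density-based algorithm is DBSCAN, and the sketch below shows a simplified version of it using only the Python standard library. The parameter names `eps` (the specified distance) and `min_pts` (how many points must fall within that distance for a region to count as dense), along with the sample points, are assumptions chosen for this example.

```python
import math

NOISE = -1  # label for points that belong to no dense region

def dbscan(points, eps, min_pts):
    """Simplified DBSCAN sketch: grow clusters outward from dense points."""
    labels = [None] * len(points)  # None means the point has not been visited

    def neighbors(i):
        # All points within eps of point i, including point i itself.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # dense point: keep growing the cluster
        cluster_id += 1
    return labels

# Hypothetical sample data: two dense groups and one isolated point.
points = [
    (0, 0), (0, 0.5), (0.5, 0), (0.5, 0.5),
    (5, 5), (5, 5.5), (5.5, 5), (5.5, 5.5),
    (20, 20),
]
labels = dbscan(points, eps=1.0, min_pts=3)
```

Unlike the centroid-based approach, the number of clusters is not specified up front; it emerges from the density settings, and the isolated point is labeled as noise.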
Clustering analysis can be useful in a variety of ways across organizations, including: