Difficulty: Medium
Correct Answer: Agglomerative clustering starts with each data point as its own cluster and repeatedly merges clusters, while divisive clustering starts with all data points in one cluster and repeatedly splits clusters
Explanation:
Introduction / Context:
Hierarchical clustering is a popular unsupervised learning technique in data mining and machine learning. It builds a hierarchy of clusters that can be visualised as a dendrogram. There are two main strategies to build this hierarchy, known as agglomerative and divisive clustering. Interviewers often ask about the difference between these two approaches, because understanding it helps clarify how hierarchical methods are constructed.
Given Data / Assumptions:
Concept / Approach:
Agglomerative hierarchical clustering is a bottom-up approach. It starts with every data point as its own cluster and then repeatedly merges the two closest clusters, as judged by a distance or similarity measure, until a single cluster remains or a stopping criterion is reached. Divisive hierarchical clustering is a top-down approach. It starts with all data points in one cluster and repeatedly splits clusters into smaller ones according to a splitting criterion, until every cluster contains a single point or a chosen stopping threshold is satisfied.
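As a rough illustration of the bottom-up loop described above, the following Python sketch merges 1-D points using single linkage. The function names, the sample points, and the choice of single linkage are assumptions made for this example; the explanation itself does not fix a particular linkage or distance measure.

```python
from itertools import combinations

def single_linkage(a, b):
    """Cluster distance (assumed here): smallest gap between any pair of members."""
    return min(abs(x - y) for x in a for y in b)

def agglomerative(points, n_clusters=1):
    """Bottom-up: start from singletons and merge the two closest clusters each round."""
    clusters = [[p] for p in points]
    merges = []  # record of which pair was merged at each step
    while len(clusters) > n_clusters:
        # Find the indices of the two closest clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges

points = [1.0, 1.5, 5.0, 5.2, 9.0]  # illustrative data, not from the question
clusters, merges = agglomerative(points, n_clusters=3)
print(sorted(clusters))  # [[1.0, 1.5], [5.0, 5.2], [9.0]]
```

The stopping criterion here is a target number of clusters; leaving n_clusters at 1 runs the merging all the way to a single cluster.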
Step-by-Step Solution:
Step 1: Recall that to agglomerate means to gather into a mass, which hints at starting with small units and merging them.
Step 2: Confirm that in agglomerative clustering, each data point begins as a separate cluster.
Step 3: Remember that the algorithm then iteratively merges the two closest clusters until stopping criteria are met.
Step 4: Recall that divisive means dividing, which suggests starting from a large unit and splitting it.
Step 5: Confirm that in divisive clustering, all data points start in one cluster and the method repeatedly splits clusters into smaller ones.
Step 6: Choose the option that correctly contrasts bottom-up merging with top-down splitting.
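The top-down half of this contrast can be sketched in the same spirit. The Python example below starts with all points in one cluster and keeps splitting. The splitting rule (pick the cluster with the largest spread and cut it at its widest gap) is a hypothetical choice for 1-D data, used only to make the idea concrete; practical divisive methods use other splitting criteria.

```python
def split_at_widest_gap(cluster):
    """Split a 1-D cluster in two at the largest gap between neighbouring values."""
    ordered = sorted(cluster)
    cut = max(range(1, len(ordered)), key=lambda i: ordered[i] - ordered[i - 1])
    return ordered[:cut], ordered[cut:]

def divisive(points, n_clusters):
    """Top-down: start with one cluster holding everything and split repeatedly."""
    clusters = [sorted(points)]
    while len(clusters) < n_clusters:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break  # only singletons remain, nothing left to split
        # Assumed rule: split the cluster with the largest spread (max - min).
        target = max(splittable, key=lambda c: c[-1] - c[0])
        clusters.remove(target)
        clusters.extend(split_at_widest_gap(target))
    return clusters

points = [1.0, 1.5, 5.0, 5.2, 9.0]  # illustrative data, not from the question
print(sorted(divisive(points, 3)))  # [[1.0, 1.5], [5.0, 5.2], [9.0]]
```

The loop starts from a single cluster and increases the cluster count by one per split, which is the mirror image of agglomerative merging.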
Verification / Alternative check:
A quick way to verify is to think about how the dendrogram is built. In agglomerative clustering, the dendrogram is formed by joining leaves at the bottom, whereas in divisive clustering you conceptually start from the root and cut downwards. This visual perspective aligns with the described difference and supports the chosen answer.
Why Other Options Are Wrong:
Option B: Incorrectly links the methods to specific data types such as numeric or text, which is not a defining property.
Option C: States that one is supervised and the other is unsupervised, but both agglomerative and divisive hierarchical clustering are unsupervised methods.
Option D: Suggests that divisive clustering does not use distance or similarity measures, which is not generally true because splitting strategies also rely on such measures.
Common Pitfalls:
Learners sometimes confuse hierarchical clustering with k-means and think that a fixed number of clusters must be specified in advance. In hierarchical clustering the full hierarchy can be built first and the number of clusters chosen afterwards by cutting the dendrogram at a suitable level. Another pitfall is to mix up which approach is bottom-up and which is top-down. Remembering that agglomerative builds up by merging and divisive splits down from a single cluster helps prevent this confusion.
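To illustrate the first pitfall, the Python sketch below builds a complete single-linkage hierarchy once and then reads off different flat clusterings by cutting it at different distance thresholds, without fixing a cluster count beforehand. The helper names, the thresholds, and the restriction to distinct 1-D values are assumptions made for this example.

```python
def merge_history(points):
    """Full single-linkage hierarchy for distinct 1-D points as (gap, left, right) merges."""
    clusters = [[p] for p in sorted(points)]
    history = []
    while len(clusters) > 1:
        # For sorted 1-D data under single linkage, the closest pair is always adjacent.
        i = min(range(len(clusters) - 1),
                key=lambda k: clusters[k + 1][0] - clusters[k][-1])
        gap = clusters[i + 1][0] - clusters[i][-1]
        history.append((gap, clusters[i], clusters[i + 1]))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return history

def cut(points, history, threshold):
    """Flat clusters obtained by replaying only the merges at or below the threshold."""
    clusters = [[p] for p in sorted(points)]
    for gap, left, right in history:
        if gap > threshold:
            break  # merge heights are non-decreasing here, so later merges are excluded too
        clusters = [c for c in clusters if c != left and c != right] + [left + right]
    return sorted(clusters)

points = [1.0, 1.5, 5.0, 5.2, 9.0]  # illustrative data, not from the question
history = merge_history(points)     # built once
print(cut(points, history, 1.0))    # [[1.0, 1.5], [5.0, 5.2], [9.0]]
print(cut(points, history, 3.6))    # [[1.0, 1.5, 5.0, 5.2], [9.0]]
```

The same history yields three clusters at one threshold and two at another, which is the sense in which the number of clusters need not be specified in advance.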
Final Answer:
The key difference is that agglomerative clustering starts with each data point as its own cluster and repeatedly merges clusters, while divisive clustering starts with all data points in one cluster and repeatedly splits clusters.