Disjoint Set Data Structure
Also called a Union Find Data Structure.
Keeps track of the partitioning of elements across several non-overlapping sets. Two operations are especially significant in regards to this data structure:
- Determining the set that an element belongs to.
- Merging two sets into a single set.
Each disjoint set data structures keeps track of a group of dynamic disjoint sets. Every set has a representative. The representative remains constant throughout the lifetime of its dynamic set.
The operations that a disjoint set data structure has available are:
- makeSet(x) makes a new set whose only member/representative is x.
- x may not belong to any other set. This would break the disjoint constraint.
- getSet(x) returns a the representative of the set containing x.
- union/merge(x, y) merges the sets that hold x and y, into a new set
- Our representative can be any member from the combined set.
Disjoint Set Forests
Each set is its own tree with the root node being the representative. Every node in the tree points to its parent and the root node points to itself.
By implementing two heuristics, union by rank and path compression, as of 2009, this is the fastest known disjoint set data structure. In this structure:
- makeSet(x) creates a single node tree.
- getSet(x) follows pointers to parents until it reaches the representative.
- The path followed from x to the representative is known as the find path.
- Additionally, it is at this stage that the path compression heuristic is implemented.
- Path compression simply makes each node in the find path point to the representative.
- union(x,y) uses the union by rank concept to have the tree with lower rank point to the representative of the tree with a larger rank.
- Rank is often assumed to be the height of the tree, although more accurately can be the number of elements.
Applications
This data structure has applications in various areas such as graph theory, e.g. determining connected components, Kruskal's Minimum Spanning Tree Algorithm.