
Introduction The problem In the end there are 2 groups of friends: one group is {1, 2, 4, 5}, the other is {3}. The solution Let’s see how things will work with sets for the example of the problem. The groups will be represented by sets, and the representative of each group is the person with the biggest index. At the beginning there are 5 groups (sets): {1}, {2}, {3}, {4}, {5}. Nobody is anybody’s friend and everyone is the representative of his or her own group. The next step is that 1 and 2 become friends, this means the group containing 1 and the group with 2 will become one group. This will give us these groups: {1, 2} , {3}, {4}, {5}, and the representative of the first group will become 2. Next, 5 and 4 become friends. The groups will be {1,2}, {3}, {4, 5}. And in the last step 5 and 1 become friends and the groups will be {1, 2, 4, 5}, {3}. The representative of the first group will be 5 and the representative for second group will be 3. (We will see why we need representatives later). At the end we have 2 sets, the first set with 4 elements and the second with one, and this is the answer for the problem example: 2 groups, 1 group of 4 and one group of one. Perhaps now you are wondering how you can check if 2 persons are in the same group. This is where the use of the representative elements comes in. Let’s say we want to check if 3 and 2 are in the same group, we will know this if the representative of the set that contains 3 is the same as the representative of the set that contains 2. One representative is 5 and the other one is 3; therefore 3 and 2 aren’t in same groups of friends. Some operations
The solution using these operations Read N; for (each person x from 1 to N) CREATESET(x) for (each pair of friends (x y) ) if (FINDSET(x) != FINDSET(y)) MERGESETS(x, y) Now if we want to see if 2 persons (x, y) are in same group we check if FINDSET(x) == FINDSET(y). We will analyze the running time of the disjointset data structure in terms of N and M, where N is the number of times that CREATESET(x) is called and M is the total number of times that CREATESET(x), MERGESETS(x, y) and FINDSET(x) are called. Since the sets are disjoint, each time MERGESETS(x, y) is called one set will be created and two will be destroyed, giving us one less set. If there are n sets after n1 calls of MERGESETS(x,y) there will remain only one set. That’s why the number of MERGESETS(x,y) calls is less than or equal to the number of CREATESET(x) operations. Implementation with linked lists Now let’s see how to implement the MERGESETS(x, y) operations. The easy way is to append x’s list onto the end of y’s list. The representative of the new set is the representative of the original set that contained y. We must update the pointer to the representative for each element (object) originally on x’s list, which takes linear time in terms of the length of x’s list. It’s easy to prove that, in the worst case, the complexity of the algorithm will be O(M^2) where M is the number of operations MERGESETS(x, y). With this implementation the complexity will average O(N) per operation where N represents the number of elements in all sets. The “weighted union heuristic” So far we reach an algorithm to solve the problem in O(M + NlogN) where N is the number of persons and M is the number of friendships and a memory of O(N). Still the BFS will solve the problem in O(M + N) and memory of O(N + M). We can see that we have optimized the memory but not the execution time. Next step: root trees Step 1: nobody is anybody friend We have 5 trees and each tree has a single element, which is the root and the representative of that tree. Step 2: 1 and 2 are friends, MERGESETS(1, 2): The operation made is MERGESETS(1, 2). We have 4 trees one tree contain 2 elements and have the root 1. The other trees have a single element. Step 3: 5 and 4 are friends, MERGESETS(5, 4) The operation made is MERGESETS(5, 4). Now we have 3 trees, 2 trees with 2 elements and one tree with one element. Step 4: 5 and 1 are friends, MERGESETS(5, 1) The operation made is MERGESETS(5, 1). Now we have 2 trees, one tree has 4 elements and the other one has only one element. As we see so far the algorithm using rooted trees is no faster than the algorithm using linked lists. Two heuristics To implement a disjoint set forest with these heuristics, we must keep track of ranks. With each node x, we keep the integer value rank[x], which is bigger than or equal to the number of edges in the longest path between node x and a subleaf. When CREATESET(x) is called the initial rank[x] will be 0. When a MERGESETS(x, y) operation is made then the root of higher rank will become the parent of the root of lower rank – or, in case of tie, we arbitrarily choose one of the roots as the parent and increment its rank. Let’s see how the algorithm will look. Let P[x] = the parent of node x. CREATESET(x) P[x] = x rank[x] = 0 MERGESETS(x, y) PX = FINDSET(X) PY =FINDSET(Y) If (rank[PX] > rank[PY]) P[PY] = PX Else P[PX] = PY If (rank[PX] == rank[PY]) rank[PY] = rank[PY] + 1 And the last operation looks like: FINDSET(x) If (x != P[x]) p[x] = FINDSET(P[X]) Return P[X] Now let’s see how the heuristics helped the running time. If we use only the first heuristic “union by rank” then we will get the same running time we achieved with the weighted union heuristic when we used lists for representation. When we use both “union by rank” and “path compression,” the worst running time is O( m α(m,n)), where α(m,n) is the very slowly growing inverse of Ackermann’s function. In application α(m,n) <= 4 that’s why we can say that the running time is linear in terms of m, in practical situations. (For more details on Ackermann’s function or complexity see the references below.) Back to the problem Practice To practice what you've learned, try to solve GrafixMask – the Division 1 500 from SRM211. The idea is to keep track of all the blocks and consider each grid point as a node. Next, take all the nodes that aren’t blocked and let (x, y) be the coordinate of the left, right, down or up node, and if (x, y) is not blocked then you do the operation MERGESETS(node, node2). You should also try to determine how disjointset data structures can be used in the solution of RoadReconstruction from SRM 356. Disjointset data structures can also be used in TopographicalImage from SRM 210 and PathFinding, from SRM 156. I hope you find this data structure to be useful. Good luck in the Arena! References:


Home 
About TopCoder 
Press Room 
Contact Us 
Careers 
Privacy 
Terms
Competitions  Cockpit 
Copyright TopCoder, Inc. 2001 