Next: Proof of theorem 4 Up: Appendix A. Proofs Previous: Validation of algorithm solve_constraints2

## Proof of Theorem 3

We begin by proving lemma 22, that the revised version of ``instantiate'', given in section 6.3, gives an instantiation of a cluster tree in Euclidean space.

Lemma 22: Any cluster tree T has an instantiation in Euclidean space Rm of any dimensionality m.

The proof is essentially the same as the proof of Lemma 2, except that we now have to keep track of real quantities. For any node N, if i=N.label, we define d(N) = di. The proof then proceeds in the following steps:

i.
For any i< j, di < dj / aj-i. Immediate by construction.

ii.
For any nodes M,C, if M is a descendant of C in T then dist(G[M],G[C]) < a n d(C) / (a - 1).

Proof: Let C=C0, C1 ... Cr=M be the path from C to M through T. Then
dist(G[M],G[C]) <= (by the triangle inequality)
the sum over i from 0 to r-1 of dist(G[Ci+1],G[Ci]) <=
the sum over i from 0 to r-1 of (n d(C) / ai) < (a/(a-1)) (n d(C)).

iii.
Let N be a node in T; let C1 and C2 be two children of N; and let M1 and M2 be descendants of C1 and C2 respectively. Then d(N) (1 - 2n/(a-1)) < dist( G[M1], G[M2]) < nd(N) (1 + 2/(a-1))
Proof: By the triangle inequality,
dist( G[C1], G[C2]) <=dist( G[C1],G[M1]) + dist( G[M1],G[M2]) + dist( G[M2],G[C2]).
Thus, dist( G[C1], G[C2]) -dist( G[C1],G[M1]) -dist( G[M2],G[C2]) <=dist( G[M1],G[M2]).
Also, by the triangle inequality,
dist( G[M1],G[M2]) <=dist( G[C1], G[C2]) + dist( G[C1],G[M1]) + dist( G[M2],G[C2]).
By construction, d(N) <= dist( G[C1],G[C2]) < n d(N),
and by part (ii), for i = 1,2, dist( G[Mi],G[Ci]) < a n d(C) / (a - 1) < n d(N) / (a -1)
as d(C) < d(N) / a.

iv.
For any symbols a,b,c,d in T, let P be the least common ancestor of a,b and let N be the least common ancestor of c,d. If P.label < N.label then much_closer( G[a],G[b],G[c],G[d]).

Proof: By part (iii), dist(G[a],G[b]) < nd(P) (1 + 2/(a-1)) and dist(G[c],G[d]) > d(N) (1 - 2n/(a -1)). Since d(P) < d(N) / a and since a = 2+2n+Bn, it follows by straightforward algebra that dist(G[a],G[b]) < dist(G[c],G[d]) / B.

We next prove the analogue of lemma 5.

Lemma 23:  Let S be a set of constraints over n variables of the form ``dist(a,b) < dist(c,d) / B'', where B > n. If S is consistent, then there is some edge in S which is not in the connected components of the shorts of S.

Proof: Let Z be a valuation satisfying S. Let pq be the edge in S for which dist(Z(p),Z(q)) is maximal. Now, if ab is a short of S -- that is, there is a constraint much_closer(a,b,c,d) in S -- then dist( Z(a), Z(b)) < dist( Z(c), Z(d))/B <= dist( Z(p), Z(q))/B.

Now, let ab be any edge in H, the connected components of the shorts of S. Then there is a simple path a1=a, a2 ... ak =b such that the edge aiai+1 is a short of S for i=1 ... k-1. Note that k <= n. Then, by the triangle inequality,

dist( Z(a), Z(b)) <=
dist( Z(a1),Z(a2)) + dist( Z(a2),Z(a3)) + ... + dist( Z(ak-1),Z(ak)) <=
(k-1)dist( Z(p), Z(q)) / B < dist( Z(p), Z(q))
Hence pq <> ab, so pq is not in H.

Theorem 3:  Let S be a set of constraints over n variables of the form ``dist(a,b) < dist(c,d) / B'', where B > n. The algorithm solve_constraints(S) returns a cluster tree satisfying S if S is consistent over Euclidean space, and returns false if S is inconsistent.

Proof: Note that the semantics of the constraints ``much_closer(a,b,c,d)'' enters into the proof of Theorem 1 only in lemmas 2 and 5. The remainder of the proof of Theorem 1 has to do purely with the relation between the structure of S and the structure of the tree. Hence, since we have shown that the analogues of lemmas 2 and 5 hold in a set of constraints of this kind, the same proof can be completed in exactly the same way.

Next: Proof of theorem 4 Up: Appendix A. Proofs Previous: Validation of algorithm solve_constraints2