Open-Source Internship opportunity by OpenGenus for programmers. Apply now.
In this post, we discuss the average height of a Random Binary Search Tree (BST) (that is 4.31107 ln(N) - 1.9531 lnln(N) + O(1)) by discussing various lemmas and their proofs. We omit full proofs and discuss the essential key points for easier understanding.
In short, Average Height of Random Binary Search Tree is:
HN = 4.31107 ln(N) - 1.9531 lnln(N) + O(1)
where N is the number of nodes in the Random Binary Search Tree and ln is natural logarithm.
For example, N = 100,000; then Average Height of Random Binary Search Tree is 54.40536 + O(1). The Height of a Balanced BST will be 11.5129.
Prerequisite: Average Height of Random Binary Tree (not BST)
Average Height of Random Binary Search Tree
A binary search tree is a data structure with the following properties.
- All nodes of left subtree are less than the root node
- All nodes of right subtree are more than the root node
- Both the left and right subtree are also have binary search trees.
The height of a random binary search tree on n nodes, constructed in the same manner from a random equiprobable permutation 1,...,n is known to be close to where α = 4.31107.
α = 4.31107 is the unique solution on [2], ∞) of the equation αln((2e)/α)=1.
ln is the natural logarithm and log is base 2 logarithm.
It was shown that almost surely as for some positive constant .
This constant cannot exceed α and it it shown that y = α as a consequence of the fact that
.
Experiments have shown that does not vary and seems to have a fixed range of width not depending upon n.
It is also proven that .
In this article we try to show that for , we have the following theorem.
Theorem 1:
and .
Remarks:
- Be the definition of α we obtain β=3α/(2α-2), however the definition above sheds more light on the reason β takes this value.
- It has also been proven that .
Model.
If we construct a binary search tree from a permutation of 1,...,n and i is the first key in the permutation then; i appears at the root of the tree.
The tree rooted at the left child of i contains keys 1,...,i-1 and its shape depends on the order in which these keys appear in the permutation.
From the above we deduce that is also the number of levels of recursion required when vanilla QuickSort(where 1st element is chosen as pivot) is applied to a random permutation of 1,...,n.
This observation allows for constructing from top down, where is a labeling of complete ∞ binary tree .
We expose the is the key associated with each node t of .
Now to highlight the relationship with QuickSort we refer to the key at t as the pivot element at t.
Suppose then that we have exposed the pivots for some of the nodes forming the subtree of rooted at root of .
Suppose further that for some node t of all of the ancestors of t are in and we have selected their pivots.
That means that these selected pivots determine the set of keys which will appear at the subtree of rooted at t, but have no effect on the order in which we expect the keys in to appear.
Note that each permutation of is equally likely, therefore each of the keys will have an equal chance of being the pivot.
Let be the number of keys in this set and specify the pivot at t by choosing a uniform element of 1,..., and by using the th largest element of as pivot at t. We actually chose a uniform element of [0..1] and set =[,], Therefore the subtree rooted at the left child of t has floor() nodes and the subtree rooted at the right child floor((1 - ) nodes.
We choose simultaneously and independently as follows;
Each node x of has a right son r(x) and left son l(x).
We consider a randomly labeled tree obtained from by choosing a uniform [0..1] random variable U(x) for each x of and labeling the edge (x, r(x)) by U(x) and the edge (x, l(x)) by 1-U(x).
The label of edge a is denoted by L(a).
Let be the random tree consisting of the first k edge levels of .
For each node y of we let f(y) be the product of the labels of the edges on the unique path from root to y.
If labels on the edges of the path from root to a node y of are L1,...,Li then we define
(y) = floor(...floor(floor(n))...)
Fact1: As shown above we can construct a random bst Tn from n nodes by taking a copy R of and letting consist of those nodes y of R with (y) ≥ 1.
By use of induction with the triviality a - 1 ≤ floor(a) ≤ a, it shows that contains .
Fact2: Let y be a node of at depth i(edge-distance i from root)
nf(y) - i ≤ (y) ≤ nf(y)
The above two facts suggest should be close to height of .
We shall discus and strengthen the results to show that they also apply to .
Theorem 2:
and
We use to denote the random set of nodes of at depth h.
to denote nf(t) which by facts 1 and 2 is close to .
We define the estimate of E()
Definition 1: Let = and αln n - βln ln n
The puzzle.
We want to compute for various n and h, P(|| > 0) that is, the probability that has height h is simply P(|| > 0) - P(|| > 0).
We consider E||).
There are nodes of at depth h, therefore we have E(||) = P(f(y) ≥ 1) where y is a node of at depth h.
For function f we use that for each x in -ln U(x) is an exponential random variable with mean 1. Therefore if y is at depth h in , then -ln (f) y is distributed as the sum of h i.i.d. exponential random variables with a mean of 1.(gama distributed with parameter h).
The results of this function allow for computing E||)
Definition 2:
Let = be the smallest h for which E(||) ≤ 1, for which P(f(y) ≥ ) ≤
Recall that α is the solution in (2, ∞) of αln(2e/α) = 1 therefore we have the following lemma.
Lemma 1:
= αln n - 1/2ln(α/2) lnlnn+O(1) and for some constant c1, E(||) > c1. Further there exists positive constants c2 and c3 such that
(a) for i = ≤ E(||) ≤
(b) for all positive i, E(||) ≤ .
Therefore, (c) for n sufficiently large and h = + , P( < 2|x ∈ ≥ 1/2
Remarks:
- We assume that n is arbitrarily large by increasing constants so we don't state it explicitly.
- Note =+ln n + O(1) and since is αln(n)(1 + o(1), it is equivalent to =++O(1).
and by the above lemma 1(a) for two constants d1 and d2, E() is between and
In proving this lemma we note that the density function for a gamma distribution with parameter h is
.
To obtain P(y ∈ ), we integrate this function from 0 to ln n.
To obtain P(≥ q2) we integrate this function from 0 to ln n - ln 2.
To obtain P(n0y∏q2) we integrate it from 0 to ln n - ln 2.
When x is below ln n and h is near , decreasing x by t decreases d(x) by a multiplicative factor of at least (1+o(1))exp(t(α - 1)).
In particular, d(x - ln2) < whereby proof for c comes easily.
This further shows that the main contribution to the integral for P(y ∈ ) comes from x near ln n and suggests to compute the ratio of the integral for h to h-1, we only consider the ratio between the corresponding density functions at this point.
The ratio is ln(n)/h which is about and since the tree levels differ by a factor of 2 (a) seems believable.
For each h near , || is not concentrated around its expected value and more strongly Var() is high as we shall see(expected value of is near rather than ).
To estimate Var(||) we need to determine for a typical x in , whether the pivots along Px which are biased towards 1 occur near the beginning or near the end of Px.
using the fact E(|| : x ∈ and labels on Px) = 1 +
This estimation is determined by
- How many on satisfy; f() is larger than it would be if all the labels on were identical
- By how much does f() exceed this "normal" value for such .
These answers are key to the proof for theorem 2.
Approach.
Consider a fixed node x at depth h in and generate labels on Px by
- Choose a set S of h random reals uniformly and independently from [0...1].
- Uniformly choose a random permutation l1...lh of S and letting li be the label on the ith edge of Px
Note:
- The event x ∈ is determined by 1
- The difference between f(zi) and its normal value is determined by 2
This independence allows for straightforward analysis to obtains results of the random permutation generated in 2.
for each i between l and h, let be the natural logarithm of 1/li.
Note that
Normalize the above by setting = - ()/h so = 0.
Let be the patial sum , then f() is above its normal value precisely if < 0.
We say x is good if x ∈ and every ≥ 0
The key to this proof is the following lemma
Lemma 2.
Let X = {x1,...,xh} be a set of real numbers whose sum is non-negative.
Let ,...,be a uniformly random permutation of X ,
then
P(∄j such that ) ≥ .
Furthermore, if X sums to zero, but none of its proper subsets do, then this probability is exactly 1/h.
Let be the set of good nodes at depth h.
Summing up the bound of lemma 2 over all choices of S in 1 for which x ∈ , we get
E(||) ≥ (1)
Steps 2 through 5 covered in the paper goes to show that E() lies between + O(1) and + O(1).
To prove lemma 2 we tighten upper an lower bounds. For the upper bound replace bound
P( ≠ ∅) ≤ E(||) = O(h) from lemma 1 by bound
P( ≠ ∅) = O() (6)
For the lower bound, we show that knowing x as a typically good node increases the expected number of good nodes by a constant rather than O(h).
Intuition why (6) holds.
Consider pairing each permutation 5 of a given S with its revers 5r.
We claim either,
- f() is at least its "normal" value when the labels on Px are given by Π or
- f() is atleast its "normal" value when the labels on are given by .
Note that f(zi) is above its "normal" value iff
≥
This claim implies that on average at least h/2 of the f(zi) are above their normal value.
To tighten the lower bound we show that for a typical good node, most of f(zi) are well below their normal value and hence the expected number of good nodes at depth is O(1) if a good node exists.
We show that for a good node, the majority of the partial sums for the random permutation of S are above zero.
Remark: for S chosen in (i) there is some proper subset of S which sums to 0 has a probability of zero.
Therefore we apply a stronger version of lemma 2 which holds for sets none of whose proper subsets sum to zero to obtains
the probability x is good given that it is in is exactly 1/h (7)
It can be shown for general sets S which sum to 0, the probability that every partial sum of a random permutation of S is non-negative is O(1/h) and more strongly that for a smaller a the probability every exceeds -a is near 1/h.
What is the probability of the event E that is 0 ?
if = 0, then the permutation breaks S into two subsets S1 and S2 both having elements and sum to 0.
Further, all partial sums for Π are non-negative so are partial sums for the permutation of consisting of the first half if Π and permutation of consisting of the second half of Π.
We can first generate a choice for (, ) and then generate and independently.
If S is well-behaved we can then show that the probability a random permutation of each Si stays above 0 is also O().
Thus the probability of E is O().
By lemma 2 the probability that all PSi for Π are non-negative is 1/h. Therefore the probability is 0 given all are non-negative is O().
Independent properties of conditioned exponentials
Results proved about random permutations hold for a large class of distributions.
We present properties and the resultant proof simplification that relies on the independent properties of the distribution that arise in the study of random binary search trees.
Let ,..., be independently chosen exponential random variables with mean of 1.
Let be their sum.
Let = (,...,) where = .
For 1 ≤ i ≤ n
Let = .
Property 1: is independent of
Property 2: is distributed like the spacings on [0...1] determined by n-1 independent uniform variables therefore every permutation of a given multi-set is equally likely to be .
Property 3: For i ≤ i < j ≤ n, if we condition = a and = b, then the set ( / (b - a),...,/(b - a)) is distributed like
Property 4: For any integer between 1 and n and real between 0 and 1, the probability that exceeds b is the probability that if we pick a subset B of a set A of n-1 elements by putting each element of A in B independently with probability b, we obtain |B| < a.
Property 5: The probability that there exists an i such that > 6ln n is less than .
Key ideas.
Let = - , to obtain we normalize such that the sum is 0.
Let = = - .
exc 0 , that is exceeds 0 if for all i we have ≥ 0.
Hence, for a non-negative integer a we say that exceeds -a, exc -a if for all i we have > .
By lemma 2 P( exc 0) =
Lemma 3:
For any positive integer a P( exc -a) = O()
For the proof we can assume that a is at least 3, and the lemma holds for 3, it will also hold for 1 and 2.
We assume that n is large and a < .
Consider .
Let Ev(bc) be the event that = and =
then if b, c > 0, given Ev(bc) we have,
(8)
Since the probability is independent of the choices for permutations of with i ≤ or i ≥ n + we have
For n > b > c > a: P( exc 0 | Ev(b,c)) ≥ P( exc -a) (9)
We show that the probability that Ev(b,c) holds for some choice of b and c with n > b > c > a is Ω(1).
Since P( exc 0) = .
The proof of this lemma shows that,
P((2a < < 3a)∩(a < < 2a)) = Ω(1). (10)
Using lemma 3 we can show that if exceeds 0 then it is above 0 most of the time.
Lemma 4:
For every integer a > 0, letting be the event.
P(( exc - a) ∩ ) = O()
Proof of theorem 2.
We show that is concentrated around = ceil(αln n βln ln n) using the 3 lemmas below.
Lemma 5: P( ≠ ∅) = Ω(1)
Lemma 6: There exists (universal) constants C1 > 2 and d > 1 such that for i > 0 with i = o() we have;
P( = ∅) < C1.
Lemma 7: There is a (universal) constant c2 > 2 such that for i > 0 we have.
P( ≠ ∅) <
Lemma 6 implies that for some constant C,
E() > - CC1 + o(1).
similarly lemma 7 implies that for some constant C
E() > - CC2 + o(1)
Therefore E() = + O(1)
Combining lemmas 6,7 yields Var() = O(1), since is O(1) for any b > 1.
As discussed we apply the proofs of the 3 lemmas above to the labels on the path from the root to a node x of the tree, this is because we can first expose the products of the labels to determine if x is in and then expose the actual values of the labels to check if x is good or determine other properties of the ordered set of labels on this path.
Since all 3 results are asymptotic we assume n is large.
We consider some node x at depth h of , the path =,... = x from the root of to x, and the nodes ,..,.
Let be the negative of the natural logarithm if the label on the edge into .
Let be .
Let Π (= Π(x)) be the permutation given by the labeling of .
Let σh =
Let =
Therefore for a positive real a we say Π is above -a(written Π > -a), if each is.
We say Π is above o (Π > 0) if for all i ≥ 0.
We say x is good if Π is above 0;
We say x is above -a if Π is.
To expose
- Choose σh according to the appropriate probability distribution.
- Choose as in property 2.
- Set = .
Note: = .
Therefore P(Π > -a) is simply
P( exc ) and P(x is good) is P( exc 0).
Consider h ≃ αln n and ≃ αln n.
Therefore h/ ≃ α.
In particular we consider choices for h and such that h < 5
Lemma 3 applies in such a case
Conditioned on a choice of less than 5h.
P(Π > -a) = P( exc ) < P( exc - 5a) = O() 14
A stronger result.
Since is at least as tall as we have an immediate corollary of lemma 7.
Lemma 8:
There is (universal) constant C2 such that for i > 0 with i = o(logn) we have
P( ≥ + i) <
Lemma 9:
There is a (universal) constant such that the probability the contains a node at height - is Ω(1).
Lemma 10:
There exists (universal) constants , and d < 1 such that for i > 0 with i = o() we have
P( < - - i) <
this lemma comes about by mimicking proofs of lemma 6 using lemma 9 in the proof of lemma 5.
Theorem 1 follows from these results same as theorem 2 from lemmas 5, 6, 7.
References
The Height of a Random Binary Search Tree by Bruce Reed from McGill University, Montreal Quebec, Canada and CNRS, Paris, France.