Search anything:

Average Height of Random Binary Search Tree

Binary Tree book by OpenGenus

Open-Source Internship opportunity by OpenGenus for programmers. Apply now.

In this post, we discuss the average height of a Random Binary Search Tree (BST) (that is 4.31107 ln(N) - 1.9531 lnln(N) + O(1)) by discussing various lemmas and their proofs. We omit full proofs and discuss the essential key points for easier understanding.

In short, Average Height of Random Binary Search Tree is:

HN = 4.31107 ln(N) - 1.9531 lnln(N) + O(1)

where N is the number of nodes in the Random Binary Search Tree and ln is natural logarithm.

For example, N = 100,000; then Average Height of Random Binary Search Tree is 54.40536 + O(1). The Height of a Balanced BST will be 11.5129.

Prerequisite: Average Height of Random Binary Tree (not BST)

Average Height of Random Binary Search Tree

A binary search tree is a data structure with the following properties.

  • All nodes of left subtree are less than the root node
  • All nodes of right subtree are more than the root node
  • Both the left and right subtree are also have binary search trees.

The height Hnα of a random binary search tree Tn on n nodes, constructed in the same manner from a random equiprobable permutation 1,...,n is known to be close to αln where α = 4.31107.
α = 4.31107 is the unique solution on [2], ∞) of the equation αln((2e)/α)=1.
ln is the natural logarithm and log is base 2 logarithm.

It was shown that Hn/ln nY almost surely as n for some positive constant Y.
This constant cannot exceed α and it it shown that y = α as a consequence of the fact that
E(Hn)αln n.
Experiments have shown that Hn does not vary and seems to have a fixed range of width not depending upon n.
It is also proven that Var(Hn)=O((ln ln n)2).

In this article we try to show that for β=32ln(α/2), we have the following theorem.

Theorem 1:
E(Hn)=αln n - βln ln n + O(1) and Var(Hn)=O(1).


  1. Be the definition of α we obtain β=3α/(2α-2), however the definition above sheds more light on the reason β takes this value.
  2. It has also been proven that Var(Hn)=O(1).


If we construct a binary search tree from a permutation of 1,...,n and i is the first key in the permutation then; i appears at the root of the tree.
The tree rooted at the left child of i contains keys 1,...,i-1 and its shape depends on the order in which these keys appear in the permutation.

From the above we deduce that Hn is also the number of levels of recursion required when vanilla QuickSort(where 1st element is chosen as pivot) is applied to a random permutation of 1,...,n.

This observation allows for constructing Tn from top down, where Tn is a labeling of complete ∞ binary tree T.

We expose the is the key associated with each node t of Tn.
Now to highlight the relationship with QuickSort we refer to the key at t as the pivot element at t.

Suppose then that we have exposed the pivots for some of the nodes forming the subtree of T rooted at root of T.
Suppose further that for some node t of T all of the ancestors of t are in Tn and we have selected their pivots.

That means that these selected pivots determine the set of keys Kt which will appear at the subtree of Tn rooted at t, but have no effect on the order in which we expect the keys in Kt to appear.

Note that each permutation of Kt is equally likely, therefore each of the keys Kt will have an equal chance of being the pivot.

Let nt be the number of keys in this set and specify the pivot at t by choosing a uniform element it of 1,...,nt and by using the itth largest element of Kt as pivot at t. We actually chose a uniform element ut of [0..1] and set it=[nt,ut], Therefore the subtree rooted at the left child of t has floor(ntut) nodes and the subtree rooted at the right child floor(nt(1 - ut) nodes.

We choose ut simultaneously and independently as follows;
Each node x of T has a right son r(x) and left son l(x).
We consider a randomly labeled tree R obtained from T by choosing a uniform [0..1] random variable U(x) for each x of T and labeling the edge (x, r(x)) by U(x) and the edge (x, l(x)) by 1-U(x).
The label of edge a is denoted by L(a).
Let Rk be the random tree consisting of the first k edge levels of R.

For each node y of R we let f(y) be the product of the labels of the edges on the unique path from root to y.
If labels on the edges of the path from root to a node y of R are L1,...,Li then we define

hn(y) = floor(...floor(floor(nL1)L2)...Li)

Fact1: As shown above we can construct a random bst Tn from n nodes by taking a copy R of R and letting Tn consist of those nodes y of R with hn(y) ≥ 1.
By use of induction with the triviality a - 1 ≤ floor(a) ≤ a, it shows that T'n contains Tn.

Fact2: Let y be a node of R at depth i(edge-distance i from root)

nf(y) - i ≤ hn(y) ≤ nf(y)

The above two facts suggest Hn should be close to height H'n of T'n.

We shall discus H'n and strengthen the results to show that they also apply to Hn.

Theorem 2:
E(H'n)=αln n-βln ln n + O(1) and Var(H'n)=O(1)

We use Xn,h to denote the random set of nodes of T'n at depth h.
n't to denote nf(t) which by facts 1 and 2 is close to nt.
We define the estimate of E(Hn)

Definition 1: Let h' = h'n and αln n - βln ln n

The puzzle.

We want to compute for various n and h, P(|Xn,h| > 0) that is, the probability that T'n has height h is simply P(|Xn,h| > 0) - P(|Xn,h+1| > 0).
We consider E|Xn,h|).
There are 2h nodes of T at depth h, therefore we have E(|Xn,h|) = 2hP(f(y) ≥ 1) where y is a node of R at depth h.
For function f we use that for each x in R -ln U(x) is an exponential random variable with mean 1. Therefore if y is at depth h in T, then -ln (f) y is distributed as the sum of h i.i.d. exponential random variables with a mean of 1.(gama distributed with parameter h).
The results of this function allow for computing E|Xn,h|)

Definition 2:
Let hx = hnx be the smallest h for which E(|Xn,h|) ≤ 1, for which P(f(y) ≥ 1n) ≤ 2-h

Recall that α is the solution in (2, ∞) of αln(2e/α) = 1 therefore we have the following lemma.

Lemma 1:

hx = αln n - 1/2ln(α/2) lnlnn+O(1) and for some constant c1, E(|Xn,hx|) > c1. Further there exists positive constants c2 and c3 such that
(a) for i = o(ln n),c2(2/α)i ≤ E(|Xn,hx+i|) ≤ c3(2/α)i
(b) for all positive i, E(|Xn,hx+i|) ≤ c32-i.
Therefore, (c) for n sufficiently large and h = hxn + o(ln n), P(n'x < 2|x ∈ Xn,h ≥ 1/2


  1. We assume that n is arbitrarily large by increasing constants so we don't state it explicitly.
  2. Note hnx=hn'+logα/2ln n + O(1) and since hnx is αln(n)(1 + o(1), it is equivalent to hnx=hn'+logα/2hnx+O(1).
    and by the above lemma 1(a) for two constants d1 and d2, E(Xn,hn') is between d1hx and d2hx

In proving this lemma we note that the density function for a gamma distribution with parameter h is
To obtain P(y ∈ Xn,h ), we integrate this function from 0 to ln n.
To obtain P(n'y≥ q2) we integrate this function from 0 to ln n - ln 2.
To obtain P(n0y∏q2) we integrate it from 0 to ln n - ln 2.
When x is below ln n and h is near hnx, decreasing x by t decreases d(x) by a multiplicative factor of at least (1+o(1))exp(t(α - 1)).
In particular, d(x - ln2) < 14d(x) whereby proof for c comes easily.
This further shows that the main contribution to the integral for P(y ∈ Xn,h) comes from x near ln n and suggests to compute the ratio of the integral for h to h-1, we only consider the ratio between the corresponding density functions at this point.
The ratio is ln(n)/h which is about α-1 and since the tree levels differ by a factor of 2 (a) seems believable.

For each h near hx, |Xn,h| is not concentrated around its expected value and more strongly Var(Xn,h) is high as we shall see(expected value of Hn' is near hn' rather than hnx).

To estimate Var(|Xn,h|) we need to determine for a typical x in Xn,h, whether the pivots along Px which are biased towards 1 occur near the beginning or near the end of Px.

using the fact E(|Xn,h| : x ∈ Tn' and labels on Px) = 1 + i=0h-1=E(|Xnf(zt'),h-i|)

This estimation is determined by

  1. How many zi on Px satisfy; f(zi) is larger than it would be if all the labels on Px were identical
  2. By how much does f(zi) exceed this "normal" value for such zi.

These answers are key to the proof for theorem 2.


Consider a fixed node x at depth h in T and generate labels on Px by

  1. Choose a set S of h random reals uniformly and independently from [0...1].
  2. Uniformly choose a random permutation l1...lh of S and letting li be the label on the ith edge of Px


  • The event x ∈ Tn' is determined by 1
  • The difference between f(zi) and its normal value is determined by 2

This independence allows for straightforward analysis to obtains results of the random permutation generated in 2.

for each i between l and h, let at' be the natural logarithm of 1/li.
Note that xTn'i=1ha'iln n
Normalize the above by setting ai = a'i - (i=1ha'i)/h so i=1hai = 0.
Let PSi be the patial sum j=1iaj, then f(zi) is above its normal value precisely if PSi < 0.
We say x is good if x ∈ Tn' and every PSi ≥ 0

The key to this proof is the following lemma

Lemma 2.

Let X = {x1,...,xh} be a set of real numbers whose sum is non-negative.
Let xπ(1),...,xπ(h)be a uniformly random permutation of X ,
P(∄j such that i=1jxπ(i)<0) ≥ 1h.
Furthermore, if X sums to zero, but none of its proper subsets do, then this probability is exactly 1/h.

Let Xn,h' be the set of good nodes at depth h.
Summing up the bound of lemma 2 over all choices of S in 1 for which x ∈ Tn', we get

E(|Xn,h'|) ≥ E(Xn,h')h (1)

Steps 2 through 5 covered in the paper goes to show that E(Hn') lies between hx + O(1) and h- + O(1).

To prove lemma 2 we tighten upper an lower bounds. For the upper bound replace bound
P(Xx,h'+i ≠ ∅) ≤ E(|Xx,h'+i|) = O(h2-i) from lemma 1 by bound
P(Xx,h'+i ≠ ∅) = O(2-i2) (6)

For the lower bound, we show that knowing x as a typically good node increases the expected number of good nodes by a constant rather than O(h).

Intuition why (6) holds.
Consider pairing each permutation 5 of a given S with its revers 5r.
We claim either,

  1. f(zi) is at least its "normal" value when the labels on Px are given by Π or
  2. f(zh-i) is atleast its "normal" value when the labels on Px are given by Πr.

Note that f(zi) is above its "normal" value iff


This claim implies that on average at least h/2 of the f(zi) are above their normal value.

To tighten the lower bound we show that for a typical good node, most of f(zi) are well below their normal value and hence the expected number of good nodes at depth h' is O(1) if a good node exists.

We show that for a good node, the majority of the partial sums for the random permutation of S are above zero.

Remark: for S chosen in (i) there is some proper subset of S which sums to 0 has a probability of zero.

Therefore we apply a stronger version of lemma 2 which holds for sets none of whose proper subsets sum to zero to obtains

the probability x is good given that it is in Xn,h is exactly 1/h (7)

It can be shown for general sets S which sum to 0, the probability that every partial sum PSi of a random permutation of S is non-negative is O(1/h) and more strongly that for a smaller a the probability every PSi exceeds -a is near 1/h.

What is the probability of the event E that PSi is 0 ?
if PSh2 = 0, then the permutation Π breaks S into two subsets S1 and S2 both having h2 elements and sum to 0.

Further, all partial sums for Π are non-negative so are partial sums for the permutation Π1 of S1 consisting of the first half if Π and permutation Π2 of S2 consisting of the second half of Π.

We can first generate a choice for (S1, S2) and then generate Π1 and Π2 independently.

If S is well-behaved we can then show that the probability a random permutation of each Si stays above 0 is also O(1h).

Thus the probability of E is O((h2)2).

By lemma 2 the probability that all PSi for Π are non-negative is 1/h. Therefore the probability PSh2 is 0 given all PSi are non-negative is O(1h).

Independent properties of conditioned exponentials

Results proved about random permutations hold for a large class of distributions.
We present properties and the resultant proof simplification that relies on the independent properties of the distribution that arise in the study of random binary search trees.

Let E1,...,En be independently chosen exponential random variables with mean of 1.

Let σn be their sum.
Let Dn = (d1,...,dn) where di = Eiσn.
For 1 ≤ i ≤ n
Let Ti = j=1i dj.

Property 1: σn is independent of Dn

Property 2: Dn is distributed like the spacings on [0...1] determined by n-1 independent uniform variables therefore every permutation of a given multi-set is equally likely to be Dn.

Property 3: For i ≤ i < j ≤ n, if we condition Ti = a and Tj = b, then the set (di+1 / (b - a),...,dj/(b - a)) is distributed like Dj-i

Property 4: For any integer between 1 and n and real between 0 and 1, the probability that Ta exceeds b is the probability that if we pick a subset B of a set A of n-1 elements by putting each element of A in B independently with probability b, we obtain |B| < a.

Property 5: The probability that there exists an i such that Ei > 6ln n is less than 1n5.

Key ideas.

Let yi = di - 1n, to obtain yi we normalize di such that the sum is 0.

Let Ui = j=1i yj = Ti - in.
Dn exc 0 , that is Dn exceeds 0 if for all i we have Ui ≥ 0.
Hence, for a non-negative integer a we say that Dn exceeds -a, Di exc -a if for all i we have Ui > -an.
By lemma 2 P(Di exc 0) = 1n

Lemma 3:

For any positive integer a P(Dn exc -a) = O(a4n)

For the proof we can assume that a is at least 3, and the lemma holds for 3, it will also hold for 1 and 2.
We assume that n is large and a < n14.
Consider Dn+2a2.
Let Ev(bc) be the event that Ua2 = bn+2a2 and Un+a2 = cn+2a2

then if b, c > 0, given Ev(bc) we have,

P((Ui>0ia2)(Ui>0in+a2))a-4 (8)

Since the probability is independent of the choices for permutations of yi with i ≤ a2 or i ≥ n + a2 we have

For n > b > c > a: P(Dn+2a2 exc 0 | Ev(b,c)) ≥ 1a4P(Dn exc -a) (9)

We show that the probability that Ev(b,c) holds for some choice of b and c with n > b > c > a is Ω(1).
Since P(Dn+2a2 exc 0) = 1n+2a2.

The proof of this lemma shows that,

P((2a < Ua2 < 3a)∩(a < Un+a2 < 2a)) = Ω(1). (10)

Using lemma 3 we can show that if Dn exceeds 0 then it is above 0 most of the time.

Lemma 4:

For every integer a > 0, letting Ra be the event.

P((Dn exc - a) ∩ Ra) = O(1na2)

Proof of theorem 2.
We show that Hn' is concentrated around hn' = ceil(αln n βln ln n) using the 3 lemmas below.

Lemma 5: P(Xn,hn' ≠ ∅) = Ω(1)

Lemma 6: There exists (universal) constants C1 > 2 and d > 1 such that for i > 0 with i = o(ln n) we have;

P(Xn,hn'-i = ∅) < C1d-i.

Lemma 7: There is a (universal) constant c2 > 2 such that for i > 0 we have.
P(Xn,hn'-i ≠ ∅) < C22-i/2

Lemma 6 implies that for some constant C,

E(Hn') > hn' - ClogdC1 + o(1).

similarly lemma 7 implies that for some constant C

E(Hn') > hn' - ClogdC2 + o(1)

Therefore E(Hn') = hn' + O(1)

Combining lemmas 6,7 yields Var(Hn') = O(1), since bii2 is O(1) for any b > 1.

As discussed we apply the proofs of the 3 lemmas above to the labels on the path from the root to a node x of the tree, this is because we can first expose the products of the labels to determine if x is in Tn' and then expose the actual values of the labels to check if x is good or determine other properties of the ordered set of labels on this path.

Since all 3 results are asymptotic we assume n is large.

We consider some node x at depth h of T, the path Px=zo,...zh = x from the root of T to x, and the nodes z0',..,zh-1'.

Let ai' be the negative of the natural logarithm if the label on the edge into zi.

Let ai be ai'.
Let Π (= Π(x)) be the permutation ai given by the labeling of Px.
Let σh = j=1haj
Let PSi=j=1haj
Therefore for a positive real a we say Π is above -a(written Π > -a), if each PSi is.
We say Π is above o (Π > 0) if for all i PSi ≥ 0.
We say x is good if Π is above 0;
We say x is above -a if Π is.
To expose ai'

  1. Choose σh according to the appropriate probability distribution.
  2. Choose Dh as in property 2.
  3. Set ai' = diσh.

Note: PSi = σhUi.
Therefore P(Π > -a) is simply

P(Dh exc -ahσh) and P(x is good) is P(Dh exc 0).

Consider h ≃ αln n and σh ≃ αln n.

Therefore h/σh ≃ α.

In particular we consider choices for h and σh such that h < 5σh
Lemma 3 applies in such a case

Conditioned on a choice of σh less than 5h.

P(Π > -a) = P(Dh exc -ahσh) < P(Dh exc - 5a) = O(a4h) 14

A stronger result.

Since Tn' is at least as tall as Tn we have an immediate corollary of lemma 7.

Lemma 8:

There is (universal) constant C2 such that for i > 0 with i = o(logn) we have

P(Hnhn' + i) < C22-i/2

Lemma 9:

There is a (universal) constant C3 such that the probability the Tn contains a node at height hn' - C3 is Ω(1).

Lemma 10:

There exists (universal) constants C1, C3 and d < 1 such that for i > 0 with i = o(logn) we have

P(Hn < hn' - C3 - i) < C1di

this lemma comes about by mimicking proofs of lemma 6 using lemma 9 in the proof of lemma 5.

Theorem 1 follows from these results same as theorem 2 from lemmas 5, 6, 7.


The Height of a Random Binary Search Tree by Bruce Reed from McGill University, Montreal Quebec, Canada and CNRS, Paris, France.

Average Height of Random Binary Search Tree
Share this