# Average Height of Random Binary Search Tree

Get this book -> Problems on Array: For Interviews and Competitive Programming

In this post, we discuss the average height of a Random Binary Search Tree (BST) (that is 4.31107 ln(N) - 1.9531 lnln(N) + O(1)) by discussing various lemmas and their proofs. We omit full proofs and discuss the essential key points for easier understanding.

In short, Average Height of Random Binary Search Tree is:

#### H_{N} = 4.31107 ln(N) - 1.9531 lnln(N) + O(1)

where N is the number of nodes in the Random Binary Search Tree and ln is natural logarithm.

For example, N = 100,000; then Average Height of Random Binary Search Tree is **54.40536 + O(1)**. The Height of a Balanced BST will be **11.5129**.

Prerequisite: Average Height of Random Binary Tree (not BST)

## Average Height of Random Binary Search Tree

A binary search tree is a data structure with the following properties.

- All nodes of left subtree are less than the root node
- All nodes of right subtree are more than the root node
- Both the left and right subtree are also have binary search trees.

The height ${\mathrm{H}}_{\mathrm{n}}\phantom{\rule{0ex}{0ex}}\mathrm{\alpha}$ of a random binary search tree ${\mathrm{T}}_{\mathrm{n}}$ on n nodes, constructed in the same manner from a random equiprobable permutation 1,...,n is known to be close to $\mathrm{\alpha}ln$ where α = 4.31107.

α = 4.31107 is the unique solution on [2], ∞) of the equation *αln((2e)/α)=1*.

*ln* is the natural logarithm and log is base 2 logarithm.

It was shown that ${\mathrm{H}}_{\mathrm{n}}/ln\mathrm{n}\to \mathcal{Y}$ almost surely as $\mathrm{n}\to \infty $ for some positive constant $\mathcal{Y}$.

This constant cannot exceed α and it it shown that y = α as a consequence of the fact that

$\mathrm{E}\left({\mathrm{H}}_{\mathrm{n}}\right)\sim \mathrm{\alpha}ln\mathrm{n}$.

Experiments have shown that ${\mathrm{H}}_{\mathrm{n}}$ does not vary and seems to have a fixed range of width not depending upon n.

It is also proven that $\mathrm{Var}\left({\mathrm{H}}_{\mathrm{n}}\right)=\mathrm{O}\left(\right(lnln\mathrm{n}{)}^{2})$.

In this article we try to show that for $\mathrm{\beta}=\frac{3}{2ln(\mathrm{\alpha}/2)}$, we have the following theorem.

**Theorem 1:**

$\mathrm{E}\left({\mathrm{H}}_{\mathrm{n}}\right)=\mathrm{\alpha}ln\mathrm{n}-\mathrm{\beta}l\mathrm{n}ln\mathrm{n}+\mathrm{O}\left(1\right)$and $\mathrm{Var}\left({\mathrm{H}}_{\mathrm{n}}\right)=O\left(1\right)$.

**Remarks:**

- Be the definition of α we obtain β=3α/(2α-2), however the definition above sheds more light on the reason β takes this value.
- It has also been proven that $\mathrm{Var}\left({\mathrm{H}}_{\mathrm{n}}\right)=O\left(1\right)$.

### Model.

If we construct a binary search tree from a permutation of 1,...,n and *i* is the first key in the permutation then; *i* appears at the root of the tree.

The tree rooted at the left child of *i* contains keys 1,...,*i*-1 and its shape depends on the order in which these keys appear in the permutation.

From the above we deduce that ${\mathrm{H}}_{\mathrm{n}}$ is also the number of levels of recursion required when vanilla QuickSort(where 1st element is chosen as pivot) is applied to a random permutation of 1,...,n.

This observation allows for constructing ${\mathrm{T}}_{\mathrm{n}}$ from top down, where ${\mathrm{T}}_{\mathrm{n}}$ is a labeling of complete ∞ binary tree ${\mathrm{T}}_{\infty}$.

We expose the is the key associated with each node t of ${\mathrm{T}}_{\mathrm{n}}$.

Now to highlight the relationship with QuickSort we refer to the key at t as the *pivot* element at t.

Suppose then that we have exposed the pivots for some of the nodes forming the subtree of ${\mathrm{T}}_{\infty}$ rooted at root of ${\mathrm{T}}_{\infty}$.

Suppose further that for some node t of ${\mathrm{T}}_{\infty}$ all of the ancestors of t are in ${\mathrm{T}}_{\mathrm{n}}$ and we have selected their pivots.

That means that these selected pivots determine the set of keys ${\mathrm{K}}_{\mathrm{t}}$ which will appear at the subtree of ${\mathrm{T}}_{\mathrm{n}}$ rooted at t, but have no effect on the order in which we expect the keys in ${\mathrm{K}}_{\mathrm{t}}$ to appear.

Note that each permutation of ${\mathrm{K}}_{\mathrm{t}}$ is equally likely, therefore each of the keys ${\mathrm{K}}_{\mathrm{t}}$ will have an equal chance of being the pivot.

Let ${\mathrm{n}}_{\mathrm{t}}$ be the number of keys in this set and specify the pivot at t by choosing a uniform element ${\mathrm{i}}_{\mathrm{t}}$ of 1,...,${\mathrm{n}}_{\mathrm{t}}$ and by using the ${\mathrm{i}}_{\mathrm{t}}$th largest element of ${\mathrm{K}}_{\mathrm{t}}$ as pivot at t. We actually chose a uniform element ${\mathrm{u}}_{\mathrm{t}}$ of [0..1] and set ${\mathrm{i}}_{\mathrm{t}}$=[${\mathrm{n}}_{\mathrm{t}}$,${\mathrm{u}}_{\mathrm{t}}$], Therefore the subtree rooted at the left child of t has floor(${\mathrm{n}}_{\mathrm{t}}{\mathrm{u}}_{\mathrm{t}}$) nodes and the subtree rooted at the right child floor(${\mathrm{n}}_{\mathrm{t}}$(1 - ${\mathrm{u}}_{\mathrm{t}}$) nodes.

We choose ${\mathrm{u}}_{\mathrm{t}}$ simultaneously and independently as follows;

Each node x of ${\mathrm{T}}_{\infty}$ has a right son r(x) and left son l(x).

We consider a randomly labeled tree ${\mathrm{R}}_{\infty}$ obtained from ${\mathrm{T}}_{\infty}$ by choosing a uniform [0..1] random variable U(x) for each x of ${\mathrm{T}}_{\infty}$ and labeling the edge (x, r(x)) by U(x) and the edge (x, l(x)) by 1-U(x).

The label of edge a is denoted by L(a).

Let ${\mathrm{R}}_{\mathrm{k}}$ be the random tree consisting of the first k edge levels of ${\mathrm{R}}_{\infty}$.

For each node y of ${\mathrm{R}}_{\infty}$ we let f(y) be the product of the labels of the edges on the unique path from root to y.

If labels on the edges of the path from root to a node y of ${\mathrm{R}}_{\infty}$ are L1,...,Li then we define

${\mathrm{h}}_{\mathrm{n}}$(y) = floor(...floor(floor(n${\mathrm{L}}_{1}$)${\mathrm{L}}_{2}$)...${\mathrm{L}}_{\mathrm{i}}$)

* Fact1:* As shown above we can construct a random bst Tn from n nodes by taking a copy R of ${\mathrm{R}}_{\infty}$ and letting ${\mathrm{T}}_{\mathrm{n}}$ consist of those nodes y of R with ${\mathrm{h}}_{\mathrm{n}}$(y) ≥ 1.

By use of induction with the triviality a - 1 ≤ floor(a) ≤ a, it shows that ${{\mathrm{T}}^{\text{'}}}_{\mathrm{n}}$ contains ${\mathrm{T}}_{\mathrm{n}}$.

* Fact2:* Let y be a node of ${\mathrm{R}}_{\infty}$ at depth i(edge-distance i from root)

nf(y) - i ≤ ${\mathrm{h}}_{\mathrm{n}}$(y) ≤ nf(y)

The above two facts suggest ${\mathrm{H}}_{\mathrm{n}}$ should be close to height ${{\mathrm{H}}^{\text{'}}}_{\mathrm{n}}$ of ${{\mathrm{T}}^{\text{'}}}_{\mathrm{n}}$.

We shall discus ${{\mathrm{H}}^{\text{'}}}_{\mathrm{n}}$ and strengthen the results to show that they also apply to ${\mathrm{H}}_{\mathrm{n}}$.

**Theorem 2:**

$\mathrm{E}\left({{\mathrm{H}}^{\text{'}}}_{\mathrm{n}}\right)=\mathrm{\alpha}ln\mathrm{n}-\mathrm{\beta}l\mathrm{n}ln\mathrm{n}+\mathrm{O}\left(1\right)$ and $\mathrm{Var}\left({{\mathrm{H}}^{\text{'}}}_{\mathrm{n}}\right)=O\left(1\right)$

We use ${\mathrm{X}}_{\mathrm{n},\mathrm{h}}$ to denote the random set of nodes of ${{\mathrm{T}}^{\text{'}}}_{\mathrm{n}}$ at depth h.

${{\mathrm{n}}^{\text{'}}}_{\mathrm{t}}$ to denote nf(t) which by facts 1 and 2 is close to ${\mathrm{n}}_{\mathrm{t}}$.

We define the estimate of E(${\mathrm{H}}_{\mathrm{n}}$)

* Definition 1:* Let ${\mathrm{h}}^{\text{'}}$ = ${{\mathrm{h}}^{\text{'}}}_{\mathrm{n}}$ and αln n - βln ln n

### The puzzle.

We want to compute for various n and h, P(|${\mathrm{X}}_{\mathrm{n},\mathrm{h}}$| > 0) that is, the probability that ${{\mathrm{T}}^{\text{'}}}_{\mathrm{n}}$ has height h is simply P(|${\mathrm{X}}_{\mathrm{n},\mathrm{h}}$| > 0) - P(|${\mathrm{X}}_{\mathrm{n},\mathrm{h+1}}$| > 0).

We consider E|${\mathrm{X}}_{\mathrm{n},\mathrm{h}}$|).

There are ${2}^{\mathrm{h}}$ nodes of ${\mathrm{T}}_{\infty}$ at depth h, therefore we have E(|${\mathrm{X}}_{\mathrm{n},\mathrm{h}}$|) = ${2}^{\mathrm{h}}$P(f(y) ≥ 1) where y is a node of ${\mathrm{R}}_{\infty}$ at depth h.

For function f we use that for each x in ${\mathrm{R}}_{\infty}$ -ln U(x) is an exponential random variable with mean 1. Therefore if y is at depth h in ${\mathrm{T}}_{\infty}$, then -ln (f) y is distributed as the sum of h i.i.d. exponential random variables with a mean of 1.(gama distributed with parameter h).

The results of this function allow for computing E|${\mathrm{X}}_{\mathrm{n},\mathrm{h}}$|)

**Definition 2:**

Let ${\mathrm{h}}^{x}$ = ${\mathrm{h}}_{\mathrm{n}}^{\mathrm{x}}$ be the smallest h for which E(|${\mathrm{X}}_{\mathrm{n},\mathrm{h}}$|) ≤ 1, for which P(f(y) ≥ $\frac{1}{n}$) ≤ ${2}^{-\mathrm{h}}$

Recall that α is the solution in (2, ∞) of αln(2e/α) = 1 therefore we have the following lemma.

### Lemma 1:

${\mathrm{h}}^{x}$ = αln n - 1/2ln(α/2) lnlnn+O(1) and for some constant c1, E(|${\mathrm{X}}_{\mathrm{n},{\mathrm{h}}^{\mathrm{x}}}$|) > c1. Further there exists positive constants c2 and c3 such that

(a) for i = $\mathrm{o}\left(\sqrt{\mathrm{ln}\mathrm{n}}\right),c2(2/\mathrm{\alpha}{)}^{\mathrm{i}}$ ≤ E(|${\mathrm{X}}_{\mathrm{n},{\mathrm{h}}^{x}+\mathrm{i}}$|) ≤ $\mathrm{c}3(2/\mathrm{\alpha}{)}^{\mathrm{i}}$

(b) for all positive i, E(|${\mathrm{X}}_{\mathrm{n},{\mathrm{h}}^{x}+\mathrm{i}}$|) ≤ $\mathrm{c}3{2}^{-\mathrm{i}}$.

Therefore, (c) for n sufficiently large and h = ${{\mathrm{h}}^{x}}_{\mathrm{n}}$ + $\mathrm{o}\left(\sqrt{\mathrm{ln}\mathrm{n}}\right)$, P(${{\mathrm{n}}^{\text{'}}}_{\mathrm{x}}$ < 2|x ∈ ${\mathrm{X}}_{\mathrm{n},\mathrm{h}}$ ≥ 1/2

**Remarks:**

- We assume that n is arbitrarily large by increasing constants so we don't state it explicitly.
- Note ${\mathrm{h}}_{n}^{x}$=${\mathrm{h}}_{n}^{\text{'}}$+${\mathrm{log}}_{\alpha /2}$ln n + O(1) and since ${\mathrm{h}}_{n}^{x}$ is αln(n)(1 + o(1), it is equivalent to ${\mathrm{h}}_{n}^{x}$=${\mathrm{h}}_{n}^{\text{'}}$+${\mathrm{log}}_{\alpha /2}$${\mathrm{h}}_{n}^{x}$+O(1).

and by the above lemma 1(a) for two constants d1 and d2, E($\mathrm{X}\mathrm{n},{\mathrm{h}}_{n}^{\text{'}}$) is between ${d}_{1}{\mathrm{h}}^{x}$ and ${d}_{2}{\mathrm{h}}^{x}$

In proving this lemma we note that the density function for a gamma distribution with parameter h is

${\mathrm{x}}^{h-1}\frac{\mathrm{exp}(-\mathrm{x})}{(h-1)!}$.

To obtain P(y ∈ ${\mathrm{X}}_{\mathrm{n},\mathrm{h}}$ ), we integrate this function from 0 to ln n.

To obtain P(${{\mathrm{n}}^{\text{'}}}_{\mathrm{y}}$≥ q2) we integrate this function from 0 to ln n - ln 2.

To obtain P(n0y∏q2) we integrate it from 0 to ln n - ln 2.

When x is below ln n and h is near ${\mathrm{h}}_{n}^{x}$, decreasing x by t decreases d(x) by a multiplicative factor of at least (1+o(1))exp(t(α - 1)).

In particular, d(x - ln2) < $\frac{1}{4}\mathrm{d}\left(\mathrm{x}\right)$ whereby proof for c comes easily.

This further shows that the main contribution to the integral for P(y ∈ ${\mathrm{X}}_{\mathrm{n},\mathrm{h}}$) comes from x near ln n and suggests to compute the ratio of the integral for h to h-1, we only consider the ratio between the corresponding density functions at this point.

The ratio is ln(n)/h which is about ${\mathrm{\alpha}}^{-1}$ and since the tree levels differ by a factor of 2 (a) seems believable.

For each h near ${\mathrm{h}}^{x}$, |${\mathrm{X}}_{\mathrm{n},\mathrm{h}}$| is not concentrated around its expected value and more strongly Var(${\mathrm{X}}_{\mathrm{n},\mathrm{h}}$) is high as we shall see(expected value of ${\mathrm{H}}_{n}^{\text{'}}$ is near ${\mathrm{h}}_{n}^{\text{'}}$ rather than ${\mathrm{h}}_{n}^{x}$).

To estimate Var(|${\mathrm{X}}_{\mathrm{n},\mathrm{h}}$|) we need to determine for a typical x in ${\mathrm{X}}_{\mathrm{n},\mathrm{h}}$, whether the pivots along Px which are biased towards 1 occur near the beginning or near the end of Px.

using the fact E(|${\mathrm{X}}_{\mathrm{n},\mathrm{h}}$| : x ∈ ${\mathrm{T}}_{n}^{\text{'}}$ and labels on Px) = 1 + $\sum _{i=0}^{\mathrm{h}-1}=\mathrm{E}\left(\right|{\mathrm{X}}_{\mathrm{nf}\left({\mathrm{z}}_{\mathrm{t}}^{\text{'}}\right)},\mathrm{h}-\mathrm{i}\left|\right)$

This estimation is determined by

- How many ${\mathrm{z}}_{\mathrm{i}}$ on ${\mathrm{P}}_{\mathrm{x}}$ satisfy; f(${\mathrm{z}}_{\mathrm{i}}$) is larger than it would be if all the labels on ${\mathrm{P}}_{\mathrm{x}}$ were identical
- By how much does f(${\mathrm{z}}_{\mathrm{i}}$) exceed this "normal" value for such ${\mathrm{z}}_{\mathrm{i}}$.

These answers are key to the proof for theorem 2.

### Approach.

Consider a fixed node x at depth h in ${\mathrm{T}}_{\infty}$ and generate labels on Px by

- Choose a set S of h random reals uniformly and independently from [0...1].
- Uniformly choose a random permutation l1...lh of S and letting li be the label on the ith edge of Px

**Note:**

- The event x ∈ ${\mathrm{T}}_{n}^{\text{'}}$ is determined by 1
- The difference between f(zi) and its normal value is determined by 2

This independence allows for straightforward analysis to obtains results of the random permutation generated in 2.

for each i between l and h, let ${\mathrm{a}}_{t}^{\text{'}}$ be the natural logarithm of 1/li.

Note that $\mathrm{x}\in {\mathrm{T}}_{n}^{\text{'}}\iff \sum _{i=1}^{\mathrm{h}}{{\mathrm{a}}^{\text{'}}}_{\mathrm{i}}\le ln\mathrm{n}$

Normalize the above by setting ${\mathrm{a}}_{\mathrm{i}}$ = ${{\mathrm{a}}^{\text{'}}}_{\mathrm{i}}$ - ($\sum _{i=1}^{\mathrm{h}}{{\mathrm{a}}^{\text{'}}}_{\mathrm{i}}$)/h so $\sum _{i=1}^{\mathrm{h}}{\mathrm{a}}_{i}$ = 0.

Let ${\mathrm{PS}}_{\mathrm{i}}$ be the patial sum $\sum _{\mathrm{j}=1}^{\mathrm{i}}{\mathrm{a}}_{j}$, then f(${\mathrm{z}}_{i}$) is above its normal value precisely if ${\mathrm{PS}}_{\mathrm{i}}$ < 0.

We say x is good if x ∈ ${\mathrm{T}}_{\mathrm{n}}^{\text{'}}$ and every ${\mathrm{PS}}_{\mathrm{i}}$ ≥ 0

The key to this proof is the following lemma

### Lemma 2.

Let X = {x1,...,xh} be a set of real numbers whose sum is non-negative.

Let ${\mathrm{x}}_{\mathrm{\pi (1)}}$,...,${\mathrm{x}}_{\mathrm{\pi (h)}}$be a uniformly random permutation of X ,

then

P(∄j such that ${\sum}_{\mathrm{i}=1}^{\mathrm{j}}{\mathrm{x}}_{\mathrm{\pi}\left(\mathrm{i}\right)}<0$) ≥ $\frac{1}{\mathrm{h}}$.

Furthermore, if X sums to zero, but none of its proper subsets do, then this probability is exactly 1/h.

Let ${\mathrm{X}}_{\mathrm{n},\mathrm{h}}^{\text{'}}$ be the set of good nodes at depth h.

Summing up the bound of lemma 2 over all choices of S in 1 for which x ∈ ${\mathrm{T}}_{n}^{\text{'}}$, we get

E(|${\mathrm{X}}_{\mathrm{n},\mathrm{h}}^{\text{'}}$|) ≥ $\frac{\mathrm{E}\left({\mathrm{X}}_{\mathrm{n},\mathrm{h}}^{\text{'}}\right)}{\mathrm{h}}$ **(1)**

Steps *2* through *5* covered in the paper goes to show that E(${\mathrm{H}}_{n}^{\text{'}}$) lies between ${\mathrm{h}}^{\mathrm{x}}$ + O(1) and ${\mathrm{h}}^{-}$ + O(1).

To prove lemma 2 we tighten upper an lower bounds. For the upper bound replace bound

P(${\mathrm{X}}_{\mathrm{x},{\mathrm{h}}^{\text{'}}+\mathrm{i}}$ ≠ ∅) ≤ E(|${\mathrm{X}}_{\mathrm{x},{\mathrm{h}}^{\text{'}}+\mathrm{i}}$|) = O(h${2}^{-\mathrm{i}}$) from lemma 1 by bound

P(${\mathrm{X}}_{\mathrm{x},{\mathrm{h}}^{\text{'}}+\mathrm{i}}$ ≠ ∅) = O(${2}^{\frac{-\mathrm{i}}{2}}$) **(6)**

For the lower bound, we show that knowing x as a typically good node increases the expected number of good nodes by a constant rather than O(h).

**Intuition why (6) holds.**

Consider pairing each permutation 5 of a given S with its revers 5r.

We claim either,

- f(${\mathrm{z}}_{\mathrm{i}}$) is at least its "normal" value when the labels on Px are given by Π or
- f(${\mathrm{z}}_{\mathrm{h}-\mathrm{i}}$) is atleast its "normal" value when the labels on ${\mathrm{P}}_{\mathrm{x}}$ are given by ${\mathrm{\Pi}}_{\mathrm{r}}$.

Note that f(zi) is above its "normal" value iff

$\prod _{\mathrm{j}=1}^{\mathrm{i}}{\mathrm{l}}_{\mathrm{j}}$ ≥ ${\left(\prod _{\mathrm{j}=1}^{\mathrm{h}}{\mathrm{l}}_{\mathrm{j}}\right)}^{\frac{i}{h}}$

This claim implies that on average at least h/2 of the f(zi) are above their normal value.

To tighten the lower bound we show that for a typical good node, most of f(zi) are well below their normal value and hence the expected number of good nodes at depth ${\mathrm{h}}^{\text{'}}$ is O(1) if a good node exists.

We show that for a good node, the majority of the partial sums for the random permutation of S are above zero.

* Remark:* for S chosen in (i) there is some proper subset of S which sums to 0 has a probability of zero.

Therefore we apply a stronger version of lemma 2 which holds for sets none of whose proper subsets sum to zero to obtains

the probability x is good given that it is in ${\mathrm{X}}_{\mathrm{n},\mathrm{h}}$ is exactly 1/h **(7)**

It can be shown for general sets S which sum to 0, the probability that every partial sum ${\mathrm{PS}}_{\mathrm{i}}$ of a random permutation of S is non-negative is O(1/h) and more strongly that for a smaller *a* the probability every ${\mathrm{PS}}_{\mathrm{i}}$ exceeds -a is near 1/h.

What is the probability of the event E that ${\mathrm{PS}}_{\mathrm{i}}$ is 0 ?

if ${\mathrm{PS}}_{\frac{\mathrm{h}}{2}}$ = 0, then the permutation $\mathrm{\Pi}$ breaks S into two subsets S1 and S2 both having $\frac{\mathrm{h}}{2}$ elements and sum to 0.

Further, all partial sums for Π are non-negative so are partial sums for the permutation ${\mathrm{\Pi}}_{1}$ of ${\mathrm{S}}_{1}$ consisting of the first half if Π and permutation ${\mathrm{\Pi}}_{2}$ of ${\mathrm{S}}_{2}$ consisting of the second half of Π.

We can first generate a choice for (${\mathrm{S}}_{1}$, ${\mathrm{S}}_{2}$) and then generate ${\mathrm{\Pi}}_{1}$ and ${\mathrm{\Pi}}_{2}$ independently.

If S is well-behaved we can then show that the probability a random permutation of each Si stays above 0 is also O($\frac{1}{\mathrm{h}}$).

Thus the probability of E is O($(\frac{\mathrm{h}}{2}{)}^{2}$).

By lemma 2 the probability that all PSi for Π are non-negative is 1/h. Therefore the probability ${\mathrm{PS}}_{\frac{\mathrm{h}}{2}}$ is 0 given all ${\mathrm{PS}}_{\mathrm{i}}$ are non-negative is O($\frac{1}{\mathrm{h}}$).

### Independent properties of conditioned exponentials

Results proved about random permutations hold for a large class of distributions.

We present properties and the resultant proof simplification that relies on the independent properties of the distribution that arise in the study of random binary search trees.

Let ${\mathrm{E}}_{1}$,...,${\mathrm{E}}_{\mathrm{n}}$ be independently chosen exponential random variables with mean of 1.

Let ${\mathrm{\sigma}}_{\mathrm{n}}$ be their sum.

Let ${\mathrm{D}}_{\mathrm{n}}$ = (${\mathrm{d}}_{\mathrm{1}}$,...,${\mathrm{d}}_{\mathrm{n}}$) where ${\mathrm{d}}_{\mathrm{i}}$ = $\frac{{\mathrm{E}}_{\mathrm{i}}}{{\mathrm{\sigma}}_{\mathrm{n}}}$.

For 1 ≤ i ≤ n

Let ${\mathrm{T}}_{\mathrm{i}}$ = ${\sum}_{\mathrm{j}=1}^{\mathrm{i}}$ ${\mathrm{d}}_{\mathrm{j}}$.

* Property 1:* ${\mathrm{\sigma}}_{\mathrm{n}}$ is independent of ${\mathrm{D}}_{\mathrm{n}}$

* Property 2:* ${\mathrm{D}}_{\mathrm{n}}$ is distributed like the spacings on [0...1] determined by n-1 independent uniform variables therefore every permutation of a given multi-set is equally likely to be ${\mathrm{D}}_{\mathrm{n}}$.

* Property 3:* For i ≤ i < j ≤ n, if we condition ${\mathrm{T}}_{\mathrm{i}}$ = a and ${\mathrm{T}}_{\mathrm{j}}$ = b, then the set (${\mathrm{d}}_{\mathrm{i}+1}$ / (b - a),...,${\mathrm{d}}_{\mathrm{j}}$/(b - a)) is distributed like ${\mathrm{D}}_{\mathrm{j}-\mathrm{i}}$

* Property 4:* For any integer between 1 and n and real between 0 and 1, the probability that ${\mathrm{T}}_{\mathrm{a}}$ exceeds b is the probability that if we pick a subset B of a set A of n-1 elements by putting each element of A in B independently with probability b, we obtain |B| < a.

* Property 5:* The probability that there exists an i such that ${\mathrm{E}}_{\mathrm{i}}$ > 6ln n is less than $\frac{1}{{\mathrm{n}}^{5}}$.

#### Key ideas.

Let ${\mathrm{y}}_{\mathrm{i}}$ = ${\mathrm{d}}_{\mathrm{i}}$ - $\frac{1}{\mathrm{n}}$, to obtain ${\mathrm{y}}_{\mathrm{i}}$ we normalize ${\mathrm{d}}_{\mathrm{i}}$ such that the sum is 0.

Let ${\mathrm{U}}_{\mathrm{i}}$ = ${\sum}_{\mathrm{j}=1}^{\mathrm{i}}$ ${\mathrm{y}}_{\mathrm{j}}$ = ${\mathrm{T}}_{\mathrm{i}}$ - $\frac{\mathrm{i}}{\mathrm{n}}$.

${\mathrm{D}}_{\mathrm{n}}$ *exc* 0 , that is ${\mathrm{D}}_{\mathrm{n}}$ exceeds 0 if for all i we have ${\mathrm{U}}_{\mathrm{i}}$ ≥ 0.

Hence, for a non-negative integer a we say that ${\mathrm{D}}_{\mathrm{n}}$ exceeds -a, ${\mathrm{D}}_{\mathrm{i}}$ *exc* -a if for all i we have ${\mathrm{U}}_{\mathrm{i}}$ > $\frac{-a}{n}$.

By lemma 2 P(${\mathrm{D}}_{\mathrm{i}}$ *exc* 0) = $\frac{\mathrm{1}}{\mathrm{n}}$

### Lemma 3:

For any positive integer a P(${\mathrm{D}}_{\mathrm{n}}$ exc -a) = O($\frac{{a}^{4}}{n}$)

For the proof we can assume that a is at least 3, and the lemma holds for 3, it will also hold for 1 and 2.

We assume that n is large and a < ${\mathrm{n}}^{\frac{1}{4}}$.

Consider ${\mathrm{D}}_{\mathrm{n}+2{\mathrm{a}}^{2}}$.

Let *Ev(bc)* be the event that ${\mathrm{U}}_{{\mathrm{a}}^{2}}$ = $\frac{\mathrm{b}}{\mathrm{n}+2{\mathrm{a}}^{2}}$ and ${\mathrm{U}}_{\mathrm{n}+{\mathrm{a}}^{2}}$ = $\frac{\mathrm{c}}{\mathrm{n}+2{\mathrm{a}}^{2}}$

then if b, c > 0, given Ev(bc) we have,

$\mathrm{P}\left(\right({\mathrm{U}}_{\mathrm{i}}>0\forall \mathrm{i}\le {\mathrm{a}}^{2})\cap ({\mathrm{U}}_{\mathrm{i}}>0\forall \mathrm{i}\ge \mathrm{n}+{\mathrm{a}}^{2}\left)\right)\ge {\mathrm{a}}^{-4}$ **(8)**

Since the probability is independent of the choices for permutations of ${\mathrm{y}}_{\mathrm{i}}$ with i ≤ ${\mathrm{a}}^{2}$ or i ≥ n + ${\mathrm{a}}^{2}$ we have

For n > b > c > a: P(${\mathrm{D}}_{\mathrm{n}+2{\mathrm{a}}^{2}}$ *exc* 0 | Ev(b,c)) ≥ $\frac{1}{{\mathrm{a}}^{4}}$P(${\mathrm{D}}_{\mathrm{n}}$ exc -a) **(9)**

We show that the probability that Ev(b,c) holds for some choice of b and c with n > b > c > a is Ω(1).

Since P(${\mathrm{D}}_{\mathrm{n}+2{\mathrm{a}}^{2}}$ *exc* 0) = $\frac{1}{\mathrm{n}+2{\mathrm{a}}^{2}}$.

The proof of this lemma shows that,

P((2a < ${\mathrm{U}}_{{\mathrm{a}}^{2}}$ < 3a)∩(a < ${\mathrm{U}}_{\mathrm{n}+{\mathrm{a}}^{2}}$ < 2a)) = Ω(1). **(10)**

Using lemma 3 we can show that if ${\mathrm{D}}_{\mathrm{n}}$ exceeds 0 then it is above 0 most of the time.

### Lemma 4:

For every integer a > 0, letting ${\mathrm{R}}_{\mathrm{a}}$ be the event.

P((${\mathrm{D}}_{\mathrm{n}}$ *exc* - a) ∩ ${\mathrm{R}}_{\mathrm{a}}$) = O($\frac{1}{{\mathrm{na}}^{2}}$)

**Proof of theorem 2.**

We show that ${\mathrm{H}}_{\mathrm{n}}^{\text{'}}$ is concentrated around ${\mathrm{h}}_{\mathrm{n}}^{\text{'}}$ = ceil(αln n βln ln n) using the 3 lemmas below.

* Lemma 5*: P(${\mathrm{X}}_{\mathrm{n},{\mathrm{h}}_{\mathrm{n}}^{\text{'}}}$ ≠ ∅) = Ω(1)

* Lemma 6:* There exists (universal) constants C1 > 2 and d > 1 such that for i > 0 with i = o($\sqrt{\mathrm{ln}\mathrm{n}}$) we have;

P(${\mathrm{X}}_{\mathrm{n},{\mathrm{h}}_{\mathrm{n}}^{\text{'}}-i}$ = ∅) < C1${\mathrm{d}}^{-\mathrm{i}}$.

* Lemma 7:* There is a (universal) constant c2 > 2 such that for i > 0 we have.

P(${\mathrm{X}}_{\mathrm{n},{\mathrm{h}}_{\mathrm{n}}^{\text{'}}-i}$ ≠ ∅) < ${\mathrm{C}}_{2}$${2}^{-\mathrm{i}/2}$

Lemma 6 implies that for some constant C,

E(${\mathrm{H}}_{\mathrm{n}}^{\text{'}}$) > ${\mathrm{h}}_{\mathrm{n}}^{\text{'}}$ - C${\mathrm{log}}_{\mathrm{d}}$C1 + o(1).

similarly lemma 7 implies that for some constant C

E(${\mathrm{H}}_{\mathrm{n}}^{\text{'}}$) > ${\mathrm{h}}_{\mathrm{n}}^{\text{'}}$ - C${\mathrm{log}}_{\mathrm{d}}$C2 + o(1)

Therefore E(${\mathrm{H}}_{\mathrm{n}}^{\text{'}}$) = ${\mathrm{h}}_{\mathrm{n}}^{\text{'}}$ + O(1)

Combining lemmas 6,7 yields Var(${\mathrm{H}}_{\mathrm{n}}^{\text{'}}$) = O(1), since ${\sum}_{{\mathrm{b}}^{\mathrm{i}}}^{{\mathrm{i}}^{2}}$ is O(1) for any b > 1.

As discussed we apply the proofs of the 3 lemmas above to the labels on the path from the root to a node x of the tree, this is because we can first expose the products of the labels to determine if x is in ${\mathrm{T}}_{\mathrm{n}}^{\text{'}}$ and then expose the actual values of the labels to check if x is good or determine other properties of the ordered set of labels on this path.

Since all 3 results are asymptotic we assume n is large.

We consider some node x at depth h of ${\mathrm{T}}_{\infty}$, the path ${\mathrm{P}}_{\mathrm{x}}$=${\mathrm{z}}_{\mathrm{o}}$,...${\mathrm{z}}_{\mathrm{h}}$ = x from the root of ${\mathrm{T}}_{\infty}$ to x, and the nodes ${\mathrm{z}}_{0}^{\text{'}}$,..,${\mathrm{z}}_{\mathrm{h}-1}^{\text{'}}$.

Let ${\mathrm{a}}_{\mathrm{i}}^{\text{'}}$ be the negative of the natural logarithm if the label on the edge into ${\mathrm{z}}_{\mathrm{i}}$.

Let ${\mathrm{a}}_{\mathrm{i}}$ be ${\mathrm{a}}_{\mathrm{i}}^{\text{'}}$.

Let Π (= Π(x)) be the permutation ${\mathrm{a}}_{\mathrm{i}}$ given by the labeling of ${\mathrm{P}}_{\mathrm{x}}$.

Let σh = ${\sum}_{\mathrm{j}=1}^{\mathrm{h}}$${\mathrm{a}}_{\mathrm{j}}$

Let ${\mathrm{PS}}_{\mathrm{i}}$=${\sum}_{\mathrm{j}=1}^{\mathrm{h}}$${\mathrm{a}}_{\mathrm{j}}$

Therefore for a positive real a we say Π is above -a(written Π > -a), if each ${\mathrm{PS}}_{\mathrm{i}}$ is.

We say Π is above o (Π > 0) if for all i ${\mathrm{PS}}_{\mathrm{i}}$ ≥ 0.

We say x is good if Π is above 0;

We say x is above -a if Π is.

To expose ${\mathrm{a}}_{\mathrm{i}}^{\text{'}}$

- Choose σh according to the appropriate probability distribution.
- Choose ${\mathrm{D}}_{\mathrm{h}}$ as in property 2.
- Set ${\mathrm{a}}_{\mathrm{i}}^{\text{'}}$ = ${\mathrm{d}}_{\mathrm{i}}{\mathrm{\sigma}}_{h}$.

Note: ${\mathrm{PS}}_{\mathrm{i}}$ = ${\mathrm{\sigma}}_{\mathrm{h}}{U}_{\mathrm{i}}$.

Therefore P(Π > -a) is simply

P(${\mathrm{D}}_{\mathrm{h}}$ exc $\frac{-ah}{{\mathrm{\sigma}}_{h}}$) and P(x is good) is P(${\mathrm{D}}_{\mathrm{h}}$ exc 0).

Consider h ≃ αln n and ${\mathrm{\sigma}}_{h}$ ≃ αln n.

Therefore h/${\mathrm{\sigma}}_{h}$ ≃ α.

In particular we consider choices for h and ${\mathrm{\sigma}}_{h}$ such that h < 5${\mathrm{\sigma}}_{h}$

Lemma 3 applies in such a case

Conditioned on a choice of ${\mathrm{\sigma}}_{h}$ less than 5h.

P(Π > -a) = P(${\mathrm{D}}_{\mathrm{h}}$ exc $\frac{-ah}{{\mathrm{\sigma}}_{h}}$) < P(${\mathrm{D}}_{\mathrm{h}}$ exc - 5a) = O($\frac{{\mathrm{a}}^{4}}{\mathrm{h}}$) **14**

### A stronger result.

Since ${\mathrm{T}}_{\mathrm{n}}^{\text{'}}$ is at least as tall as ${\mathrm{T}}_{\mathrm{n}}$ we have an immediate corollary of lemma 7.

### Lemma 8:

There is (universal) constant C2 such that for i > 0 with i = o(logn) we have

P(${\mathrm{H}}_{\mathrm{n}}$ ≥ ${\mathrm{h}}_{\mathrm{n}}^{\text{'}}$ + i) < ${\mathrm{C}}_{\mathrm{2}}$${2}^{-\mathrm{i}/2}$

### Lemma 9:

There is a (universal) constant ${\mathrm{C}}_{\mathrm{3}}$ such that the probability the ${\mathrm{T}}_{\mathrm{n}}$ contains a node at height ${\mathrm{h}}_{\mathrm{n}}^{\text{'}}$ - ${\mathrm{C}}_{\mathrm{3}}$ is Ω(1).

### Lemma 10:

There exists (universal) constants ${\mathrm{C}}_{\mathrm{1}}$, ${\mathrm{C}}_{\mathrm{3}}$ and d < 1 such that for i > 0 with i = o($\sqrt{\mathrm{log}n}$) we have

P(${\mathrm{H}}_{\mathrm{n}}$ < ${\mathrm{h}}_{\mathrm{n}}^{\text{'}}$ - ${\mathrm{C}}_{3}$ - i) < ${\mathrm{C}}_{1}$${\mathrm{d}}^{\mathrm{i}}$

this lemma comes about by mimicking proofs of lemma 6 using lemma 9 in the proof of lemma 5.

* Theorem 1* follows from these results same as theorem 2 from lemmas 5, 6, 7.

### References

The Height of a Random Binary Search Tree by Bruce Reed from McGill University, Montreal Quebec, Canada and CNRS, Paris, France.