Average length of the longest segment

Question

This post is related to a previous SE post If a 1 meter rope …. concerning average length of a smallest segment.

A rope of 1m is divided into three pieces by two random points. Find the average length of the largest segment. My answer is 11/18. Here is how I do it:

Here we have two independent random variables $X,Y$, both uniform on $[0,1]$. Let $A=\min (X,Y), B=\max (X,Y)$ and $C=\max (A, 1-B, B-A)$. First we want to find the probability density function $f_C(a)$ of $C$. Let $F_C(a)$ be the cumulative distribution function. Then $$ F_C(a) = P(C\le a)=P(A\le a, 1-B\le a, B-A\le a).$$ By rewriting this probability as area in the unit square, I get $$F_C(a)=\left\{\begin{array}{ll} (3a-1)^2 & \frac{1}{3}\le a\le \frac{1}{2}\\ 1-3(1-a)^2 & \frac{1}{2}\le a\le 1\end{array}\right.$$ from which it follows that $$f_C(a)=\left\{\begin{array}{ll} 6(3a-1) & \frac{1}{3}\le a\le \frac{1}{2}\\ 6(1-a) & \frac{1}{2}\le a\le 1\end{array}\right.$$ Therefore the expected value of $C$ is $$\int_{1/3} ^{1/2}6a(3a-1) da+\int_{1/2} ^{1}6a(1-a) da= \frac{11}{18}.$$
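As a quick sanity check on the piecewise density above, both integrals can be evaluated in exact rational arithmetic. This is only a sketch: the helper name `integrate_poly` is made up for illustration and does not come from the post.

```python
from fractions import Fraction as F

def integrate_poly(coeffs, lo, hi):
    """Exact integral over [lo, hi] of the polynomial sum(coeffs[k] * a**k)."""
    def antiderivative(t):
        return sum(F(c) * t ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))
    return antiderivative(hi) - antiderivative(lo)

third, half, one = F(1, 3), F(1, 2), F(1)

# f_C(a) = 6(3a - 1) = -6 + 18a on [1/3, 1/2], and 6(1 - a) = 6 - 6a on [1/2, 1].
total_mass = integrate_poly([-6, 18], third, half) + integrate_poly([6, -6], half, one)

# a * f_C(a) = -6a + 18a^2 on [1/3, 1/2], and 6a - 6a^2 on [1/2, 1].
mean = integrate_poly([0, -6, 18], third, half) + integrate_poly([0, 6, -6], half, one)

print(total_mass, mean)  # 1 11/18
```

The density integrates to 1 and the mean comes out as 11/18, in agreement with the computation in the question.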

My questions are:

(A) Is there a "clever" way to figure out this number 11/18?

(B) What is the answer if the rope is divided into $n>3$ segments?

There is a simple geometric answer: it is the first coordinate of the mass center of the points $(1,0,\ldots)$, $(1/2,1/2,0,\ldots)$, ..., $(1/n,1/n,\ldots,1/n)$, which is $(1+1/2+\cdots+1/n)/n$. — Yudong Tang, Mar 23 '14 at 20:59
@TCL how do you go from the first expression for $F_C(a)$ to the unit square expression? — user1936752, Mar 22 '23 at 02:51

score 22 · Accepted Answer · edited Apr 13 '17 at 12:19

The answer to (B) is actually given in both Yuval Filmus' and my answers to the question about the average length of the shortest segment. It's $$\frac{1}{n} H_n,$$ where $H_n = \sum_{k=1}^n \frac{1}{k},$ i.e., the $n$th harmonic number.
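For concreteness, the formula $H_n/n$ can be tabulated exactly for small $n$. The function name `expected_longest` below is an illustrative choice, not something taken from the linked answers.

```python
from fractions import Fraction

def expected_longest(n):
    """Return H_n / n, the claimed mean length of the longest of n pieces."""
    harmonic = sum(Fraction(1, k) for k in range(1, n + 1))
    return harmonic / n

for n in (2, 3, 4):
    print(n, expected_longest(n))
```

For $n=3$ this reproduces the 11/18 from the question, and for $n=2$ it gives 3/4, the mean of the longer of two pieces.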

"Clever" is of course subjective, but here's an argument for (A) in the $n$-piece case. At least there's only one (single-variable) integration in it. :)

If $X_1, X_2, \ldots, X_{n-1}$ denote the positions on the rope where the cuts are made, let $V_i = X_i - X_{i-1}$, where $X_0 = 0$ and $X_n = 1$. So the $V_i$'s are the lengths of the pieces of rope.

The key idea is that the probability that any particular $k$ of the $V_i$'s simultaneously have lengths longer than $c_1, c_2, \ldots, c_k$, respectively (where $\sum_{i=1}^k c_i \leq 1$), is $$(1-c_1-c_2-\ldots-c_k)^{n-1}.$$ This is proved formally in David and Nagaraja's Order Statistics, p. 135. Intuitively, the idea is that in order to have pieces of size at least $c_1, c_2, \ldots, c_k$, all $n-1$ of the cuts have to occur in intervals of the rope of total length $1 - c_1 - c_2 - \ldots - c_k$. For example, $P(V_1 > c_1)$ is the probability that all $n-1$ cuts occur in the interval $(c_1, 1]$, which, since the cuts are randomly distributed in $[0,1]$, is $(1-c_1)^{n-1}$.
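The key idea can be checked by simulation. The sketch below is a Monte Carlo estimate, not a proof; the function name, the seed, the trial count and the particular thresholds are all arbitrary choices made for illustration.

```python
import random

def estimate_joint_tail(n, thresholds, trials=100_000, seed=0):
    """Estimate P(V_i > c_i for every (i, c_i) in thresholds) for n pieces.

    `thresholds` maps a 0-based piece index to its lower bound c_i.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # n - 1 uniform cuts split [0, 1] into n pieces.
        cuts = sorted(rng.random() for _ in range(n - 1))
        points = [0.0] + cuts + [1.0]
        pieces = [points[i + 1] - points[i] for i in range(n)]
        if all(pieces[i] > c for i, c in thresholds.items()):
            hits += 1
    return hits / trials

n = 4
thresholds = {0: 0.1, 2: 0.2}  # arbitrary illustrative choice
estimate = estimate_joint_tail(n, thresholds)
predicted = (1 - sum(thresholds.values())) ** (n - 1)
print(estimate, predicted)
```

With these thresholds the predicted value is $0.7^3 = 0.343$, and the estimate should agree up to sampling noise.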

If $V_{(n)}$ denotes the largest piece of rope, then $$P(V_{(n)} > x) = P(V_1 > x \text{ or } V_2 > x \text{ or } \cdots \text{ or } V_n > x).$$ This calls for the principle of inclusion/exclusion. Thus we have, using the "key idea" above, $$P(V_{(n)} > x) = n(1-x)^{n-1} - \binom{n}{2} (1 - 2x)^{n-1} + \cdots + (-1)^{k-1} \binom{n}{k} (1 - kx)^{n-1} + \cdots,$$ where the sum continues until $kx > 1$.
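The inclusion/exclusion formula can be compared directly with the CDF found in the question for $n=3$. In this sketch the function names are made up, and the value $F_C(a)=0$ for $a<1/3$ is added here (the question states $F_C$ only on $[1/3,1]$); it holds because the longest of three pieces is never shorter than $1/3$.

```python
from fractions import Fraction
from math import comb

def survival_longest(n, x):
    """P(longest of n pieces > x) by inclusion/exclusion, for 0 <= x <= 1."""
    x = Fraction(x)
    total = Fraction(0)
    k = 1
    # Terms with k * x > 1 are dropped, as described in the text.
    while k <= n and k * x <= 1:
        total += (-1) ** (k - 1) * comb(n, k) * (1 - k * x) ** (n - 1)
        k += 1
    return total

def cdf_from_question(a):
    """F_C(a) for n = 3, as stated in the question (0 below 1/3)."""
    a = Fraction(a)
    if a < Fraction(1, 3):
        return Fraction(0)
    if a <= Fraction(1, 2):
        return (3 * a - 1) ** 2
    return 1 - 3 * (1 - a) ** 2

for a in (Fraction(1, 4), Fraction(2, 5), Fraction(3, 4)):
    print(a, survival_longest(3, a), 1 - cdf_from_question(a))
```

The two columns agree at each test point, one from each of the three regimes $x<1/3$, $1/3\le x\le 1/2$ and $x>1/2$.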

Therefore, $$E[V_{(n)}] = \int_0^{\infty} P(V_{(n)} > x) dx = \sum_{k=1}^n \binom{n}{k} (-1)^{k-1} \int_0^{1/k} (1 - kx)^{n-1} dx = \sum_{k=1}^n \binom{n}{k} (-1)^{k-1} \frac{1}{nk} $$ $$= \frac{1}{n} \sum_{k=1}^n \frac{\binom{n}{k}}{k} (-1)^{k-1} = \frac{H_n}{n},$$ where the last step applies a known binomial sum identity.
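The binomial sum identity used in the last step, $\sum_{k=1}^n \binom{n}{k}\frac{(-1)^{k-1}}{k} = H_n$, can be spot-checked in exact arithmetic. This is a numerical check for small $n$ only, with illustrative function names, and is not a substitute for a proof.

```python
from fractions import Fraction
from math import comb

def alternating_binomial_sum(n):
    """Sum over k of (-1)^(k-1) * C(n, k) / k, computed exactly."""
    return sum(Fraction((-1) ** (k - 1) * comb(n, k), k) for k in range(1, n + 1))

def harmonic(n):
    """The n-th harmonic number H_n as an exact fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

for n in range(1, 11):
    assert alternating_binomial_sum(n) == harmonic(n)

print(alternating_binomial_sum(3) / 3)  # 11/18
```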

This is a long shot, but the harmonic number $H_n$ also appears in the classical coupon collector problem. Is there some deep combinatorial connection between the broken stick and the coupon collector problems? — Jack D'Aurizio, Mar 26 '17 at 02:23
Probably dumb questions: 1) why is E[V] given by that integral? Would have thought it should be an integral of x times some PDF. 2) Is it unusual that the $P(V_{(n)})$ density is a sum of densities defined on intervals of different lengths? Very strange looking, but stats isn't my field. — user26866, Oct 22 '20 at 03:17
Figured it out: $P(V_{(n)} > x)$ is a (complementary) CDF, not a PDF, and the number of terms in the sum depends on $x$. Get the PDF by taking $-\partial_x$, write the expectation value as $E[V_{(n)}]=-\int {\rm d}x\, x\, \partial_x P(V_{(n)} > x)$, integrate by parts, and you're left with the result after being careful about which terms in the sum contribute at each value of $x$. — user26866, Oct 22 '20 at 11:16

Ronnie268 · Answer 2 · 2018-01-09T19:11:38.580

Neat as it is, I don't think Rahul's answer can be correct. If we have $3x+2y+z=1$ and $3x+2y+z=9n$, then $n=1/9$, which means $x \leq 2/9$, which can't be right, as one solution is all pieces being of length 1/3 (however unlikely this exact solution may be, $x$ can take values in $(2/9,1/3]$).

Stefan's answer is wrong because the probability of any given point on the stick being in the longest piece (when cut in 2) is not uniform. That can be seen by considering the point halfway along, which is always in the longest piece. You can find the probability density function for the likelihood of any given point being in the longer half, then integrate to get the answer below.

My preferred solution is to let the cuts be at $X, Y$, with $Y \gt X$:

Image of cut positions

Then each piece is equally likely to be the longest, and the expected length of the longest piece doesn't depend on which piece we choose. Then we can calculate $\mathop{\mathbb{E}}(X|X \text{ is the longest piece} )$.

We have the three inequalities: $$X \gt Y-X \implies Y < 2X$$ $$X \gt 1-Y \implies Y > 1-X$$ and, from our setup, $$Y \gt X$$ These can be represented by the following diagram:

Diagram of inequalities

Then the area satisfying our inequalities is the two triangles A and B. So we wish to find the expected value of $X$ within this area.

The expected value of $X$ in A is $\bar{X}_A = \frac{1}{2}-\frac{1}{3}(\frac{1}{2}-\frac{1}{3}) = \frac{8}{18}$.

The expected value of $X$ in B is $\bar{X}_B = \frac{1}{2}+\frac{1}{3}(\frac{1}{2}) = \frac{4}{6} = \frac{12}{18}$

The area of A is $A_A = \frac{1}{2} \times \frac{1}{2}\times (\frac{1}{2}-\frac{1}{3}) = \frac{1}{24}$.

The area of B is $A_B = \frac{1}{2} \times \frac{1}{2}\times \frac{1}{2} = \frac{1}{8} = \frac{3}{24} = 3 A_A$.

So $\mathop{\mathbb{E}}(X|X \text{ is the longest piece} ) = \frac{\tfrac{8}{18} + 3\left(\tfrac{12}{18}\right)}{4} = \frac{11}{18}$
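The centroid shortcut can be reproduced in exact arithmetic. Since the diagram is not shown here, the triangle vertices below are an assumption: they are worked out from the three inequalities and the line $Y=1$, with the region split along $X=1/2$. The helper names are illustrative.

```python
from fractions import Fraction as F

def area(tri):
    """Triangle area via the shoelace formula."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2

def centroid_x(tri):
    """X coordinate of the centroid: the mean of the vertex X coordinates."""
    return sum(x for x, _ in tri) / 3

# Assumed vertices of triangles A and B (derived, not read off the diagram).
tri_a = [(F(1, 3), F(2, 3)), (F(1, 2), F(1)), (F(1, 2), F(1, 2))]
tri_b = [(F(1, 2), F(1, 2)), (F(1, 2), F(1)), (F(1), F(1))]

# Area-weighted mean of the two centroid X coordinates.
weighted = sum(area(t) * centroid_x(t) for t in (tri_a, tri_b))
expected_x = weighted / (area(tri_a) + area(tri_b))
print(area(tri_a), area(tri_b), expected_x)  # 1/24 1/8 11/18
```

The areas come out in the ratio 1:3, which is why the weighted average divides by 4.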

(Apologies for the comments on other solutions and the links, but I haven't got a high enough reputation to comment or embed directly)

Could you explain how you get to the expected value of X in A and B and more specifically how you get to the 1/3 in front of the parentheses? I can derive it by calculating it via integration over the area, but you seem to use a shortcut. — Hans-Peter Schrei, Apr 16 '19 at 20:47
I used the fact that, for a triangle, the centre of area is 1/3 of the way from the base to the opposite point. I guess that's most easily seen by integration. — Ronnie268, Apr 18 '19 at 12:10
It's the intersection of $Y=2X$ and $Y = 1 - X$, so $2X = 1 - X$, whence $X = 1/3$ — Ronnie268, Nov 12 '21 at 16:25
Apologies for the correction but Ronnie268, your answer is incorrect: (1) Small calculation mistake in the last step, the denominator should be 24 (not 4): $E(X \mid X \text{ is the longest piece}) = \frac{\frac{8}{18}+3(\frac{12}{18})}{24} = \frac{1}{6}(\frac{11}{18})$. (2) I think the rest of the expected value is missed because you didn't calculate $E(Y-X \mid Y-X \text{ is the longest piece})$ and $E(1-Y \mid 1-Y \text{ is the longest piece})$. — Ashwin Samuel, Jun 30 '23 at 01:25
@Ashwin No, Ronnie268 is correct. The denominator is 4 because it's the sum weighting. Area of A is $A_A$ while area of B is $3A_A$, so their total area is $4A_A$. — regulus, Dec 07 '23 at 18:30

Rahul Madhavan · Answer 3 · 2018-07-22T15:21:26.163

A more intuitive answer. Let's arrange the segments from smallest to largest. Let the three segments be $x$, $x+y$ and $x+y+z$.

${\rm sum\ of\ segments}$ = 1 $\implies 3x + 2y + z = 1$;

subject to the constraints: $x\geqslant 0,y \geqslant 0,z \geqslant 0$ $\implies x \leqslant 1/3 , y \leqslant 1/2, z \leqslant 1$.

Within these limits, we can assume $x$,$y$ and $z$ can take on any values as long as we renormalize the total length to be equal to 1, sort of building the larger string from its constituent parts. These relative limits are equivalent to the following:

Let $x$ be chosen randomly among a pool from $[0,2n]$, $y$ be chosen randomly from a pool $[0,3n]$ and $z$ from a pool $[0,6n]$ where $(x,y,z,n)$ $\in$ $\mathbb R$.

Expected value of $x = n$, expected value of $y = 1.5n$ and expected value of $z = 3n$.
Expected length of longest segment = $x+y+z = 5.5n$
Expected total length = $3x+2y+z = 9n$

Therefore the expected length of longest segment is

$(x+y+z)/(3x+2y+z)= 5.5/9= 11/18 $.

The expected length of the shortest segment is

$x/(3x+2y+z) = 1/9$.

answered Apr 22 '17 at 15:47 · edited Jul 22 '18 at 15:21

How do you know that the expected length of the longest segment is x+y+z? – Ferdinando Randisi Feb 25 '18 at 21:13
That is part of the formulation. The three segments are taken to be x, x+y and x + y + z. Given the expected values of x, y and z, the expected value of the longest segment, x + y + z, is 5.5n – Rahul Madhavan Mar 05 '18 at 05:00
I am wondering why is 3E(x)+2E(y)+E(z) != 6n (total length), can you please help me with an intuitive explanation? – Jaydev Aug 02 '18 at 20:33
@Jaydev the total length is 9n, not 6n – user1936752 Mar 26 '23 at 03:01

score 2 · Answer 4 · answered Jul 29 '16 at 23:37

Why is this answer wrong?

Let the length of the longest segment created by the FIRST cut be $L_1$. Now consider the second cut. With probability $(1-L_1)$ it will be in the shorter segment made by the first cut, and the longest of the 3 segments will then have length $L_1$.

On the other hand, with probability $L_1$ the second cut will be in the longer first segment, and the longest of the 3 segments will then be the max of the longer segment made by the second cut, call it $L_2$, or the shorter segment made by the first cut, of length $(1-L_1)$.

This gives an expected length of the longest segment as: $(1-L_1)L_1+L_1 \max[L_2,(1-L_1)]$. If we now average this over cut positions, it is easy to reason that $L_1=3/4$, and $L_2=L_1^2=9/16$, giving the average length of the longest segment to be 39/64. This disagrees with 11/18, but only by 0.3%!

I assume there is a problem with performing the "averaging" over the $\max$ function, but I don't understand why. Any feedback would be great! Thanks!!
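One way to see where the averaging goes wrong is to average the two terms of the expression over the cut positions numerically, instead of substituting the mean values of $L_1$ and $L_2$. The sketch below uses a plain midpoint rule; the grid size and all names are illustrative choices, and the printed values are approximations.

```python
def two_stage_terms(m=400):
    """Midpoint-rule averages of the two terms over the cut positions.

    x is the first cut; u is where the second cut lands, expressed as a
    fraction of the longer first-stage piece.
    """
    h = 1.0 / m
    term_short = 0.0  # second cut lands in the shorter piece
    term_long = 0.0   # second cut lands in the longer piece
    for i in range(m):
        x = (i + 0.5) * h
        l1 = max(x, 1 - x)
        term_short += (1 - l1) * l1 * h
        for j in range(m):
            u = (j + 0.5) * h
            longest = max(u * l1, (1 - u) * l1, 1 - l1)
            term_long += l1 * longest * h * h
    return term_short, term_long

term_short, term_long = two_stage_terms()
# Substituting the means L1 = 3/4 and L2 = 9/16, as in the answer above.
plug_in = (1 - 0.75) * 0.75 + 0.75 * max(9 / 16, 1 - 0.75)
print(term_short, term_long, term_short + term_long, plug_in)
```

Both terms are nonlinear in $L_1$ and $L_2$, so the average of each term differs from the term evaluated at the averages: the first averages to about $1/6$ rather than $3/16$, and the second to about $4/9$ rather than $27/64$. The two discrepancies nearly cancel, which would explain why 39/64 lands so close to 11/18.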

score 1 · Answer 5 · answered Aug 17 '23 at 06:03

This answer formalizes Rahul's clever answer, and shows how it works because they have currently gotten away with (wonderfully slick and probably accidental) murder. In particular, their argument about the constraints doesn't make any sense because we could have $x = 1/3, y=1/2, z=1$ such that the length would be $3$, not $1$. And why should we be able to normalize by the expected value of the length over this region?

The initial setup was clever. Effectively, we want to know what the expected value of $M = x+y+z$ is, subject to the constraint that $3x + 2y + z = 1$ and all non-negative, which is a planar triangular region in $\mathbb{R}_+^3$ (i.e. where $x,y,z\geq 0$). But to make the math even nicer, we can compute the expected value of $M$ on the planar region given by $3x + 2y + z = L$ for any $L>0$, and then just normalize by $L$. This is allowed by Rahul's renormalization explanation, or by the linearity of $M$.

By the linearity of expectation, $\mathbb{E}M = \mathbb{E}X + \mathbb{E}Y + \mathbb{E}Z$, where $X,Y,Z$ are the coordinate functions in this region. So how do we go about computing these expectations in this region? Do we need to integrate?

Well, no. This region is a triangle, so we can just calculate the centroid by averaging the vertices. Notice that the vertices are $$ \mathbf{v_1} = \begin{pmatrix} L/3 \\ 0 \\ 0 \end{pmatrix}, \mathbf v_2 = \begin{pmatrix} 0 \\ L/2 \\ 0 \end{pmatrix}, \mathbf v_3 = \begin{pmatrix} 0 \\ 0 \\ L \end{pmatrix}, $$ so the centroid is $$ \frac{\mathbf v_1 + \mathbf v_2 + \mathbf v_3}{3} = \frac{L}{18}\begin{pmatrix} 2 \\ 3 \\ 6 \end{pmatrix}. $$

Of course now we are done, because we plug in $L =1$ and evaluate $M$ to get $\frac{11}{18}$, and if we want the smallest it's just $\mathbb{E}X = \frac{2}{18} = \frac{1}{9}$.

But let's figure out why Rahul's trick worked. Observe that Rahul basically found the center of the box $[0,2n]\times[0,3n]\times[0,6n]$. They found these bounds the same way that we found the vertices of our triangle, but made the mistake of treating the box as the proper region, not the triangle. Then they got their coordinates, $(n, 1.5n, 3n)^T$, and said well, these coordinates correspond to a length of $9n$, so normalizing by this they got the result. However, notice that the corner of the box at $(2n, 3n, 6n)^T$ corresponds to a length of $18n$, so something fishy is going on here.

What's happened is this: the centroid of the triangle with vertices $$ \mathbf{v_1} = \begin{pmatrix} a \\ 0 \\ 0 \end{pmatrix}, \mathbf v_2 = \begin{pmatrix} 0 \\ b \\ 0 \end{pmatrix}, \mathbf v_3 = \begin{pmatrix} 0 \\ 0 \\ c \end{pmatrix}, $$ is $$ \frac{\mathbf v_1 + \mathbf v_2 + \mathbf v_3}{3} = \frac{1}{3}\begin{pmatrix} a \\ b \\ c \end{pmatrix}, $$ as we discussed above. However, the center of the box $[0,a]\times[0,b]\times[0,c]$ is $$ \frac{\mathbf v_1 + \mathbf v_2 + \mathbf v_3}{2} = \frac{1}{2}\begin{pmatrix} a \\ b \\ c \end{pmatrix}. $$ This is the centroid given by scaling the triangle by $3/2$. For our lengths problems, this means Rahul found the centroid of the triangle with $3/2$ the length of the one implied by their setup. But because they renormalized, this didn't end up mattering. However, we have to admit they got kind of lucky here.
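The reason the renormalization rescues the argument is that the ratio $(x+y+z)/(3x+2y+z)$ is unchanged when $(x,y,z)$ is scaled by a constant. A small exact check, using Rahul's pool limits with $n=1$ as an illustrative choice:

```python
from fractions import Fraction as F

def longest_over_total(x, y, z):
    """Ratio (x + y + z) / (3x + 2y + z) for pieces x, x + y, x + y + z."""
    return (x + y + z) / (3 * x + 2 * y + z)

a, b, c = F(2), F(3), F(6)  # Rahul's pool limits with n = 1
triangle_centroid = (a / 3, b / 3, c / 3)
box_centre = (a / 2, b / 2, c / 2)
print(longest_over_total(*triangle_centroid), longest_over_total(*box_centre))  # 11/18 11/18
```

The box centre is the triangle centroid scaled by $3/2$, so both points give the same normalized value.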

Anyways, the cool thing about this method is that it easily generalizes to $n$ pieces. Now our pieces have length $x_1, x_1 + x_2, \dots, x_1 + \dots + x_n$. This means the total length is $nx_1 + (n-1)x_2 + \dots + x_n$, so our space is the $(n-1)$-simplex in the $(n-1)$-dimensional hyperplane in $\mathbb{R}^n_+$ given by $$ nx_1 + (n-1)x_2 + \dots + x_n = 1, $$ which has vertices $$ \begin{pmatrix} \frac{1}{n} \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ \frac{1}{n-1} \\ \vdots \\ 0 \end{pmatrix}, \dots, \begin{pmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{pmatrix}. $$

Thus the centroid is $$ \frac{1}{n}\begin{pmatrix} \frac{1}{n} \\ \frac{1}{n-1} \\ \vdots \\ 1 \end{pmatrix}, $$ so the expected largest piece is $$ \frac{1}{n}\left(\frac{1}{n} + \frac{1}{n-1} + \dots + 1 \right) = \frac{H_n}{n}, $$ as per Mike's answer. The expected smallest piece has size $\frac{1}{n^2}$.
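The centroid argument gives the expected length of every piece in sorted order, not just the largest and smallest: the $k$-th smallest piece is $x_1+\dots+x_k$ evaluated at the centroid. A sketch in exact arithmetic, with an illustrative function name:

```python
from fractions import Fraction
from itertools import accumulate

def expected_sorted_pieces(n):
    """Expected piece lengths, smallest to largest, from the simplex centroid."""
    # Centroid coordinates: (1/n) * (1/n, 1/(n-1), ..., 1).
    increments = [Fraction(1, n * (n - j)) for j in range(n)]
    # The k-th smallest piece is x_1 + ... + x_k.
    return list(accumulate(increments))

pieces = expected_sorted_pieces(3)
print(*pieces)  # 1/9 5/18 11/18
```

The expected lengths sum to 1, as they must, and the last entry equals $H_n/n$.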

Rob Abramovic · Answer 6 · 2019-04-27T23:13:41.350

I have another idea for the case where the stick is divided into $n$ pieces: let $X_{i}$, $1 \leq i \leq n$, be the cut marks, which are independent random variables uniformly distributed between $0$ and $1$, and set $A=\max_i X_{i}$. Now, consider the two pieces, one of length $A$ and the other of length $1-A$. The piece of length $A$ is then subdivided into $n-1$ pieces while the one of length $1-A$ remains whole. If $1-A>A$, then the piece of length $1-A$ will be the longest (obviously). If $1-A<A$, then the longest piece is the maximum of the longest subdivision on $A$ and the one of length $1-A$. Let $L_{n}$ denote the length of the longest segment of a stick of unit length divided into $n$ pieces. Since the length of the longest subdivision on the stick of length $A$ is $AL_{n-1}$, we obtain the following formula:

$$E(L_{n})=P(1-A>A)E(1-A)+P(1-A<A)E(\max(AL_{n-1}, 1-A))$$

Since the $X_{i}$ are independent, the CDF for $A$ is $F_{A}(y)=y^{n}$ and the PDF is $ny^{n-1}$, so that we obtain: $$E(A)= \int_{0}^{1} ny^{n} dy = \frac{ny^{n+1}}{n+1}|_{0}^{1}=\frac{n}{n+1}$$. Therefore, $P(1-A>A)=P(A<\frac{1}{2})=\frac{1}{2^{n}}$, so that $P(1-A<A)=1-\frac{1}{2^{n}}$. Now, $E(1-A)=E(1)-E(A)=1-\frac{n}{n+1}=\frac{n+1-n}{n+1}=\frac{1}{n+1}$. Therefore, the above formula for $E(L_{n})$ reduces to:

$$E(L_{n})=\frac{1}{2^{n}(n+1)}+\left(1-\frac{1}{2^{n}}\right)E(\max(AL_{n-1}, 1-A))$$

Now to deal with $E(\max(AL_{n-1}, 1-A))$, let $f_{L_{n-1}}$ denote the PDF for the random variable $L_{n-1}$. I believe we can assume that $A$ and $L_{n-1}$ are independent. If this is the case, let $M=\max(AL_{n-1},1-A)$ and note the following fact: if $a=L_{n-1}$ is fixed, then $P(M \leq y)= P(1-y \leq A \leq \frac{y}{a}) = F_{A}(\frac{y}{a})-F_{A}(1-y)=(\frac{y}{a})^{n}-(1-y)^{n}$. Therefore, we obtain the following CDF for $M$: $$F_{M}(y)=\int_{\frac{1}{n-1}}^{1} [\frac{y^{n}}{a^{n}}-(1-y)^{n}]f_{L_{n-1}}(a) da$$ Note that the lower bound $\frac{1}{n-1}$ on the integral is justified by the fact that the longest piece of a rope of unit length cut into $n-1$ pieces must be at least $\frac{1}{n-1}$ long. Pulling out factors that do not depend on $a$ from the above integral yields:

$$F_{M}(y) = y^{n}\int_{\frac{1}{n-1}}^{1} \frac{f_{L_{n-1}}(a)}{a^{n}}da-(1-y)^{n}\int_{\frac{1}{n-1}}^{1}f_{L_{n-1}}(a)da$$

It is obvious from the definition of a probability density function that $\int_{\frac{1}{n-1}}^{1}f_{L_{n-1}}(a)da=1$ and we further recognize $\int_{\frac{1}{n-1}}^{1} \frac{f_{L_{n-1}}(a)}{a^{n}}da$ as $E(\frac{1}{L_{n-1}^{n}})$. We thus obtain: $$F_{M}(y)=y^{n}E(\frac{1}{(L_{n-1})^{n}})-(1-y)^{n}$$ Differentiating $F_{M}(y)$, we get the following PDF, $f_{M}(y)$ for $M$: $$f_{M}(y)=ny^{n-1}E(\frac{1}{(L_{n-1})^{n}})+n(1-y)^{n-1}$$ so that we can then calculate $E(M)$:

$$E(M)=E(\frac{1}{(L_{n-1})^{n}})\int_{\frac{1}{n}}^{1} ny^{n} dy + \int_{\frac{1}{n}}^{1} ny(1-y)^{n-1} dy$$

We compute $$\int_{\frac{1}{n}}^{1} ny^{n} dy = \frac{ny^{n+1}}{n+1}|_{\frac{1}{n}}^{1}= \frac{n}{n+1}-\frac{n(1/n)^{n+1}}{n+1}=\frac{n}{n+1}-\frac{1}{n^{n}(n+1)}$$

Finding a common denominator for the above two fractions, we get

$$\int_{\frac{1}{n}}^{1} ny^{n} dy = \frac{n^{n+1}-1}{n^{n}(n+1)}$$

From wolfram alpha's integrator, I computed

$$\int_{\frac{1}{n}}^{1} ny(1-y)^{n-1} dy = \frac{2(n-1)^{n}}{n^{n}(1+n)}$$
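The closed form quoted from Wolfram Alpha can be cross-checked without a computer algebra system by expanding $(1-y)^{n-1}$ with the binomial theorem and integrating term by term. This is a sketch with illustrative function names; it checks small $n$ only.

```python
from fractions import Fraction
from math import comb

def integral_exact(n):
    """Exact value of the integral of n*y*(1-y)**(n-1) over [1/n, 1]."""
    lo = Fraction(1, n)
    total = Fraction(0)
    # Expand (1-y)**(n-1) with the binomial theorem and integrate term by term.
    for j in range(n):
        coeff = n * comb(n - 1, j) * (-1) ** j
        total += coeff * (1 - lo ** (j + 2)) / (j + 2)
    return total

def closed_form(n):
    """The quoted closed form 2(n-1)^n / (n^n (n+1))."""
    return Fraction(2 * (n - 1) ** n, n ** n * (n + 1))

for n in range(2, 9):
    assert integral_exact(n) == closed_form(n)

print(integral_exact(2))  # 1/6
```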

We can then conclude that

$$E(M)=\frac{(n^{n+1}-1)E(\frac{1}{(L_{n-1})^{n}})+2(n-1)^{n}}{n^{n}(n+1)}$$

Plugging this value back into the formula for $E(L_{n})$, we obtain:

$$E(L_{n})=\frac{1}{2^{n}(n+1)}+\left(1-\frac{1}{2^{n}}\right)\frac{(n^{n+1}-1)E(\frac{1}{(L_{n-1})^{n}})+2(n-1)^{n}}{n^{n}(n+1)}$$

Combining the two fractions using the common denominator of $(2n)^{n}(n+1)$, we obtain

$$E(L_{n})=\frac{n^{n}+(2^{n}-1)\left((n^{n+1}-1)E(\frac{1}{(L_{n-1})^{n}})+2(n-1)^{n}\right)}{(2n)^{n}(n+1)}$$

In order for this recursion formula to be useful, however, I need to be able to write $E(\frac{1}{(L_{n-1})^{n}})$ in terms of $E(L_{n-1})$. I have calculated that $E(\frac{1}{(L_{2})^{3}})=3$ and that $E(\frac{1}{(L_{2})^{4}})=\frac{14}{3}$ (using the fact that $L_2$ is uniform on $[\frac{1}{2},1]$), but I do not know in general how $E(L_{n-1})$ relates to $E(L_{n})$; does anyone have any ideas on how to do this? Please let me know. Thanks!
