In connection with a research project I'm working on, an interesting question came up:
Given a classroom of a certain size, with a certain number of students in it for a certain amount of time, what is the average distance between them?
I'm ignoring the time aspect of the problem and only focusing on the other parts.
I modeled the room as a rectangular region of the Cartesian plane, with $N$ students occupying a randomly chosen set of points with integer coordinates, i.e., lattice points.
Each configuration of $N$ students is equally probable. No two students can occupy the same point.
I came up with this formula for the expected Euclidean distance between two students:
$$\mathbb{E}[d(s_i, s_j)] = \dfrac{1}{N(N-1)} \dfrac{\binom{WH-2}{N-2}}{\binom{WH}{N}} \sum_{x_i = 0}^{W-1} \sum_{y_i = 0}^{H-1} \sum_{x_j = 0}^{W-1} \sum_{y_j = 0}^{H-1} \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$$
Here $W$ and $H$ are the width and height of the room in lattice units, so that the room contains $WH$ lattice points, and $N$ is the number of students. Each student $s_i$ sits at a unique point $(x_i, y_i)$. The term $\dfrac{\binom{WH-2}{N-2}}{\binom{WH}{N}}$ is the probability that, in a random configuration of the class, both of those points are occupied by a student, and $\dfrac{1}{N(N-1)} = P(s_i)P(s_j \mid s_i)$ is the probability that those two students are the ordered pair that gets picked. The sum runs over ordered pairs of points, so each unordered pair appears twice; this already matches the ordered-pair probability, so no extra factor of 2 is needed. Both factors are constant, so I moved them outside the sum.
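As a sanity check on the expression above, here is a minimal Python sketch that evaluates the average distance exactly. It assumes the room has exactly $W \times H$ lattice points, with $0 \le x < W$ and $0 \le y < H$, and it normalizes by the number of ordered pairs of distinct points, $WH(WH-1)$. It relies on two facts: the binomial ratio simplifies to $\frac{N(N-1)}{WH(WH-1)}$, so the $N$-dependence cancels, and the number of ordered pairs of points separated by an offset $(\Delta x, \Delta y)$ is $(W-|\Delta x|)(H-|\Delta y|)$, which collapses the fourfold sum to a double sum. The function names are hypothetical, chosen only for illustration.

```python
import math
from itertools import product


def mean_distance_bruteforce(W, H):
    """Average distance over all ordered pairs of distinct lattice points.

    Assumption: the lattice points are (x, y) with 0 <= x < W, 0 <= y < H.
    """
    points = list(product(range(W), range(H)))
    M = len(points)
    if M < 2:
        raise ValueError("need at least two lattice points")
    # Pairs with p == q contribute 0, so including them is harmless.
    total = sum(
        math.hypot(px - qx, py - qy)
        for (px, py) in points
        for (qx, qy) in points
    )
    return total / (M * (M - 1))


def mean_distance_by_offsets(W, H):
    """Same average, computed with a double sum over offsets (dx, dy)."""
    M = W * H
    if M < 2:
        raise ValueError("need at least two lattice points")
    total = 0.0
    for dx in range(-(W - 1), W):
        for dy in range(-(H - 1), H):
            # Number of ordered point pairs whose difference is (dx, dy).
            count = (W - abs(dx)) * (H - abs(dy))
            total += count * math.hypot(dx, dy)
    return total / (M * (M - 1))
```

For a $2 \times 1$ room both functions return exactly 1, as expected for two seats one unit apart. The offset version needs only about $4WH$ terms instead of $(WH)^2$, so it stays cheap for rooms of realistic size.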
Similar questions have been answered before (1,2,3,4,5), but for arbitrary points within a region rather than only lattice points. The closest existing question is (Average shortest distance between some random points in a box), but I would like to know how the result scales for a general rectangular region. In addition, that answer only simulated the points and stated that the problem can be approximated by the "two random points in a unit square" problem, which only works for small values of $N$ and for a unit square. It also does not provide a proof.
My main questions are:
- Is there a closed-form expression for the sum above?
- If not, is there a way to approximate it?
- Is simply simulating the problem the best approach?
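
For reference, a minimal sketch of such a simulation in Python. It assumes the seats are the $W \times H$ lattice points with $0 \le x < W$ and $0 \le y < H$, draws $N$ distinct seats uniformly at random, and then picks two of the seated students uniformly at random. The name `simulate_mean_distance` and the parameters `trials` and `seed` are hypothetical, introduced only for this illustration.

```python
import math
import random


def simulate_mean_distance(W, H, N, trials=20000, seed=0):
    """Monte Carlo estimate of the expected distance between two students.

    Assumption: seats are the lattice points (x, y) with
    0 <= x < W and 0 <= y < H, indexed as x * H + y.
    """
    if not 2 <= N <= W * H:
        raise ValueError("need 2 <= N <= W * H")
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # N distinct seats, every configuration equally likely.
        seats = rng.sample(range(W * H), N)
        # Two distinct students chosen uniformly from the configuration.
        a, b = rng.sample(seats, 2)
        ax, ay = divmod(a, H)
        bx, by = divmod(b, H)
        total += math.hypot(ax - bx, ay - by)
    return total / trials
```

A call such as `simulate_mean_distance(10, 8, 25)` gives an estimate whose statistical error shrinks like $1/\sqrt{\text{trials}}$, so it is mainly useful as a cross-check on an exact or approximate formula.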