6

Suppose we pick pairs of triples from $\{ 0, 1, 2, \dots, 255\}^3$ with a uniform distribution and would like to find a closed form for the distribution of the Euclidean distances

$$ d((x_1, x_2, x_3), (y_1, y_2, y_3)) = \left( \sum_{j=1}^3 (x_j-y_j)^2 \right)^{1/2}.$$

The motivation for this question comes from the cubic lattice that is one of the RGB color spaces. It would be nice to have an exact probability distribution because it would likely have interesting applications in image processing.

This question gives the answer for when a cube includes all real numbers in it, and there's Mathai et al.$^\color{magenta}{\star}$, but what about when it is discrete, like for a cubic lattice?


$\color{magenta}{\star}$ Arak M. Mathai, Peter Moschopoulos, Giorgio Pederzoli, Distance between random points in a cube [PDF], Statistica, Volume 59, Number 1, 1999.

  • @SeverinSchraven They are discrete uniform I guess. Just like the points that make up RGB. – Teg Louis Jul 08 '23 at 00:57
  • 1
    @eyeballfrog No, it is not the unity cube. I am really only interesting in when the coordinates are from nonnegative integers. I tried figuring it out. But doing it by exhaustion for RGB is has too many possible combinations to do that. And I don't know how to abstract and generalize from that paper by Mathai. I just can understand it, I can't derive anything from it. – Teg Louis Jul 08 '23 at 01:00
  • @SeverinSchraven I believe in this instance because scaling is linear, it would be okay to do what eyeballfrog suggested. I know in Mathematica, they use the unit cube to describe the colors. JavaScript doesn't. – Teg Louis Jul 08 '23 at 01:24
  • 1
    Yes, everything is the same up to some factor. The crucial part is that it's finitely many points only and that they are arrangend in a regular fashion. I guess you would be interested in the set $${0, 1, 2, \dots, 255}^3$$ right? Also, are you interested in the euclidean distance (like the first thing you've linked)? Or do you have another distance in mind, like $$d(x_1,x_2, x_3, y_1,y_2,y_3)= \vert x_1-y_1\vert +\vert x_2-y_2\vert +\vert x_3-y_3\vert$$ i.e. the length of the shortest path. – Severin Schraven Jul 08 '23 at 01:31
  • @SeverinSchraven Great question. Yes, that is the set I am interested in. For the distance, just Euclidean. – Teg Louis Jul 08 '23 at 01:46
  • It seems to be much more of a pain then what I thought. Originally I thought, we can just fix one length that arises as a distance, compute its decompositions into cubes (they can be multiple of those) and then for each configuration compute the number of pairs that realize that distance (that is not hard, just start at at the edge, see how often you can shift and don't forget reflections and permutations of those configurations). [cont] – Severin Schraven Jul 08 '23 at 05:36
  • [cont] However, already the number of ways on can write a number as sum of three cubes is not that easy to come by https://math.stackexchange.com/questions/2068160/determining-the-number-of-ways-a-number-can-be-written-as-sum-of-three-squares. Maybe you can do this numerically, but I am unable to come up with a closed form. – Severin Schraven Jul 08 '23 at 05:36
  • 1
    For the other distance that I mentioned above (the shortest path), it's substantially easier as this boils down to much easier combinatorics (essentially stars and bars to write a given number as the sum of nonnegative integers). – Severin Schraven Jul 08 '23 at 05:38
  • @SeverinSchraven I believe I had found the probabilities for when the distance is an integer. I am wondering if in this case, there would be a 1-1 function with the distance that you had mentioned. I don't know though. – Teg Louis Jul 09 '23 at 20:17
  • @TegLouis I don't quite follow. For which distance have you found the probabilitites? Maybe you can update your post and let us know what exactly you have figured out? – Severin Schraven Jul 10 '23 at 00:06
  • Have you considered to just compute the probabilities numerically? Like brute force? I am not very experienced with those kind of stuff, but $(256)^6 \approx 3\times 10^{14}$ values sounds like something that one could potentially deal with. – Severin Schraven Jul 10 '23 at 00:26
  • I am doing a hybrid. It is a brutish force method. And it is very doable for 255. I guess even though I am not a mathematician, I still have the urge to initially always try to find exact, general solutions. – Teg Louis Jul 10 '23 at 01:08

3 Answers3

2

Let $X$ be the random variable you defined. For a natural number $n$, the probability $\mathbb P[X=\sqrt n]$ is going to be some multiple of the number of ways of writing $n$ as a sum of three squares, up to boundary effects. There exist relatively simple ways of counting the number of representations of $n$ as sums of two or four squares, but which rely on the prime factorization of $n$ which already makes them complicated. For three squares, I think it's fair to say that there's no closed form. For example if $n=4^a(8b+7)$ for positive integers $a,b$ then there are no representations as sums of three squares and consequently $\mathbb P[X=\sqrt n]=0$.

These difficulties, of course only exist if you insist on computing the PDF of $X$ in closed form. If you instead ask for $\mathbb P[X\leq t]$ as a function of $t$, you can get fairly good approximations. In fact, that is a three-dimensional version of the Gauss circle problem.

Derivative
  • 1,566
1

Brute force computation works, with the following trick.

The square root expression for the distance $d$ between $(x_1, x_2, x_3), (y_1, y_2, y_3)$ is equivalent to

$$ d^2 = (x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2 $$

Each term on the RHS is a perfect square and is bounded by $256^2$. There are $256$ perfect squares less than $256^2$ (including $0$). We can trivially enumerate them. For each pair $(x,y), 0 \le x,y \le 255$ we can compute $(x-y)^2$ and build a frequency table $T$ containing entries $\langle (x-y)^2, \nu_{(x-y)^2}\rangle$ where $\nu_{(x-y)^2}$ is the frequency of $(x-y)^2$. We treat this table as a set and identify the entries in it by the pair $(s,\nu)$.

For each triple $((s_1, \nu_1), (s_2, \nu_2), (s_3, \nu_3)) \in T \times T \times T$, we compute the pair

$$ (s,\nu) = (s_1 + s_2 + s_3, \nu_1 \nu_2 \nu_3) $$

Call this result set $U$.

$U$ gives the distribution of the squares $s = d^2 \lt 256^2$. We should aggregate the frequency by $s$.

i.e.,

$$V = \{(s, \nu) : \nu = \sum_{(s^{\prime}, \nu^{\prime}) \in U} \nu^{\prime} \text{ where $s^{\prime} = s$ }\}$$

Since we are interested in the distribution of $d$, the positive square root of $d^2$ only, we use the same aggregate frequency. i.e., the map

$$ (s, \nu) \rightarrow (\sqrt{s}, \nu) $$

The following C# program implements the above:

public class Program
{
    public static IEnumerable<long> DiffOfSquares()
    {
        const int limit = 256;
        for(int i = 0; i < limit; i++)
        {
            for(int j = 0; j < limit; j++) {
                yield return (i - j)*(i - j);
            }
        }
    }
public static IEnumerable&lt;(long dsquared, long count)&gt; SumOf3DiffOfSquares()
{
    var q = 
        from dos in DiffOfSquares()
        group dos by dos into g
        select (dos: g.Key, count: g.Count());
    var diffOfSquares = q.ToArray();

    foreach(var t1 in diffOfSquares) {
        foreach(var t2 in diffOfSquares) {
            foreach(var t3 in diffOfSquares)
            {
                yield return (
                  dsquared: t1.dos + t2.dos + t3.dos, 
                  count: t1.count * t2.count * t3.count
                );
            }
        }
    }
}

public static void Calculate()
{
    var q = 
        from entry in SumOf3DiffOfSquares()
        group entry by entry.dsquared into g
        select (
          dsquared: g.Key, 
          count: g.Select(x =&gt; x.count).Sum()
        );

    foreach(var entry in q)
    {
        Console.WriteLine(entry);
    }
}

public static void Main(string[] args)
{
    Calculate();
}

}

vvg
  • 3,311
  • What is the output of your program? A JSON file would be nice – Rodrigo de Azevedo Jul 13 '23 at 10:55
  • The output is around 2MB text file with each tuple in a line.

    $$ \cdots \ (193051, 120) \ (190512, 512) \ (191017, 1152) \ (191524, 768) \ (191522, 864) \ (192029, 1152) \ \cdots $$

    I don't know how I can upload a file here on MSE

    – vvg Jul 13 '23 at 10:59
  • FYI. There are $139638$ entries in the output. – vvg Jul 13 '23 at 11:01
  • Yeah, it's a bit heavy. Is that pair of the form (distance, number of occurrences)? – Rodrigo de Azevedo Jul 13 '23 at 11:02
  • 1
    $(d^2, \text{frequency})$ – vvg Jul 13 '23 at 11:02
  • @vvg It says the file is deleted. – Teg Louis Jul 13 '23 at 22:51
  • Pls try this link. https://file.io/sjrMDFLb2iH3 I am not using the paid plan. So, I am not sure what their deletion policy is. You may want to grab the file soon. – vvg Jul 14 '23 at 02:06
  • Also, the logic is pretty straightforward and the algorithm runtime is $O(256^3)$ with space $O(256^2)$. So, it can be easily re-implemented in any programming language. It takes less than 30 seconds to run on my laptop. I am sorry, I can't help more on this. – vvg Jul 14 '23 at 02:10
  • @vvg You are great. Math is fun. It isn't something that needs to be pressured. So need to say sorry. – Teg Louis Jul 14 '23 at 15:39
0

My code is ugly and brutish. I am not accepting my own answer. I am going to try to improve it though in the meantime.

Directly below gives a grid table in Mathematica for the frequency of distances that are nonnegative integers,$0,1,2...$ Finding the solution for integers is easier, because it does not involve filtering empty sets.

Insert[Join[{{"d","Frequency"}},
Table[{i,Total[Map[Length,Map[Permutations,Threaded[{256,256,256}]-
PowersRepresentations[i^2, 3, 2]]]
*Apply[Times,(Threaded[{256,256,256}]-PowersRepresentations[i^2, 3, 2]),{1}]]},
    {i,1,255}]]//Grid,Alignment->Left,2]

Directly below gives a theoretical grid table in Mathematica for the frequency of distances that are surds, like $\sqrt{2},\sqrt{3},\sqrt{5}...$. For the RGB color model, there should be at most $194634$ distances. Some distances cannot be written as a sum of three cubes. Ideally, the program would not even check those instead of having to filter. This program times out in WolframCloud though.

surdDistances= Table[{n+Floor[0.5+Sqrt[n]],PowersRepresentations[n+Floor[0.5+Sqrt[n]],3,2]},{n,1,194634}];

filteredSurdDistances= Select[surdDistances, Last[#] != {} &];

Insert[Join[{{"d","Frequency"}}, Table[{Sqrt[filteredSurdDistances[[i]][[1]]], Total[Apply[Times,Threaded[{256,256,256}]-filteredSurdDistances[[i]][[2]],{1}]]}, {i,1,Length[filteredSurdDistances]}]]//Grid, Alignment->Left,2]