
Warm-up in 3D

Let's place a sphere of radius $r$ at the origin and a smaller sphere of radius $\epsilon$ at $(0, 0, r)$ (i.e., on the intersection of the z-axis with the surface of the larger sphere).

We're interested in their intersection, so any points that jointly satisfy $$ x^2 + y^2 + z^2 = r^2, $$ and $$ x^2 + y^2 + (z-r)^2 = \epsilon^2. $$

Subtracting the second equation from the first gives $$ z = r - \frac{\epsilon^2}{2r}, $$ and substituting back into the first, $$ x^2 = \epsilon^2 - \frac{\epsilon^4}{4r^2} - y^2. $$

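This suggests a way to sample points from the intersection: sample $y$ uniformly from $[-\rho, \rho]$, where $\rho = \sqrt{\epsilon^2 - \epsilon^4/4r^2}$, then compute the corresponding $x$ (sign given by a coin flip). A uniform $y$ is not uniform in arc length along the circle, though; for that, sample an angle $\phi$ uniformly from $[0, 2\pi)$ and set $x = \rho\cos\phi$, $y = \rho\sin\phi$.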
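A minimal sketch of the warm-up in Python, assuming $0 < \epsilon \le 2r$ and using only the standard library. The function name `sample_circle_3d` is made up for illustration. It draws the angle around the z-axis uniformly (rather than drawing $y$ uniformly), so the sample is uniform in arc length along the circle.

```python
import math
import random

def sample_circle_3d(r, eps, rng=random):
    """Sample a point on the circle where the two warm-up spheres intersect."""
    z = r - eps**2 / (2 * r)                       # height of the intersection plane
    rho = math.sqrt(eps**2 - eps**4 / (4 * r**2))  # radius of the intersection circle
    phi = rng.uniform(0.0, 2.0 * math.pi)          # uniform angle around the z-axis
    return (rho * math.cos(phi), rho * math.sin(phi), z)

x, y, z = sample_circle_3d(r=1.0, eps=0.1)
```

The returned point satisfies both sphere equations up to floating-point error, whatever angle is drawn.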

Generalization

How do we generalize this to higher dimensions, $D$, in a way where the sampling is guaranteed to be uniform over the $(D-2)$-dimensional intersection? What about when the smaller sphere is placed at an arbitrary point on the surface?

Bonus

I'm interested in this from the context of initializing a weight matrix for a neural network. My aim is to explore what small perturbations to the initial matrix do to the results of training.

In this case, we know that the components of the original matrix are generated i.i.d. from a uniform or normal distribution. I've modified the weight initialization scheme to guarantee that the weight matrix has a fixed norm (equal to the limiting value of Kaiming-He initialization for large matrices). Is there any way to guarantee in general that:

  1. these perturbations stay on the same hypersurface at a fixed distance (the problem above), and
  2. the components of the perturbed matrix follow the same initialization distribution?
Jesse

2 Answers


EDIT: This is wrong. See edit below for why.

Let's generalize to an arbitrary number of dimensions $D$ and an arbitrary center for the smaller sphere $\vec x'$.

The equations become: $$ |\vec x|^2 = |\vec x'|^2 = r^2, $$ and $$ |\vec x - \vec x'|^2 = \epsilon^2. $$

Combining, we get: $$ \langle \vec x | \vec x'\rangle =r^2 - \frac{\epsilon^2}{2}. $$
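Spelled out, the combination step just expands the squared distance and substitutes $|\vec x|^2 = |\vec x'|^2 = r^2$:

$$ \epsilon^2 = |\vec x - \vec x'|^2 = |\vec x|^2 + |\vec x'|^2 - 2\langle \vec x | \vec x'\rangle = 2r^2 - 2\langle \vec x | \vec x'\rangle. $$

Solving for the inner product gives the expression above.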

If we reparametrize, $\vec x = c \hat x$, where $\hat x$ is a unit vector, the solution becomes evident: $$ c = \frac{1}{\langle \hat x|\vec x'\rangle}\left(r^2 - \frac{\epsilon^2}{2}\right). $$

The sampling strategy is as follows:

  1. Randomly sample a unit vector $\hat x$ (according to whatever sampling procedure you like).

EDIT: This is where it goes wrong. $\hat x$ isn't uniformly distributed over a unit hypersphere.

  2. Compute the scaling constant $c$, then rescale to get your new sample, $\vec x = c \hat x$.
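One way to see the problem numerically, as a rough sketch with made-up values ($r = 1$, $\epsilon = 0.1$, five dimensions) and a hypothetical helper `random_unit_vector`: the rescaled vector satisfies the inner-product condition by construction, but its norm is $|c|$, which in general differs from $r$, so it does not lie on the large sphere.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Normalize a vector of i.i.d. standard normals to unit length."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    n = math.sqrt(sum(v * v for v in g))
    return [v / n for v in g]

rng = random.Random(0)
r, eps, dim = 1.0, 0.1, 5

center = [r * v for v in random_unit_vector(dim, rng)]  # center of the small sphere
x_hat = random_unit_vector(dim, rng)                    # random direction to rescale

# Scaling constant from the inner-product condition.
c = (r**2 - eps**2 / 2) / sum(a * b for a, b in zip(x_hat, center))
x = [c * v for v in x_hat]

inner = sum(a * b for a, b in zip(x, center))  # equals r^2 - eps^2 / 2 by construction
norm_x = math.sqrt(sum(v * v for v in x))      # equals |c|, generally not r
```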

Regarding the bonus: I'm still not sure how to relate the statistical properties of the components of the original vector $\vec x'$ to the new vector $\vec x$.

Jesse

Ok here's the right answer.

The intersection of two $d$-dimensional hyperspheres, $S^d$, where one is centered on the surface of the other, is a $(d-1)$-dimensional hypersphere, $S^{d-1}$.

So all you have to do is uniformly sample a point on $S^{d-1}$ then rotate it in the higher dimensional space, and shift it to the right spot.

Here's the example of $S^1$:

[Figure: intersection of two $S^1$]

And here's the example of $S^2$:

[Figure: intersection of two $S^2$]

Concretely, let $\vec r$ be the location of the center of the second hypersphere, which has radius $\epsilon$.

A little trig (take a look at the example of $S^1$) shows that the angle between $\vec r$ and the cone through the intersection is

$$ \theta = \arccos \left(1 - \frac{\epsilon^2}{2 r^2}\right), $$

where $r = |\vec r|$.
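A quick numerical sanity check of this angle, with made-up values $r = 2$, $\epsilon = 0.5$: the intersection sits at distance $r\cos\theta$ from the origin along $\vec r$ and has radius $r\sin\theta$, and both should agree with the closed-form expressions from the 3D warm-up in the question.

```python
import math

r, eps = 2.0, 0.5
theta = math.acos(1.0 - eps**2 / (2.0 * r**2))

ring_radius = r * math.sin(theta)  # radius of the intersection sphere
offset = r * math.cos(theta)       # distance of its center from the origin

# Closed forms from the warm-up, for comparison.
warmup_radius = math.sqrt(eps**2 - eps**4 / (4.0 * r**2))
warmup_height = r - eps**2 / (2.0 * r)
```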

To sample from the intersection, first we use the Muller method (thanks David) to sample a point on $S^{d-1}$ with radius $\epsilon' = r \sin\theta = \sqrt{\epsilon^2 - \epsilon^4/4r^2}$. Then, we embed it in the higher-dimensional space by adding one coordinate at the end with value $0$.
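A minimal sketch of the Muller step and the embedding, using only the standard library: normalize a vector of i.i.d. standard normals, scale it to the desired radius, then append a zero. The name `sample_sphere` and the example dimension and radius are illustrative choices.

```python
import math
import random

def sample_sphere(dim, radius, rng):
    """Uniform point on the sphere of the given radius in R^dim (Muller's method)."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in g))
    return [radius * v / norm for v in g]

rng = random.Random(0)
point = sample_sphere(dim=4, radius=0.3, rng=rng)
embedded = point + [0.0]  # embed in R^5 by appending a zero coordinate
```

Appending the zero does not change the norm, so the embedded point still lies at distance 0.3 from the origin.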

Next, we have to transform this sphere so that it lies in the hyperplane orthogonal to $\vec r$. For this we can use a Householder reflection that maps the normal vector of the embedded $S^{d-1}$, $\vec z = (0, \dots, 0, 1)$, onto the unit vector $\vec r / r$.
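A sketch of the reflection step, assuming the standard Householder form $H = I - 2\,\vec u \vec u^T / (\vec u^T \vec u)$ with $\vec u = \vec z - \vec r / r$. The function name `householder_reflect` and the example vectors are illustrative. The reflection is applied directly to a vector, so the matrix is never formed.

```python
import math

def householder_reflect(v, a, b):
    """Apply to v the Householder reflection that maps unit vector a onto unit vector b."""
    u = [ai - bi for ai, bi in zip(a, b)]
    uu = sum(ui * ui for ui in u)
    if uu < 1e-30:  # a and b coincide, so the reflection is the identity
        return list(v)
    s = 2.0 * sum(ui * vi for ui, vi in zip(u, v)) / uu
    return [vi - s * ui for vi, ui in zip(v, u)]

z = [0.0, 0.0, 1.0]
r_vec = [1.0, 2.0, 2.0]            # example center with norm 3
r_hat = [c / 3.0 for c in r_vec]

image = householder_reflect(z, z, r_hat)                # z is mapped onto r_hat
w = householder_reflect([0.6, 0.8, 0.0], z, r_hat)      # a vector orthogonal to z
```

Because the reflection is orthogonal, a vector orthogonal to $\vec z$ ends up orthogonal to $\vec r$ with its length unchanged, and a uniform distribution on the embedded sphere stays uniform.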

Finally, we translate this $S^{d-1}$ along the vector $\vec r$ so that it intersects both spheres. We have to shift it by $\vec r' = r' \hat r = r' \frac{\vec r}{r}$, where $$ r' = r \cos\theta. $$
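Putting the steps together, here is a sketch of the whole procedure in plain Python. The function name `sample_intersection` and the example center are assumptions for illustration. Note that the code uses radius $r\sin\theta$ for the intersection sphere (the value matching $\sqrt{\epsilon^2 - \epsilon^4/4r^2}$ from the warm-up) and shifts by $r\cos\theta$.

```python
import math
import random

def sample_intersection(center, eps, rng):
    """Sample from the intersection of |x| = |center| and |x - center| = eps."""
    n = len(center)
    r = math.sqrt(sum(c * c for c in center))
    cos_t = 1.0 - eps**2 / (2.0 * r**2)
    sin_t = math.sqrt(max(0.0, 1.0 - cos_t**2))

    # 1. Muller's method: uniform point on a sphere of radius r*sin(theta) in R^(n-1).
    g = [rng.gauss(0.0, 1.0) for _ in range(n - 1)]
    g_norm = math.sqrt(sum(v * v for v in g))
    # 2. Embed in R^n by appending a zero coordinate.
    p = [r * sin_t * v / g_norm for v in g] + [0.0]

    # 3. Householder reflection mapping z = (0, ..., 0, 1) onto r_hat.
    r_hat = [c / r for c in center]
    u = [-h for h in r_hat]
    u[-1] += 1.0                      # u = z - r_hat
    uu = sum(v * v for v in u)
    if uu > 1e-30:                    # skip when r_hat already equals z
        s = 2.0 * sum(a * b for a, b in zip(u, p)) / uu
        p = [pi - s * ui for pi, ui in zip(p, u)]

    # 4. Translate along r_hat by r*cos(theta).
    return [pi + r * cos_t * h for pi, h in zip(p, r_hat)]

rng = random.Random(0)
center = [3.0, 0.0, 4.0, 0.0]         # example center with norm 5
x = sample_intersection(center, eps=0.5, rng=rng)
norm_x = math.sqrt(sum(v * v for v in x))
dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, center)))
```

Here `norm_x` comes out as 5 and `dist` as 0.5 up to rounding. For the weight-matrix use case, one option would be to flatten the matrix into `center`, sample, and reshape the result.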

Note:

If $\vec r$ is uniformly distributed over the hypersphere it occupies, and if we generate a perturbation uniformly over $S^{d-1}$, then our perturbed vectors will also be uniformly distributed over the original hypersphere.

Jesse