About a density property of the Nearest Neighbor algorithm: part 2.

Question

Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space and $(\mathcal{X},d)$ be a metric space. Suppose that $X,X_1,X_2,X_3,... : \Omega\to\mathcal{X}$ are $\mathbb{P}$-i.i.d. random variables.

Take a closed set $K$ of $(\mathcal{X},d)$ and $x\in\partial K$.

Suppose that $$\exists \delta_x \in(0,1], \frac{\mathbb{P}(X\in \partial K\cap B_r (x))}{\mathbb{P}(X\in B_r (x))}\to \delta_x, r\to 0^+$$ and $$\frac{\mathbb{P}(X\in K^c\cap B_r (x))}{\mathbb{P}(X\in B_r (x))}\to0, r\to 0^+$$ where $B_r(x)$ is the open ball of radius $r$ centered at $x$ in $(\mathcal{X},d)$.

Define: $$\forall m\in\mathbb{N}, \pi_m^x: \mathcal{X}^m\to\{1,...,m\}, (x_1,...,x_m)\mapsto \min\left(\operatorname{argmin}_{k\in\{1,...,m\}}\left\{d\left(x,x_k\right)\right\}\right).$$

Define: $$\forall m\in\mathbb{N}, Z_m^x:\Omega\to\mathcal{X}, \omega\mapsto X_{\pi_m^x\left(X_1(\omega),...,X_m(\omega)\right)}(\omega).$$
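A minimal Python sketch of $\pi_m^x$ and $Z_m^x$, meant only to make the tie-breaking rule concrete: among all samples at minimal distance from $x$, the one with the smallest index is returned. The helper names (`pi_m`, `z_m`, `estimate_p_z_in_kc`) and the `sample`/`in_kc` callbacks are hypothetical, and the Monte Carlo estimator illustrates the quantity in the question without saying anything about its limit.

```python
import random


def pi_m(x, xs, d):
    """1-based index of the first sample in xs that minimizes d(x, .)."""
    best_k, best_dist = 1, d(x, xs[0])
    for k in range(2, len(xs) + 1):
        dist = d(x, xs[k - 1])
        if dist < best_dist:  # strict '<' keeps the smallest index on ties
            best_k, best_dist = k, dist
    return best_k


def z_m(x, xs, d):
    """The nearest neighbour Z_m^x = X_{pi_m^x(X_1, ..., X_m)}."""
    return xs[pi_m(x, xs, d) - 1]


def estimate_p_z_in_kc(sample, in_kc, x, d, m, trials, rng):
    """Monte Carlo estimate of P(Z_m^x in K^c).

    sample(rng) draws one copy of X; in_kc(point) tests membership in K^c.
    """
    hits = 0
    for _ in range(trials):
        xs = [sample(rng) for _ in range(m)]
        hits += in_kc(z_m(x, xs, d))
    return hits / trials


if __name__ == "__main__":
    # Toy case (hypotheses NOT satisfied): X uniform on [-1, 1], x = 0, K^c = (0, 1].
    est = estimate_p_z_in_kc(
        sample=lambda rng: rng.uniform(-1.0, 1.0),
        in_kc=lambda p: p > 0.0,
        x=0.0,
        d=lambda a, b: abs(a - b),
        m=5,
        trials=4000,
        rng=random.Random(0),
    )
    print(est)  # should be close to 1/2 by symmetry
```

In the toy case the ratio $\mathbb{P}(X\in K^c\cap B_r(x))/\mathbb{P}(X\in B_r(x))$ equals $1/2$ for every $r$, so the estimate stays near $1/2$ for every $m$; it is there only to show how the pieces fit together.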

Is it true that $\mathbb{P}(Z_m^x\in K^c)\to 0, m\to\infty?$

This is a version, with stronger hypotheses, of this other question, which has a negative answer: About a density property of the Nearest Neighbor algorithm

I can imagine examples where $\delta_x = 1$, but I cannot think of any example where $\delta_x \in (0,1)$ strictly. Can you give me an example? — antkam, Oct 16 '19 at 01:45
Try with $\mathcal{X}=[-1,1]\times[-1,1]$ with the Euclidean metric and $K=[-1,0]\times[-1,1]$, with $\mathbb{P}_X$ that has $1/2$ of its mass uniformly distributed on $[-1,0]\times\{0\}\cup\{0\}\times[-1,1]$ and the other $1/2$ uniformly distributed on $K^c$, with $x=(0,0)$ — Bob, Oct 16 '19 at 02:53
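Assuming "uniformly distributed" means uniform with respect to length on the two segments and uniform with respect to area on $K^c$, the masses of small balls around $x=(0,0)$ can be written down directly: the segments have total length $3$ (density $\frac16$ per unit length), and $K^c=(0,1]\times[-1,1]$ has area $2$ (density $\frac14$ per unit area). For $0<r\le1$ this gives $\mathbb{P}(X\in B_r(x)) = \frac r2+\frac{\pi r^2}{8}$ and $\mathbb{P}(X\in\partial K\cap B_r(x))=\frac r3$, so under this reading $\delta_x=\frac23$. The function names in this short check are illustrative only.

```python
import math


# Assumed reading of the example, valid for 0 < r <= 1 and x = (0, 0):
# half of the mass is uniform by length on [-1,0]x{0} U {0}x[-1,1] (length 3),
# half is uniform by area on K^c = (0,1]x[-1,1] (area 2).
def p_ball(r):
    """P(X in B_r(x)): arms of length r and 2r, plus an open half disk in K^c."""
    segments = 0.5 * (3 * r) / 3
    half_disk = 0.5 * (math.pi * r ** 2 / 2) / 2
    return segments + half_disk


def p_boundary_ball(r):
    """P(X in dK cap B_r(x)); dK = {0}x[-1,1] carries only the vertical arm."""
    return 0.5 * (2 * r) / 3


def p_kc_ball(r):
    """P(X in K^c cap B_r(x)); the segments lie in K, so only the half disk counts."""
    return 0.5 * (math.pi * r ** 2 / 2) / 2


for r in (1e-1, 1e-3, 1e-6):
    print(r, p_boundary_ball(r) / p_ball(r), p_kc_ball(r) / p_ball(r))
```

The first ratio approaches $\frac23$ and the second vanishes like $\frac{\pi r}{4}$, so both hypotheses of the question hold with $\delta_x\in(0,1)$.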
ha! while driving home, i had almost exactly that example in my head! i was only missing the crucial $[-1,0] \times \{0\}$ "branch"! :) thanks! — antkam, Oct 16 '19 at 03:43
Edit 4 (which has nothing to do with $\partial K$) reinforced a nagging question I had from the beginning: why do you need the condition $P(X \in \partial K \mid X \in B_r) \to \delta > 0$? The other condition $P(X \in K^c \mid X \in B_r) \to 0$ already implies $P(X \in K \mid X \in B_r) \to 1$, and nothing suggests a point $x' \in \partial K$ would be closer to $x$ than a point $x'' \in K$, so in that sense I fail to see how the additional condition about $\partial K$ would help. Do you have an example where the conclusion fails without $P(X \in \partial K \mid X \in B_r) \to \delta > 0$? — antkam, Oct 16 '19 at 21:58
actually I had the same feeling, and this is why I posed the previous question mentioning only $K$ and $K^c$. On the other hand, I posed this question using $\partial K$ since I'm mainly interested in solving this particular problem, and I wanted to avoid asking yet another question with additional hypotheses in case the conclusion mentioning only $K$ and $K^c$ also failed to hold, as in the previous question :) — Bob, Oct 17 '19 at 06:24
I've been thinking along the lines of your Edit 4. I think it can prove this: $\forall \epsilon > 0, \exists m: P(Z_m \in K^c) < \epsilon.$ This is very good, but IMHO this isn't quite what you need yet. The stmt $P(Z_m \in K^c) \to 0$ means this slightly stronger stmt: $\forall \epsilon > 0, \exists M, \forall m > M: P(Z_m \in K^c) < \epsilon.$ And I can't prove the stronger stmt yet... :( One way to close the gap is to show $f(m) = P(Z_m \in K^c)$ is monotonically decreasing in $m$. It seems "obvious", so perhaps I am just missing something "obvious"? — antkam, Oct 18 '19 at 14:54
maybe something along the line of the edit 1 could be helpful... I'll think on it tomorrow — Bob, Oct 18 '19 at 16:22
Actually I came to the same conclusion: via a modification of the argument given in edit 4 I can prove that there exists a subsequence $(m_k)_{k\in\mathbb{N}}$ such that $\mathbb{P}(Z^x_{m_k}\in K^c)\to 0, k\to\infty$, but I'm having trouble proving that the result holds for the whole sequence... probably I'm going to edit for the fifth time proving that result and then try with a bounty — Bob, Oct 26 '19 at 11:19
@antkam this is a long question. is it interesting / worth my time? — mathworker21, Oct 26 '19 at 15:04
@mathworker21 - Ha, I'd be happy to rope you in! Yes I thought this was an interesting question, and tricky too because it's hard to reason about $\min$. It looks long, but the main conjecture is pretty intuitive, and the length is mainly due to Bob showing his work (which is great!) and me occasionally chiming in in the chat. It's one of those conjectures that I think should be true, but I also feel I'm lacking some non-trivial theorem that would prove it. Given your much deeper math background, maybe you can find a proof (or better yet, find a counter-example which would be delightful!) — antkam, Oct 26 '19 at 19:48
@Bob Could the following be true? Let $\mathcal{X}$ be an arbitrary metric space and $x \in \mathcal{X}$. Let $E \subseteq \mathcal{X}$ be an arbitrary subset. Let $X_1,X_2,\dots : \Omega \to \mathcal{X}$ be i.i.d., and let $Z_m$ be nearest neighbor map w.r.t. $x$. Suppose $P(Z_m \in E) \to 0$ for a subsequence of $m$. Then $P(Z_m \in E) \to 0$ as $m \to \infty$. — mathworker21, Oct 26 '19 at 22:00
It seems plausible... however I'm not quite sure about how to attack it — Bob, Oct 26 '19 at 22:08
I'm afraid of oscillating behaviour: the greater $m$ is, the closer we can get to $x$... however, it could be the case that in some range of the radius the probability of $X$ being in $K^c$ dominates the probability of $X$ being in $K$, and so we get an oscillating behaviour in $m$... maybe the condition $\mathbb{P}(X\in K^c \cap B_r(x)) / \mathbb{P}(X\in B_r(x))\to 0, r\to 0^+$ could help us rule out such oscillations... — Bob, Oct 26 '19 at 22:35
@antkam we actually don't have monotonicity in $m$. The easiest way to see that we don't is $E = \{\frac{1}{1000} < |z| < 1\} \cup \{|z| < \frac{1}{100^{100}}\}$ inside the unit disk in $\mathbb{R}^2$. Then, for $x$ the origin, and points chosen uniformly at random, $P(Z^x_m \in E)$ is large for small $m$, small for medium-sized $m$, and then large for very very large $m$. See the end of my answer for a generalization. — mathworker21, Oct 27 '19 at 00:07
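For $X$ uniform on the unit disk and $x$ the origin, $P(|Z^x_m|>s)=(1-s^2)^m$, so for $E=\{b<|z|<1\}\cup\{|z|<a\}$ one gets $P(Z^x_m\in E)=1-(1-a^2)^m+(1-b^2)^m$. The radius $100^{-100}$ from the comment underflows double precision, so the sketch below uses the scaled-down choice $a=10^{-4}$, $b=10^{-1}$ (an assumption made only for illustration); the same large/small/large pattern appears.

```python
import math


def p_z_in_e(m, a=1e-4, b=1e-1):
    """P(Z_m^x in E) for X uniform on the unit disk, x = origin,
    E = {b < |z| < 1} U {|z| < a}, using P(|Z_m^x| > s) = (1 - s^2)^m."""
    def log_tail(s):
        return m * math.log1p(-s * s)  # log of (1 - s^2)^m, stable for tiny s

    inner = -math.expm1(log_tail(a))  # P(|Z_m^x| < a) = 1 - (1 - a^2)^m
    outer = math.exp(log_tail(b))     # P(|Z_m^x| > b) = (1 - b^2)^m
    return inner + outer


for m in (1, 10 ** 3, 10 ** 10):
    print(m, p_z_in_e(m))
```

With these radii the three printed values are about $0.99$, about $5\times10^{-5}$, and essentially $1$.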
@antkam we need your help. we have the result if $\mu$ or $K$ is nice we're pretty sure. — mathworker21, Oct 29 '19 at 11:22
@mathworker21 and Bob - hi guys! wow you two have had a ton of updates, and honestly i can't yet follow a lot of the details... BUT question: did you prove it or not? the posted Answer has a "Theorem" with a $\square$ at the end... so does that mean (you think) you have proved it? if so, what remains to be done? or did the first section "critical obstruction" invalidate the proof technique used for the Theorem? or perhaps you proved it under further assumptions (some mention of smoothness...)? — antkam, Oct 29 '19 at 21:26
@antkam we did not prove it. we proved it if $\mu$ is smooth. the critical obstruction does indeed invalidate the proof technique attempted in edit 5 and my "proof" below the second fold. that "proof" assumed $\mu$ was smooth. so what remains to be done is to prove it for general $\mu$. the $\mu$ given in the critical obstruction is not smooth. I'd recommend reading the critical obstruction part of my answer below. I think (perhaps biasedly, though I don't think so) that it is very important/relevant. — mathworker21, Oct 29 '19 at 21:38
What will probably be the final blow is in edit 7. If I run into some difficulty, maybe I'll ask for help here or in another question to fix a particular problem. — Bob, Oct 30 '19 at 11:10
@Bob Is Lemma 2 written correctly? you want $K$ and $K^c$? not two $K$'s? — mathworker21, Oct 30 '19 at 13:41
sure, there's a lot of this type of typo :) all the $K$ are $K^c$ — Bob, Oct 30 '19 at 14:42
@Bob I didn't get a ping for your two most recent comments. Is your answer (that you posted as an answer) a complete proof? If so, I'll try to read it within 24 hours. — mathworker21, Oct 31 '19 at 15:20
Yes, I think so... hoping I haven't fallen into some trap somewhere... this evening I'll check it once more... :) — Bob, Oct 31 '19 at 17:59
@Bob please tag me in your comments so that I can see that you've replied! I'll look at your answer — mathworker21, Oct 31 '19 at 18:22
@mathworker21 edited to correct some typos here and there and also to improve Lemma 2 (previously I restricted the proof to the discrete case, leaving the general case to a messy generalization of the same argument, but actually there was no reason to do that). To the best of my knowledge it works, unless I fell asleep somewhere... — Bob, Oct 31 '19 at 19:53
I deleted all the edits because (1) they were making the page much more cluttered and difficult to load; (2) you have an answer below (though I'm pretty sure it's wrong); (3) the edits are still there in the history if you want to retrieve them. feel free to put them all back if you disagree with me — mathworker21, Nov 03 '19 at 20:57
It seems that this is the end of this journey... I want to thank you both (@mathworker21 and @antkam); I very much appreciated the time you dedicated to this and the linked question. I had fun, hope you had fun too, and... hope to meet again on the next problem :) — Bob, Nov 04 '19 at 00:18

mathworker21 · Answer 1 · 2019-10-29T12:26:20.870


Here's a critical obstruction to many methods of proof.

Let's say we're in $\mathbb{R}^2$, $x$ is the origin, and $\mu$ is supported on countably many concentric circles around the origin. The radii of these circles are $r_n = 10^{-10^{n}}$, and $\mu$ has total mass $2\pi r_n$ on each and is uniformly distributed (Lebesgue measure) on each. (I guess you have to normalize $\mu$ to be a probability measure, so just do that.) Let $(K^c)^{(n)}$ be an open arc of proportion $\frac{1}{n}$ on the circle of radius $r_n$, and let $K^c = \cup_{n \ge 1} (K^c)^{(n)}$. Then for all intermediate values of $m$, i.e., those for which it is very probable that at least one point will be on the circle with radius $r_n$ but very improbable that at least one point will be on the circle with radius $r_{n+1}$, the reason that $Z^x_m$ will be in $K$ is not that it is unlikely that some point will be in $K^c$, but rather that the first point chosen on the circle with radius $r_n$ will, with probability going to $1$, not be in $K^c$.

My point is that you definitely have to use that $Z^x_m$ cares about the first closest point. It seems that none of your attempts utilize this fact. For example, with respect to edit 5 / the answer below, it will actually be true that $P(\cup_{j=1}^m X_j \in K^c \cap B_r(x))$ is large for any of the smart choices of $r$.
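The gap between "some sample lands in the arc" and "the first sample on the circle lands in the arc" can be seen in isolation. In the sketch below (a hypothetical simplification of the example), $k$ samples are assumed to lie on the same innermost circle, so they all tie in distance and $\pi_m^x$ returns the smallest index; each angle is uniform, and the $K^c$ arc has proportion $1/n$. The first sample hits the arc with probability $1/n$, while at least one sample does with probability $1-(1-1/n)^k$.

```python
import random


def first_vs_any(n, k, trials, rng):
    """Toy model: k samples tie on one circle; the K^c arc has proportion 1/n.

    Returns estimates of P(first sample in arc) and P(some sample in arc).
    """
    first_hits = any_hits = 0
    for _ in range(trials):
        # a uniform angle lands in an arc of proportion 1/n with probability 1/n
        in_arc = [rng.random() < 1.0 / n for _ in range(k)]
        first_hits += in_arc[0]  # pi_m^x picks the smallest index among ties
        any_hits += any(in_arc)
    return first_hits / trials, any_hits / trials


print(first_vs_any(n=10, k=50, trials=20000, rng=random.Random(1)))
```

With $n=10$ and $k=50$ the exact values are $0.1$ and $1-0.9^{50}\approx 0.995$, which is why a union bound over all $m$ samples cannot see that $Z^x_m$ is usually in $K$.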

If we change definitions and say that "$Z^x_m$ is in $K^c$ if at least one of the closest points to $x$ is in $K^c$", then it will be false that $P(Z^x_m \in K^c) \to 0$ as $m \to \infty$.

Actually, you have done all of the hard work (in edit 5). It seems that you are saying that once you have chosen $s_1 > s_2 > \dots$, it might not be the case that the intervals $(I_j)_j$ overlap enough. This could be true, but why choose $s_1 > s_2 > \dots$ arbitrarily to begin with? You can just consider all $r > 0$ at once. I provide a proof below (which will also make me make sure that your main argument in edit 5 is correct). Below the second fold, I will give some more comments about this problem, to make myself not feel bad about getting the bounty (if I didn't make any mistakes about anything).

I will use $\mu$ to denote the measure induced on $\mathcal{X}$ by $P$ (i.e. $\mu(E) = P(X \in E)$).

Theorem: Let $(\mathcal{X},d)$ be a metric space and $\mu$ be a probability distribution on $\mathcal{X}$. Take $x \in \mathcal{X}$ with $\mu(B_r(x)) > 0$ for each $r > 0$. Suppose that $\lim_{r \downarrow 0} \frac{\mu(K^c\cap B_r(x))}{\mu(B_r(x))} = 0$. Then, $P(Z^x_m \in K^c) \to 0$ as $m \to \infty$, where $Z^x_m$ is the nearest neighbor function relative to $x$ (with $m$ samples).

Proof: We can, of course, assume $\mu(K^c \cap B_r(x)) > 0$ for each $r > 0$. Take $\epsilon \in (0,1)$. Take $r_0 > 0$ so that $\frac{\mu(K^c \cap B_r(x))}{\mu(B_r(x))} < \epsilon$ for each $0 < r < r_0$. The map $(\frac{1}{3},\frac{2}{3})\times(0,r_0) \to (0,\infty)$, $(\alpha,r) \mapsto \frac{1}{\mu(B_r(x))}\frac{1}{[\frac{\mu(K^c\cap B_r(x))}{\mu(B_r(x))}]^\alpha}$ is continuous; its image has the form $(\eta,\infty)$, since $r$ going to $0$ makes the function blow up. Take any $m \in \mathbb{N}$ with $m > \eta$. Take $\alpha \in (\frac{1}{3},\frac{2}{3})$ and $0 < r < r_0$ with $m = \frac{1}{\mu(B_r(x))}\frac{1}{[\frac{\mu(K^c\cap B_r(x))}{\mu(B_r(x))}]^\alpha}$. Then, $$P(Z^x_m \in K^c \cap B_r(x)) \le P(\cup_{j=1}^m X_j \in K^c \cap B_r(x)) \le \sum_{j=1}^m P(X_j \in K^c \cap B_r(x))$$ $$= m\mu(K^c \cap B_r(x)) = (\frac{\mu(K^c \cap B_r(x))}{\mu(B_r(x))})^{1-\alpha} \le \epsilon^{1/3}.$$ And, $$P(Z^x_m \in B_r(x)^c) = \mu(B_r(x)^c)^m = (1-\mu(B_r(x)))^m \le \exp(-m\mu(B_r(x)))$$ $$= \exp(-(\frac{\mu(K^c\cap B_r(x))}{\mu(B_r(x))})^{-\alpha}) \le \exp(-\epsilon^{-1/3}).$$ Therefore, $$P(Z^x_m \in K^c) \le P(Z^x_m \in K^c \cap B_r(x))+P(Z^x_m \in B_r(x)^c) \le \epsilon^{1/3}+\exp(-\epsilon^{-1/3}),$$ which can be made arbitrarily small by taking $\epsilon$ arbitrarily small. $\square$

Claim: The theorem above is false if we only insist that $\frac{\mu(K^c \cap B_r(x))}{\mu(B_r(x))}$ goes to $0$ along a subsequence of $r$.

Proof: Take $\mathcal{X} = \{z \in \mathbb{R}^2 : |z| < 1\}$, $d$ to be the standard Euclidean metric, and $\mu$ to be Lebesgue measure (uniform distribution). Let $K^c = \cup_{k \ge 1} \{\frac{1}{2^{10^{2k+1}}} < |z| < \frac{1}{2^{10^{2k}}}\}$. Clearly, for $x$ the origin, $\frac{\mu(K^c \cap B_r(x))}{\mu(B_r(x))}$ goes to $0$ along $(r_k)_k = (\frac{1}{2^{10^{2k+1}}})_{k \ge 1}$. Finally, $$P(|Z^x_m| \in [\frac{1}{2^{10^{2k+1}}},\frac{1}{2^{10^{2k}}}]) \ge 1-P(|Z^x_m| < \frac{1}{2^{10^{2k+1}}})-P(|Z^x_m| > \frac{1}{2^{10^{2k}}})$$ $$\ge 1-m\frac{\pi}{(2^{10^{2k+1}})^2}-(1-\frac{\pi}{(2^{10^{2k}})^2})^m,$$ so we can take $m = 2^{3\cdot 10^{2k}}$ to see that $P(Z^x_m \in K^c)$ is exponentially close to $1$. $\square$
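The arithmetic behind the choice $m = 2^{3\cdot 10^{2k}}$, written out with the same $\pi$ factors as in the display above (the boundary circles of each annulus are $\mu$-null, so the closed and open annuli have the same probability):

$$m\,\frac{\pi}{(2^{10^{2k+1}})^2} = \pi\, 2^{3\cdot 10^{2k}-2\cdot 10^{2k+1}} = \pi\, 2^{-17\cdot 10^{2k}}, \qquad \Big(1-\frac{\pi}{(2^{10^{2k}})^2}\Big)^m \le \exp\Big(-\pi\, 2^{3\cdot 10^{2k}-2\cdot 10^{2k}}\Big) = \exp\big(-\pi\, 2^{10^{2k}}\big),$$

so that

$$P(Z^x_m \in K^c) \ge 1-\pi\, 2^{-17\cdot 10^{2k}}-\exp\big(-\pi\, 2^{10^{2k}}\big) \to 1, \quad k\to\infty.$$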

edited Oct 29 '19 at 12:26

answered Oct 27 '19 at 00:03

mathworker21


Thanks for the answer :) I proceeded in exactly the same way after the sequence approach, but then I thought that the two-variable function could very well be discontinuous... can I ask you why this function is continuous? (What if $\mu$ has jumps due to a discrete component?) – Bob Oct 27 '19 at 05:52
@Bob I really like this nearest neighbor algorithm stuff :). In any event, I think you are right that the function need not be continuous, though usually people are fine assuming measures have no discrete components. Here is what I propose. First imagine we are in $\mathbb{R}^n$ and we have a measure with discrete components. There can be at most countably many "jumps" in the measure. At each of these jumps, we make $\mu$ smooth by adding a small amount of measure near it (e.g. if $\mu$ contains a delta mass at $x_0$, we replace $\delta_{x_0}$ with – mathworker21 Oct 27 '19 at 06:49
a non-negative $C^\infty$ function that is always between $0$ and $1$, is $1$ at $x_0$, and has support only very close to $x_0$). We can do this so that we still have $\lim_{r \downarrow 0} \frac{\mu(K^c\cap B_r(x))}{\mu(B_r(x))} = 0$. Since we have the result for smooth $\mu$, we know that $P(Z^x_m \in K^c) \to 0$ as $m \to \infty$. Since $K^c$ just received more mass throughout this process, this implies that $P(Z^x_m \in K^c) \to 0$ for the original $\mu$. If we are not in $\mathbb{R}^n$ and have too few points to be able to "smooth out" $\mu$, we can add some points to the metric space or – mathworker21 Oct 27 '19 at 06:50
isometrically embed it into a bigger space, or something like that. Let me know if all this sounds reasonable to you. If so, I will try to make it rigorous and add it to the answer. – mathworker21 Oct 27 '19 at 06:50
I think that this will be too long for comments... can we move to chat? – Bob Oct 27 '19 at 07:03
https://chat.stackexchange.com/rooms/100320/room-for-bob-and-mathworker21 – mathworker21 Oct 27 '19 at 07:34
Let me know if I’m interpreting your train of thought correctly. First, assume that $\Omega = \mathbb{R}^d \times \prod_{k=1}^{+\infty} \mathbb{R}^d$ and that we have $\mathbb{P} = \mu \otimes \bigotimes_{k=1}^{+\infty} \mu$. Now, take the Lebesgue measure $\sigma$ on $\mathbb{R}^d$ and the standard mollifier on $\mathbb{R}^d$, say $\eta_\varepsilon$. Now, define $\mu_\varepsilon:=\mu*_{\sigma}\eta_\varepsilon$ and define $\mathbb{P}_\varepsilon :=\mu_{\varepsilon} \otimes \bigotimes_{k=1}^{+\infty} \mu_{\varepsilon}$. – Bob Oct 27 '19 at 08:13
Then for each $\varepsilon>0$ we have that $\mathbb{P}_{\varepsilon}(Z_m^x\in K^c)\to 0, m\to\infty$. From this, we want to conclude that $\mathbb{P}(Z_m^x\in K^c)\to 0, m \to \infty$. Did I get the point? – Bob Oct 27 '19 at 08:13
@Bob Maybe. That definitely seems like a viable strategy. I was thinking maybe to replace $K^c$ with $\operatorname{supp}(1_{K^c}*\eta_\epsilon)$ or something like that. Then, a single choice of $\epsilon$ will be enough, I think. I'll come back to this soon. – mathworker21 Oct 27 '19 at 08:27
Suppose that $K$ is something like a generalized Cantor set in $[0,1]$. Then $\operatorname{supp}(1_{K^c}*\eta_\varepsilon)$ covers the entire interval $[0,1]$ no matter how small we pick $\varepsilon$... Isn't this a problem? – Bob Oct 27 '19 at 14:21
I'm thinking... doesn't your counterexample actually show that we can't expect the result to hold? It seems that choosing $\mathbb{P}_X$ and $K^c$ as you did and defining $m_k:=2^{3\cdot 10^{2k}}$ we have that $\mathbb{P}(Z^x_{m_k}\in K^c) \ge \mathbb{P}(|Z^x_{m_k}|\in [\frac{1}{2^{10^{2k+1}}},\frac{1}{2^{10^{2k}}}])\to 1, k\to+\infty$... Am I wrong? – Bob Oct 27 '19 at 17:52
Nah, I misunderstood your counterexample. Also I'm losing hope that the approximation approach could work: it seems to me that we need something like setwise convergence, which is hard to achieve – Bob Oct 29 '19 at 08:53
@Bob hi. So we have the result if either (1) $\mu$ is smooth, or (2) the measures of $\epsilon$-neighborhoods of $K$ converge to the measure of $K$ as $\epsilon$ goes to $0$? and you're still not satisfied?? :) regardless, good catch on the generalized Cantor set example – mathworker21 Oct 29 '19 at 11:22
The fact is that I want to use this result as a (final) lemma in a big work where I want to assume as few hypotheses as possible, since I know that the result I want to prove using this lemma is true via another path without further assumptions... so there's little point in presenting another proof if it proves less :) – Bob Oct 29 '19 at 11:35
... and in the case of interest where I want to use this lemma, usually $K$ is very bad... – Bob Oct 29 '19 at 11:44
@Bob not to sound arrogant, but I think I added something very important to consider to my answer just now. please see it. it gives an example of failure for the method of edit 5 and other methods. – mathworker21 Oct 29 '19 at 12:06
I don't get the sentence "it will actually be true that $\mu(K^c\cap B_r(x))$ is large for any of the smart choices of $r$". Our hypothesis is that $\frac{\mu(K^c\cap B_r(x))}{\mu(B_r(x))}\to 0, r\to 0^+$; how could $\mu(K^c\cap B_r(x))$ be large with respect to $\mu(K\cap B_r(x))$? – Bob Oct 29 '19 at 12:23
@Bob thanks, edited/fixed. the point is that, for those intermediate values of $m$, with high probability, some point will be in that $\frac{1}{n}$-arc, but with high probability, it won't be the first point. I hope the idea is conveyed. – mathworker21 Oct 29 '19 at 12:27
yep, basically the first point in $\partial B_{r_n} (x)$ has a probability of $1/n$ of being in $K^c$... i.e. it seems to be something like what I claimed in edit 6, namely that $\mathbb{P}(Z_m^x \in K^c | Z_m^x \in \partial B_r(x)) = \mathbb{P}(X\in K^c | X \in \partial B_r(x))$... why should that be an obstruction? – Bob Oct 29 '19 at 12:34
@Bob i haven't studied edit 6 yet. but the example just added is definitely an obstruction to edit 5 and probably to the edits before it as well. do you agree? – mathworker21 Oct 29 '19 at 12:36
Let's see if I got your point. Say that we choose $\mathbb{P}(X\in K^c | X\in\partial B_{r_n}(x))$ to be 1. We are constrained by $\frac{\mathbb{P}(X\in K^c \cap B_r(x))}{\mathbb{P}(X\in B_r(x))} \to 0, r\to 0^+$, and you are claiming that we can choose $\mathbb{P}(X\in\partial B_{r_n}(x))$ big enough to obtain a sequence $m_n\uparrow +\infty$ such that $\mathbb{P}(Z^x_{m_n}\in K^c)$ doesn't converge to $0$ as $n\to\infty$... did I get your point? – Bob Oct 29 '19 at 12:42
@Bob I don't know where you got "we choose $P(X \in K^c | X \in \partial B_{r_n}(x))$ to be $1$" from. What I said is that the probability that the closest $X_i$ to the origin is in $\partial B_{r_n}$ is nearly $1$. I previously edited my answer to explain things very clearly. I really don't know what else to say. I took my time to make sure the edit explained things very clearly and I think it does. – mathworker21 Oct 29 '19 at 15:58
probably I didn't get your point; I thought that you were suggesting some kind of counterexample that could invalidate the proof line in the discontinuous case. I'll read your edit with more care. – Bob Oct 29 '19 at 16:05
@Bob sorry, I'm just in a really bad mood (due to other things). I don't mean to be rude. I am suggesting an example that invalidates the whole approach of edit 5 (and probably many other approaches suggested). – mathworker21 Oct 29 '19 at 16:08
@Bob hi bob. is it clear now? if not, ill try my best to make it more clear. – mathworker21 Oct 29 '19 at 21:42
I haven't had the time to review it again, since I have been working on another strategy until now... – Bob Oct 30 '19 at 11:12

Bob · Accepted Answer · 2019-10-31T19:57:25.633

It turns out that the strategy outlined in edit 7 actually works without further assumptions (Lemma 0 takes care of what was missing in edit 7). Details follow.

Proposition 1. If $\mathbb{P}(X=x)>0$ then $\mathbb{P}(Z_m^x\in K^c)\to0, m\to+\infty$

Proof. Since $x\in K$ ($K$ is closed and $x\in\partial K$), we have that $$\mathbb{P}(Z_m^x\in K^c)\le \mathbb{P}\left(\bigcap_{k=1}^m X_k\neq x\right) = \left(1-\mathbb{P}(X=x)\right)^m \to 0, m\to\infty.$$

Proposition 2. If there exists $r>0$ such that $\mathbb{P}(X\in K^c \cap B_r(x))=0$ then $\mathbb{P}(Z_m^x\in K^c)\to0, m\to+\infty$.

Proof. Since $\mathbb{P}(X\in B_r(x))>0$ and $\mathbb{P}(X\in K^c \cap B_r(x))=0$ we have that: $$\mathbb{P}(Z_m^x\in K^c)\le \mathbb{P}\left(\bigcap_{k=1}^m X_k\notin B_r(x)\right) = \left(1-\mathbb{P}(X\in B_r(x))\right)^m \to 0, m\to\infty.$$

Thanks to Propositions 1 and 2, we can (and will) assume from now on that $\mathbb{P}(X=x)=0$ and $\forall r>0, \mathbb{P}(X\in K^c \cap B_r(x))>0$.

Lemma 0. $\lim_{r\to0^+} \frac{\mathbb{P}(X\in K^c\cap \bar B_r(x))}{\mathbb{P}( X\in \bar B_r(x))} =0.$

Proof. We know that $\lim_{r\to0^+} \frac{\mathbb{P}(X\in K^c\cap B_r(x))}{\mathbb{P}(X\in B_r(x))} =0.$ Since the set $$\mathcal{Q}:=\{r>0 : \left(\mathbb{P}(X\in K^c\cap \partial B_r(x))>0\right) \lor \left(\mathbb{P}(X\in \partial B_r(x))>0\right) \}$$ is countable, and since $$\frac{\mathbb{P}(X\in K^c\cap B_s(x))}{\mathbb{P}(X\in B_s(x))}\to\frac{\mathbb{P}(X\in K^c\cap \bar B_r(x))}{\mathbb{P}(X\in \bar B_r(x))}, s\downarrow r$$ for each sequence $r_n \downarrow 0$ we can select $s_n>r_n$ such that $s_n \downarrow 0$ and $$\forall n\in\mathbb{N}, \left|\frac{\mathbb{P}(X\in K^c\cap B_{s_n}(x))}{\mathbb{P}(X\in B_{s_n}(x))}-\frac{\mathbb{P}(X\in K^c\cap \bar B_{r_n}(x))}{\mathbb{P}(X\in \bar B_{r_n}(x))}\right|\le \frac{1}{n}.$$ Then $$\frac{\mathbb{P}(X\in K^c\cap \bar B_{r_n}(x))}{\mathbb{P}(X\in \bar B_{r_n}(x))}\le \frac{1}{n}+\frac{\mathbb{P}(X\in K^c\cap B_{s_n}(x))}{\mathbb{P}(X\in B_{s_n}(x))}\to 0, n\to+\infty.$$ Since $(r_n)_{n\in\mathbb{N}}$ was arbitrary, Lemma 0 follows.

Lemma 1. There exist $M>0$ and $(r_m)_{m\in\mathbb{N}, m\ge M}$ such that $0<r_m\to0, m\to+\infty$ and $$\forall m\in\mathbb{N}, (m\ge M)\implies \frac{1}{\sqrt{\mathbb{P}(X \in K^c\cap \bar B_{r_m}(x))}\sqrt{\mathbb{P}(X \in \bar B_{r_m}(x))}} \\ \le m \le \frac{1}{\sqrt{\mathbb{P}(X \in K^c\cap B_{r_m}(x))}\sqrt{\mathbb{P}(X \in B_{r_m}(x))}}.$$

Proof. Since $0<\mathbb{P}(X\in \bar B_r(x))\downarrow \mathbb{P}(X= x) =0, r\downarrow0$ and $\forall r>0, \mathbb{P}(X\in K^c \cap \bar B_r(x))>0$, for each $r>0$ the quantity $$\frac{1}{\sqrt{\mathbb{P}(X \in K^c\cap \bar B_{r}(x))}\sqrt{\mathbb{P}(X \in \bar B_{r}(x))}}$$ is well defined, and it increases to $+\infty$ as $r\downarrow 0$.

Define $M:=\frac{1}{\sqrt{\mathbb{P}(X \in K^c\cap \bar B_{1}(x))}\sqrt{\mathbb{P}(X \in \bar B_{1}(x))}}.$

Now, get $m\in\mathbb{N}$ such that $m\ge M$. Define $$r_m:= \sup\left\{r>0 : \frac{1}{\sqrt{\mathbb{P}(X \in K^c\cap \bar B_{r}(x))}\sqrt{\mathbb{P}(X \in \bar B_{r}(x))}}\ge m\right\}.$$ Notice that for each $r>r_m$ we have that $$\frac{1}{\sqrt{\mathbb{P}(X \in K^c\cap \bar B_{r}(x))}\sqrt{\mathbb{P}(X \in \bar B_{r}(x))}}<m$$ and since $$\frac{1}{\sqrt{\mathbb{P}(X \in K^c\cap \bar B_{r}(x))}\sqrt{\mathbb{P}(X \in \bar B_{r}(x))}} \uparrow \frac{1}{\sqrt{\mathbb{P}(X \in K^c\cap \bar B_{r_m}(x))}\sqrt{\mathbb{P}(X \in \bar B_{r_m}(x))}}, r\downarrow r_m$$ we have that $$\frac{1}{\sqrt{\mathbb{P}(X \in K^c\cap \bar B_{r_m}(x))}\sqrt{\mathbb{P}(X \in \bar B_{r_m}(x))}} \le m.$$ On the other hand, for each $r\in(0,r_m)$ we have that $$\frac{1}{\sqrt{\mathbb{P}(X \in K^c\cap \bar B_{r}(x))}\sqrt{\mathbb{P}(X \in \bar B_{r}(x))}}\ge m$$ and since $$\frac{1}{\sqrt{\mathbb{P}(X \in K^c\cap \bar B_{r}(x))}\sqrt{\mathbb{P}(X \in \bar B_{r}(x))}} \downarrow \frac{1}{\sqrt{\mathbb{P}(X \in K^c\cap B_{r_m}(x))}\sqrt{\mathbb{P}(X \in B_{r_m}(x))}}, r\uparrow r_m$$ we have that $$\frac{1}{\sqrt{\mathbb{P}(X \in K^c\cap B_{r_m}(x))}\sqrt{\mathbb{P}(X \in B_{r_m}(x))}}\ge m.$$ Since $m$ was arbitrary, Lemma 1 follows.

Notation. From now on, $M>0$ and $(r_m)_{m\in\mathbb{N}, m\ge M}$ will be as in Lemma 1 and, to get the notation simpler, we assume that $M=1$ (since we are interested in the behavior for $m$ big this isn't a problem) so that $(r_m)_{m\in\mathbb{N}, m\ge M} = (r_m)_{m\in\mathbb{N}}.$

Proposition 3. $\mathbb{P}(Z_m^x\in K^c \cap B_{r_m}(x)) \to 0, m\to+\infty.$

Proof. $$\mathbb{P}(Z_{m}^x \in K^c \cap B_{r_m}(x))\le \mathbb{P}\left(\bigcup_{j=1}^{m} X_j \in K^c \cap B_{r_m}(x)\right) \le \sum_{j=1}^{m} \mathbb{P}(X_j\in K^c\cap B_{r_m}(x)) \\ = m\mathbb{P}(X\in K^c\cap B_{r_m}(x)) \le \left(\frac{\mathbb{P}(X\in K^c\cap B_{r_m}(x))}{\mathbb{P}(X\in B_{r_m}(x))}\right)^{1/2} \to 0, m\to\infty$$

Proposition 4. $\mathbb{P}(Z_m^x\in (\bar B_{r_m}(x))^c) \to 0, m\to+\infty.$

Proof. $$\mathbb{P}(Z_{m}^x \in (\bar B_{r_m}(x))^c)\le \mathbb{P}\left(\bigcap_{j=1}^{m} X_j \notin \bar B_{r_m}(x)\right) = \prod_{j=1}^{m} \mathbb{P} (X_j \notin \bar B_{r_m}(x)) \\ = \left(1-\mathbb{P}(X\in \bar B_{r_m}(x))\right)^{m} \le \exp(-m \mathbb{P}(X\in \bar B_{r_m}(x))) \\ \le \exp\left(-\left(\frac{\mathbb{P}(X\in \bar B_{r_m}(x))}{\mathbb{P}(X\in K^c \cap \bar B_{r_m}(x))}\right)^{1/2} \right) \to 0, m\to +\infty.$$
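A numerical sketch of Lemma 1 and Propositions 3 and 4 on a hypothetical example (not from the thread): $X$ uniform on the unit disk, $x$ the origin, and $K^c=\{\rho e^{i\theta}: 0<\rho<1,\ 0<\theta<\rho\}$, a thin wedge touching $x$. For $0<r\le 1$, $\mathbb{P}(X\in \bar B_r(x)) = r^2$ and $\mathbb{P}(X\in K^c\cap \bar B_r(x)) = \frac{r^3}{3\pi}$, so the ratio is $\frac{r}{3\pi}\to 0$ while $\mathbb{P}(X\in K^c\cap B_r(x))>0$ for all $r>0$ (the $\delta_x$ condition fails here, but Lemma 1 and Propositions 3 and 4 do not use it). The code finds $r_m$ by bisection, as the supremum in the proof of Lemma 1, and evaluates the two bounds; in this continuous example one can also check by hand that $r_m=(3\pi/m^2)^{1/5}$.

```python
import math


# Hypothetical example: X uniform on the unit disk, x = origin,
# K^c = {rho * e^{i theta} : 0 < rho < 1, 0 < theta < rho}.
def p_ball(r):
    """P(X in closed ball of radius r), 0 < r <= 1."""
    return r ** 2


def p_kc_ball(r):
    """P(X in K^c cap closed ball of radius r), 0 < r <= 1."""
    return r ** 3 / (3 * math.pi)


def f(r):
    """The decreasing function 1 / sqrt(P(K^c cap B_r) P(B_r)) from Lemma 1."""
    return 1.0 / math.sqrt(p_kc_ball(r) * p_ball(r))


def r_m(m, iterations=200):
    """sup{r in (0, 1] : f(r) >= m} by bisection; needs m >= M = f(1)."""
    if m < f(1.0):
        raise ValueError("Lemma 1 only defines r_m for m >= M")
    lo, hi = 0.0, 1.0  # invariant: f >= m on (0, lo], f(hi) <= m
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) >= m:
            lo = mid
        else:
            hi = mid
    return lo


def prop3_bound(m):
    """Upper bound m * P(X in K^c cap B_{r_m}) used in Proposition 3."""
    return m * p_kc_ball(r_m(m))


def prop4_bound(m):
    """Upper bound exp(-m * P(X in closed B_{r_m})) used in Proposition 4."""
    return math.exp(-m * p_ball(r_m(m)))


for m in (10 ** 2, 10 ** 4, 10 ** 6):
    print(m, r_m(m), prop3_bound(m), prop4_bound(m))
```

Both bounds decrease in $m$; in this example the Proposition 3 bound equals $\sqrt{r_m/(3\pi)}$ and therefore decays only like $m^{-1/5}$, while the Proposition 4 bound decays much faster.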

Lemma 2. Suppose that $r>0$ and $m\in\mathbb{N}$ are such that $\mathbb{P}(Z_m^x\in\partial B_{r}(x))>0$. Then $$\mathbb{P}(Z_m^x\in K^c | Z_m^x\in\partial B_r(x)) = \mathbb{P}(X\in K^c | X\in \partial B_r(x))$$ and so in particular $$\mathbb{P}(Z_m^x\in K^c \cap \partial B_r(x)) \le \mathbb{P}(X\in K^c | X\in \partial B_r(x)).$$

Proof. Define: $$R:=d(x,X),R_1:=d(x,X_1),...,R_m:=d(x,X_m)$$ and notice that $R,R_1,...,R_m$ are i.i.d. and that $\mathbb{P}(R=r)>0$. Define: $$\forall m\in\mathbb{N}, \sigma_m^x: [0,+\infty)^m\to\{1,...,m\}, (r_1,...,r_m)\mapsto \min\left(\operatorname{argmin}_{k\in\{1,...,m\}}\left\{r_k\right\}\right).$$ Then: $$\pi_m ^x(X_1,...,X_m)= \sigma_m^x(R_1,...,R_m).$$ So: $$\mathbb{P}(Z_m^x\in K^c | Z_m^x\in\partial B_r(x)) = \frac{ \mathbb{P}(Z_m^x\in K^c \cap Z_m^x\in\partial B_r(x))}{\mathbb{P}(Z_m^x\in\partial B_r(x))} \\ = \frac{ \sum_{k=1}^m\mathbb{P}(Z_m^x\in K^c \cap Z_m^x\in\partial B_r(x) \cap \sigma_m^x(R_1,...,R_m) = k )}{\mathbb{P}(Z_m^x\in\partial B_r(x))} \\ = \frac{ \sum_{k=1}^m\mathbb{P}(X_k\in (K^c\cap \partial B_r(x)) \cap \sigma_m^x(R_1,...,R_{k-1},r,R_{k+1},...,R_m) = k )}{\mathbb{P}(Z_m^x\in\partial B_r(x))} \\ = \frac{ \sum_{k=1}^m\mathbb{P}(X_k\in (K^c\cap \partial B_r(x)))\mathbb{P}(\sigma_m^x(R_1,...,R_{k-1},r,R_{k+1},...,R_m) = k )}{\mathbb{P}(Z_m^x\in\partial B_r(x))} \\ = \frac{ \sum_{k=1}^m\mathbb{P}(X\in K^c | X\in \partial B_r(x))\mathbb{P}(R_k=r)\mathbb{P}(\sigma_m^x(R_1,...,R_{k-1},r,R_{k+1},...,R_m) = k )}{\mathbb{P}(Z_m^x\in\partial B_r(x))} \\ = \frac{ \sum_{k=1}^m\mathbb{P}(X\in K^c | X\in \partial B_r(x))\mathbb{P}(\sigma_m^x(R_1,...,R_m) = k \cap R_k=r)}{\mathbb{P}(Z_m^x\in\partial B_r(x))} \\ = \frac{ \sum_{k=1}^m\mathbb{P}(X\in K^c | X\in \partial B_r(x))\mathbb{P}(\sigma_m^x(R_1,...,R_m) = k \cap Z_m^x\in \partial B_r(x))}{\mathbb{P}(Z_m^x\in\partial B_r(x))} \\ = \mathbb{P}(X\in K^c | X\in \partial B_r(x)) \frac{ \sum_{k=1}^m\mathbb{P}(\sigma_m^x(R_1,...,R_m) = k \cap Z_m^x\in \partial B_r(x))}{\mathbb{P}(Z_m^x\in\partial B_r(x))} \\ = \mathbb{P}(X\in K^c | X\in \partial B_r(x)).$$ Finally: $$\mathbb{P}(Z_m^x\in K^c \cap \partial B_r(x)) = \mathbb{P}(Z_m^x\in K^c | Z_m^x\in\partial B_r(x))\mathbb{P}(Z_m^x\in\partial B_r(x)) \\ = \mathbb{P}(X\in K^c | X\in \partial B_r(x)) \mathbb{P}(Z_m^x\in\partial B_r(x)) \le 
\mathbb{P}(X\in K^c | X\in \partial B_r(x)).$$
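A Monte Carlo sanity check of Lemma 2 in a hypothetical discrete radial model (all parameters are illustrative assumptions): each $R_j$ equals $1$ (the shell) with probability $0.3$ and $2$ otherwise, and a sample on the shell lies in $K^c$ with probability $0.25$, independently of everything else. Since $\sigma_m^x(R_1,\dots,R_m)$ depends only on the radii, the conditional frequency of $K^c$ given that $Z_m^x$ is on the shell should be close to $0.25$ for every $m$. Python's built-in `min` with a `key` returns the first minimizer, which matches the tie-breaking in $\sigma_m^x$.

```python
import random


def lemma2_check(m, p_shell, q, trials, rng):
    """Estimate P(Z_m^x in K^c | Z_m^x on the shell) in a toy radial model.

    Each R_j is 1 (the shell) with probability p_shell and 2 otherwise; a
    sample on the shell is in K^c with probability q, independently.
    """
    on_shell = hits = 0
    for _ in range(trials):
        radii = [1 if rng.random() < p_shell else 2 for _ in range(m)]
        in_kc = [rng.random() < q for _ in range(m)]
        # min with a key returns the first minimizer, i.e. sigma_m^x(R_1, ..., R_m)
        k = min(range(m), key=lambda j: radii[j])
        if radii[k] == 1:
            on_shell += 1
            hits += in_kc[k]
    return hits / on_shell


for m in (5, 20):
    print(m, lemma2_check(m, p_shell=0.3, q=0.25, trials=20000, rng=random.Random(2)))
```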

Lemma 3. If $m:\mathbb{N}\to \mathbb{N}$ is strictly increasing and such that $$\exists \varepsilon >0, \forall k \in \mathbb{N}, \mathbb{P}(X\in K^c | X\in \partial B_{r_{m_k}}(x))\ge \varepsilon$$ then $$\frac{\mathbb{P}(X\in\partial B_{r_{m_k}}(x))}{\mathbb{P}(X\in B_{r_{m_k}}(x))} \to 0, k\to \infty$$ and so $$\mathbb{P}(Z_{m_k}^x \in \partial B_{r_{m_k}}(x))\to 0, k\to +\infty.$$

Proof. We have that $$ \varepsilon \mathbb{P}(X\in\partial B_{r_{m_k}}(x)) \le \mathbb{P}(X\in K^c | X\in\partial B_{r_{m_k}}(x)) \mathbb{P}(X\in\partial B_{r_{m_k}}(x)) = \mathbb{P}(X\in K^c \cap \partial B_{r_{m_k}}(x)) $$ and so, by Lemma 0, $$\frac{\mathbb{P}(X\in\partial B_{r_{m_k}}(x))}{\mathbb{P}(X\in \bar B_{r_{m_k}}(x))} \le\frac{1}{\varepsilon} \frac{\mathbb{P}(X\in K^c\cap\partial B_{r_{m_k}}(x))}{\mathbb{P}(X\in \bar B_{r_{m_k}}(x))} \le \frac{1}{\varepsilon} \frac{\mathbb{P}(X\in K^c\cap\bar B_{r_{m_k}}(x))}{\mathbb{P}(X\in \bar B_{r_{m_k}}(x))} \to 0 , k\to \infty$$ and since $$\frac{\mathbb{P}(X\in\partial B_{r_{m_k}}(x))}{\mathbb{P}(X\in \bar B_{r_{m_k}}(x))} = \frac{\mathbb{P}(X\in\partial B_{r_{m_k}}(x))}{\mathbb{P}(X\in B_{r_{m_k}}(x))+\mathbb{P}(X\in \partial B_{r_{m_k}}(x))}$$ it is also clear that $$\gamma(k):=\frac{\mathbb{P}(X\in\partial B_{r_{m_k}}(x))}{\mathbb{P}(X\in B_{r_{m_k}}(x))}\to 0, k\to\infty.$$ Now: $$\mathbb{P}(Z^x_{m_k}\in\partial B_{r_{m_k}}(x)) \le \mathbb{P}\left(\bigcup_{j=1}^{m_k}\left(X_j\in\partial B_{r_{m_k}}(x) \cap \bigcap_{i=1, i\neq j} ^{m_k} X_i\in (B_{r_{m_k}}(x))^c\right)\right)\\ \le m_k\mathbb{P}(X\in\partial B_{r_{m_k}}(x))(1-\mathbb{P}(X\in B_{r_{m_k}}(x)))^{m_k-1}=(*)$$ and: $$(1-\mathbb{P}(X\in B_{r_{m_k}}(x)))^{m_k-1} \le \exp(-(m_k-1)\mathbb{P}(X\in B_{r_{m_k}}(x))) \\ = \exp\left(-\frac{m_k-1}{\gamma(k)}\mathbb{P}(X\in \partial B_{r_{m_k}}(x))\right) $$ so: $$(*)=m_k\mathbb{P}(X\in\partial B_{r_{m_k}}(x)) \exp\left(-\frac{m_k-1}{\gamma(k)}\mathbb{P}(X\in \partial B_{r_{m_k}}(x))\right) \to 0, k\to+\infty.$$
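The last limit can be made explicit with an elementary bound. When $\mathbb{P}(X\in\partial B_{r_{m_k}}(x))=0$ we simply have $(*)=0$. Otherwise, write $a_k := m_k\mathbb{P}(X\in\partial B_{r_{m_k}}(x))$ and $b_k := \frac{m_k-1}{m_k\,\gamma(k)}$, so that $(*) = a_k e^{-b_k a_k}$. For $b>0$ the function $t\mapsto te^{-bt}$ on $[0,\infty)$ has derivative $e^{-bt}(1-bt)$, hence its maximum value $\frac{1}{eb}$ is attained at $t=\frac1b$. Therefore, for $m_k\ge 2$,

$$(*) \le \frac{1}{e\,b_k} = \frac{m_k\,\gamma(k)}{e\,(m_k-1)} \le \frac{2\,\gamma(k)}{e} \to 0, \quad k\to\infty.$$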

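Lemma 4. Let $m:\mathbb{N}\to \mathbb{N}$ be strictly increasing. Suppose that $$\mathbb{P}(X\in K^c | X\in \partial B_{r_{m_k}}(x)) \nrightarrow 0, k\to\infty.$$ Then there exists a strictly increasing function $k:\mathbb{N}\to \mathbb{N}$ such that $$\mathbb{P}(Z_{m_{k_j}}^x\in K^c \cap \partial B_{r_{m_{k_j}}}(x)) \to 0, j\to \infty. $$ Proof: since these conditional probabilities do not converge to $0$, there exist $\varepsilon>0$ and a strictly increasing $k:\mathbb{N}\to\mathbb{N}$ such that $\mathbb{P}(X\in K^c | X\in \partial B_{r_{m_{k_j}}}(x))\ge\varepsilon$ for every $j$. Lemma 3, applied to $(m_{k_j})_{j\in\mathbb{N}}$, gives $\mathbb{P}(Z_{m_{k_j}}^x\in \partial B_{r_{m_{k_j}}}(x))\to 0$, and this is an upper bound for $\mathbb{P}(Z_{m_{k_j}}^x\in K^c \cap \partial B_{r_{m_{k_j}}}(x))$.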

Proposition 5. $\mathbb{P}(Z_m^x\in K^c \cap \partial B_{r_m}(x)) \to 0, m\to+\infty.$

Proof. If only finitely many indices $m$ satisfy $\mathbb{P}(Z_m^x\in K^c \cap \partial B_{r_m}(x))>0$, there's nothing to prove. Otherwise, collect all the indices for which $\mathbb{P}(Z_m^x\in K^c \cap \partial B_{r_m}(x))>0$ and arrange them in a strictly increasing sequence. Now, take any subsequence of this sequence, say $(m_k)_{k\in\mathbb{N}}$. If $\mathbb{P}(X\in K^c | X\in \partial B_{r_{m_k}}(x))\to0, k\to\infty$, then by Lemma 2 we are done. Otherwise, by Lemma 4 we can find a sub-subsequence $(m_{k_j})_{j\in\mathbb{N}}$ such that $$\mathbb{P}(Z_{m_{k_j}}^x\in K^c \cap \partial B_{r_{m_{k_j}}}(x)) \to 0, j\to \infty.$$ Since every subsequence has a further subsequence along which these probabilities converge to zero, the whole sequence converges to zero.

Theorem. $\mathbb{P}(Z_m^x\in K^c) \to 0, m\to+\infty.$

Proof. By Propositions 3, 4 and 5, we have $$\mathbb{P}(Z_m^x\in K^c)\\ \le \mathbb{P}(Z_m^x\in K^c \cap B_{r_m}(x)) + \mathbb{P}(Z_m^x\in K^c \cap \partial B_{r_m}(x)) + \mathbb{P}(Z_m^x\in (\bar B_{r_m}(x))^c) \to 0 , m\to +\infty.$$
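To see the theorem at work, the following Python sketch estimates $\mathbb{P}(Z_m^x\in K^c)$ by Monte Carlo for the example suggested in the comments under the question: $\mathcal{X}=[-1,1]\times[-1,1]$ with the Euclidean metric, $K=[-1,0]\times[-1,1]$, $x=(0,0)$, and $\mathbb{P}_X$ putting mass $1/2$ uniformly on $[-1,0]\times\{0\}\cup\{0\}\times[-1,1]$ and mass $1/2$ uniformly on $K^c$ (a short computation gives $\delta_x=2/3$ here). The nearest-neighbour index follows the definition of $\pi_m^x$, taking the smallest index among the minimizers. The function names, the number of trials and the seed are arbitrary choices for illustration; this is a sanity check, not part of the proof.

```python
import random

def sample_point(rng):
    """Draw one point from the example distribution P_X on [-1,1]^2."""
    if rng.random() < 0.5:
        # Mass 1/2 uniform (by length) on [-1,0]x{0} U {0}x[-1,1], total length 3.
        u = 3.0 * rng.random()
        if u < 1.0:
            return (-u, 0.0)        # horizontal branch [-1,0] x {0}
        return (0.0, u - 2.0)       # vertical segment {0} x [-1,1], i.e. dK
    # Mass 1/2 uniform on K^c = (0,1] x [-1,1].
    return (1.0 - rng.random(), 2.0 * rng.random() - 1.0)

def nearest_index(points, x):
    """pi_m^x: smallest index among the minimizers of d(x, x_k) (0-based here)."""
    best_k, best_d2 = 0, float("inf")
    for k, (a, b) in enumerate(points):
        d2 = (a - x[0]) ** 2 + (b - x[1]) ** 2
        if d2 < best_d2:            # strict '<' keeps the first minimizer on ties
            best_k, best_d2 = k, d2
    return best_k

def estimate_prob_outside_K(m, trials, seed=0):
    """Monte Carlo estimate of P(Z_m^x in K^c) with x = (0, 0)."""
    rng = random.Random(seed)
    x = (0.0, 0.0)
    hits = 0
    for _ in range(trials):
        pts = [sample_point(rng) for _ in range(m)]
        z = pts[nearest_index(pts, x)]
        if z[0] > 0.0:              # K^c is exactly the set of points with first coordinate > 0
            hits += 1
    return hits / trials

for m in [1, 10, 100, 1000]:
    print(m, estimate_prob_outside_K(m, trials=1000))
```

For $m=1$ the estimate is close to $\mathbb{P}(X\in K^c)=1/2$, and it should decrease as $m$ grows, in line with the theorem; the exact numbers depend on the seed.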

Well, it is just the continuity from above (see: https://math.stackexchange.com/questions/234292/continuity-from-below-and-above ) of the measure $\mathbb{P}_X$: $\mathbb{P}_X(E_n) \downarrow \mathbb{P}_X(E)$ if $\cdots\subset E_n\subset E_{n-1} \subset \cdots \subset E_1$ and $\bigcap_{n\in\mathbb{N}}E_n = E$... — Bob, Nov 03 '19 at 22:16
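A small exact example may help to see both the continuity from above mentioned in this comment and the "discontinuity on shells" discussed further down. The sketch below assumes a hypothetical distribution on $\mathbb{R}$ (with $d(y,y')=|y-y'|$): an atom of mass $1/4$ at $1$, which lies on the sphere $\partial B_1(0)=\{-1,1\}$, plus mass $3/4$ spread uniformly on $[-2,2]$. Then $\mathbb{P}_X(B_{1+1/n}(0))\downarrow\mathbb{P}_X(\bar B_1(0))$, while $\mathbb{P}_X(B_{1-1/n}(0))\uparrow\mathbb{P}_X(B_1(0))$, and the two limits differ by exactly the mass of the shell.

```python
from fractions import Fraction

ATOM_AT = Fraction(1)        # hypothetical atom location, on the shell |y| = 1
ATOM_MASS = Fraction(1, 4)   # hypothetical atom mass
# The remaining mass 3/4 is spread uniformly on [-2, 2] (density 3/16).

def p_open_ball(s):
    """P_X(B_s(0)) = P_X((-s, s)) for 0 < s <= 2."""
    continuous = Fraction(3, 16) * 2 * s
    return continuous + (ATOM_MASS if abs(ATOM_AT) < s else 0)

def p_closed_ball(s):
    """P_X(closed ball of radius s) = P_X([-s, s]) for 0 < s <= 2."""
    continuous = Fraction(3, 16) * 2 * s
    return continuous + (ATOM_MASS if abs(ATOM_AT) <= s else 0)

r = Fraction(1)
for n in [10, 100, 1000]:
    eps = Fraction(1, n)
    # From above: the open balls B_{r+1/n} decrease to the closed ball of radius r.
    # From below: the open balls B_{r-1/n} increase to the open ball B_r.
    print(n, float(p_open_ball(r + eps)), float(p_open_ball(r - eps)))

print("closed ball:", p_closed_ball(r), " open ball:", p_open_ball(r))
```

Using `Fraction` keeps the arithmetic exact, so the gap between the closed and the open ball is exactly the atom's mass rather than a floating-point approximation of it.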
sorry, I was completely wrong. I think Lemma $1$ might be right. I'll continue reading. Sorry about that — mathworker21, Nov 03 '19 at 22:41
hi. looking good so far. however, I'm confused why at the end of the proof of Lemma 3: $m_k\mathbb{P}(X\in\partial B_{r_{m_k}}(x)) \exp\left(-\frac{m_k-1}{\gamma(k)}\mathbb{P}(X\in \partial B_{r_{m_k}}(x))\right) \to 0, k\to+\infty$. can you explain please? — mathworker21, Nov 03 '19 at 22:59
well, the idea is that it is like having to deal with $a_n\cdot \exp (-b_n\cdot a_n)$ where $b_n\to+\infty$ and $a_n, b_n\ge 0$. The exp term goes to zero much faster than $a_n$ if $a_n$ is trying to diverge, and if $a_n$ goes to zero by itself the exp term is bounded... if still not satisfied I can work out a formal proof — Bob, Nov 03 '19 at 23:10
I finished. It all looks correct! And once again, it is very nicely written. The proof addresses the example I mentioned at the beginning of my answer really well. Just out of curiosity: did that example motivate you at all with this solution? — mathworker21, Nov 03 '19 at 23:15
the initial "use the continuity" approach failed due to possible discontinuities on shells... and then someone with 15,000 reputation started trying to demolish the proof line by working on wild discontinuity behaviors on shells... so, mathematically and psychologically, I thought there were enough reasons to invest my time in taking a closer look at what happens on those damn shells :) — Bob, Nov 03 '19 at 23:57
okay, I just wanted to make sure that I didn't end up being a complete waste of your time. I will award your answer 50 bounty points, if you are fine with that. are you? — mathworker21, Nov 04 '19 at 00:27
It wasn't at all... I appreciated the help a lot... probably if I had been thinking about this problem alone I would have let it go after a while: it turned out to be much more complicated than I initially thought... About the bounty, thanks! It's a nice answer with which to earn my first bounty :) — Bob, Nov 04 '19 at 00:43
three final things: (1) the wild discontinuity behavior on shells did demolish the proof line; (2) you are by far smarter/better than most users with over 15k rep; (3) I enjoy your questions so I might try some others. I've been trying to ping you about the setwise convergence question. thanks again for the nice problems and for communicating throughout this whole process :) — mathworker21, Nov 04 '19 at 00:47
sorry to keep spamming. since this question is not mine, I cannot award a bounty of 50; the minimum is 100. can you award my answer 50 and I award yours 100? this of course produces the same net result, but I also directly think it's fair, as my answer was helpful, and I've spent a lot of time on this question. — mathworker21, Nov 04 '19 at 00:53
I think that exchanging points is against the policy of this site, so never mind the bounty... anyway, thanks for the thought :) I will think about the other (open subsets setwise) question in the next few days; maybe all this thinking could shed light on that question too... so see you there :) — Bob, Nov 04 '19 at 01:02
