
Assume $X\sim \mathcal N(0,\sigma^2).$

Assume $\operatorname{Var}[Y]$ is known, where $Y=\ln(X)$ for $X>0$, $Y=0$ for $X=0$, and $Y=-\ln(-X)$ for $X<0.$

Is there a way to approximate $\operatorname{Var}[X]$ with a simple formula based on $\operatorname{Var}[Y]?$

There are several existing threads about the Delta Method.

How can the Delta Method be applied in such a special case? If it cannot be adapted to accommodate this case, what else do you recommend? What are the implications?

User1865345
  • UPDATED question. – anonymous Sep 21 '22 at 13:05
  • Btw, if I may ask, what may be the setting where you don't have access to values of $X$, but just the variance of $\mbox{sign}(X)\log(|X|)$? As an aside, note that the variance of $\log(|X|)$ does not depend on $\sigma$, so $\mbox{sign}(X)$ is pretty important in the previous formula – charmd Sep 22 '22 at 11:42
  • Thank you for your effort in clarifying. It is a proprietary dataset in which only $\mbox{sign}(X)\log(|X|)$ is made available. – anonymous Sep 23 '22 at 18:16

1 Answer


A concise way of writing $Y$ as a function of $X$ is $Y = \mbox{sign}(X)\log(|X| + 1_{X=0})$. You can ignore the event $\{X=0\}$, since it has probability $0$, and take the liberty of writing $Y = \mbox{sign}(X)\log(|X|)$.
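
For concreteness, here is a minimal NumPy sketch (not from the original answer) of computing $Y$ from samples of $X$; the indicator inside the logarithm only guards against evaluating $\log(0)$:

    import numpy as np

    rng = np.random.default_rng(0)
    sigma = 3.0
    x = sigma * rng.standard_normal(100_000)

    # Y = sign(X) * log(|X| + 1_{X=0}); the indicator only prevents log(0),
    # since sign(0) = 0 already forces Y = 0 on the event {X = 0}
    y = np.sign(x) * np.log(np.abs(x) + (x == 0))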

Because the density of $|X|$ is bounded near $0$ and $\displaystyle{\int_0^1} \log(x)^2\,dx < \infty$, while $\displaystyle{\int_1^{+\infty}} \log(x)^2 f_{|X|}(x)\,dx < \infty$ thanks to the Gaussian tail, $Y$ does have a finite variance.

You can write $X = \sigma N$, with $N \sim \mathcal{N}(0,1)$ a standard normal variable and $\sigma > 0$ the standard deviation of $X$. Then almost surely (i.e. except on $\{X=0\}$), $$Y = \mbox{sign}(N)\log(\sigma |N|) = \log(\sigma)\mbox{sign}(N) +\mbox{sign}(N) \log(|N|),$$ so $\mathbb{E}(Y) = 0$ (the sign is a centered random sign independent of $|N|$), and since $\mbox{sign}(N)^2 = 1$, $\mbox{Var}(Y) = \mathbb{E}(Y^2) = \log(\sigma)^2 + 2\log(\sigma)\mathbb{E}(\log(|N|))+\mathbb{E}(\log(|N|)^2)$.

You can find an explicit value for $\mbox{Var}(Y)$ as a quadratic polynomial in $\log(\sigma)$, since $2 \mathbb{E}(\log(|N|))=-\gamma -\log(2)\simeq -1.27$ and $\mathbb{E}(\log(|N|)^2)=\frac{2\gamma^2+\pi^2+2\log(2)^2+\gamma\log(16)}{8} \simeq 1.637$, where $\gamma$ is the Euler-Mascheroni constant. The minimum value of the variance is $\frac{\pi^2}{8} \simeq 1.234$, reached at the point $\log(\sigma_0) = \frac{\gamma+\log(2)}{2} \simeq 0.635$.
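
As a sanity check, the closed form can be compared against a Monte Carlo estimate. A small sketch, assuming NumPy and an arbitrary test value $\sigma = 3$:

    import numpy as np

    EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

    sigma = 3.0
    rng = np.random.default_rng(1)
    n = rng.standard_normal(2_000_000)
    # P(N = 0) = 0, so log(0) never occurs in practice
    y = np.sign(n) * np.log(sigma * np.abs(n))

    # Var(Y) = log(s)^2 - (gamma + log 2) log(s) + pi^2/8 + (gamma + log 2)^2/4
    c = EULER_GAMMA + np.log(2)
    closed_form = np.log(sigma)**2 - c * np.log(sigma) + np.pi**2/8 + c**2/4

    print(y.var(), closed_form)  # both close to 1.449 for sigma = 3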

Solving the quadratic equation for $\log(\sigma)$ yields two roots with a simple closed form: $\frac{\gamma+\log(2)}{2} \pm \sqrt{\mbox{Var}(Y) - \frac{\pi^2}{8}}$.
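
In code, the two candidate roots might be computed as follows (a sketch; the function name log_sigma_roots is illustrative):

    import numpy as np

    EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

    def log_sigma_roots(var_y):
        """Real roots in log(sigma) of the quadratic equation for Var(Y)."""
        center = (EULER_GAMMA + np.log(2)) / 2  # log(sigma_0), vertex of the parabola
        disc = var_y - np.pi**2 / 8             # how far Var(Y) sits above its minimum
        if disc < 0:
            return ()                           # no real solution
        return (center - np.sqrt(disc), center + np.sqrt(disc))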

Now comes the delicate step: since you want an estimate based only on $\mbox{Var}(Y)$, this equation has one or two real solutions for $\log(\sigma)$, and possibly none at all once you replace the theoretical value $\mbox{Var}(Y)$ with an estimator.

You have no way of deciding from $\mbox{Var}(Y)$ alone which root to select as the estimate of $\log(\sigma)$. A solution is to choose the larger root if you have reason to believe that $\sigma \ge \sigma_0 = \exp\Big(\frac{\gamma+\log(2)}{2}\Big) \simeq 1.887$, and the smaller root otherwise. And if your observed variance falls below the theoretical lower bound $\frac{\pi^2}{8} \simeq 1.234$, you can take as your estimate $\hat{\sigma} = \sigma_0$, the point at which that minimum variance is reached.


Summary for the estimator: denoting $\sigma_0 = \exp\Big(\frac{\gamma+\log(2)}{2}\Big)$ and $\hat{V}$ the estimator of $\mbox{Var}(Y)$, we estimate $\hat{\sigma} = \begin{cases} \sigma_0 & \mbox{ if $\hat{V} < \frac{\pi^2}{8}$} \\ \sigma_0\exp\big(\sqrt{\hat{V}-\frac{\pi^2}{8}}\big) & \mbox{ otherwise, if you have evidence that $\sigma \ge \sigma_0$} \\ \sigma_0\exp\big(-\sqrt{\hat{V}-\frac{\pi^2}{8}}\big) & \mbox{ otherwise, if you have evidence that $\sigma \le \sigma_0$} \end{cases}$
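
In code, the three cases might read as follows (a sketch; the name estimate_sigma and the large_sigma flag are illustrative, not part of the thread):

    import numpy as np

    EULER_GAMMA = 0.5772156649015329             # Euler-Mascheroni constant
    LOG_SIGMA0 = (EULER_GAMMA + np.log(2)) / 2   # log(sigma_0), about 0.635
    VAR_MIN = np.pi**2 / 8                       # minimum attainable Var(Y), about 1.234

    def estimate_sigma(var_y_hat, large_sigma=True):
        """Estimate sigma from an estimate of Var(Y), following the three cases above.

        large_sigma -- True if outside evidence suggests sigma >= sigma_0,
                       False if it suggests sigma <= sigma_0.
        """
        if var_y_hat < VAR_MIN:
            return np.exp(LOG_SIGMA0)            # clamp to sigma_0
        root = np.sqrt(var_y_hat - VAR_MIN)
        return np.exp(LOG_SIGMA0 + root if large_sigma else LOG_SIGMA0 - root)

For instance, estimate_sigma(1.449) returns roughly $3.0$ with the larger root, consistent with the Monte Carlo check above.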

You can get the "evidence" from a Bayesian approach for the estimation of $\sigma$, or combine this estimator with an estimator based on the average of $Y$ or its maximum sampled value.

charmd
  • Interesting. Thank you. Would you mind further explaining $E(ln(|N|))$ and $Var(sign(N)ln(|N|))$? Where do their values come from? – anonymous Sep 21 '22 at 13:52
  • Do you mean the actual computations, involving the Euler-Mascheroni constant? Tbh, I just plugged it into WolframAlpha; I do not see where $\gamma$ comes from, though it's not shocking to me either. A quick search yields this nice answer: https://math.stackexchange.com/a/1921419/332790 – charmd Sep 21 '22 at 14:00
  • Very interesting. By "inverting it" do you mean solving the quadratic equation? – anonymous Sep 21 '22 at 14:05
  • Since I'd get two values for $log(\sigma)$, which should I pick and convert to $\sigma$? – anonymous Sep 21 '22 at 14:14
  • +1 Presumably if $Y$ is a sample and $Var(Y)<1.6372$ due to random effects of sampling, there will not be a solution to the quadratic – Henry Sep 21 '22 at 14:23
  • Some of my values for $\sqrt{Var(Y)}$ are: 18.2, 5.1, 13.8, 15.2, 18.6, 7.9, 14.7, 15.6 (my values for $X$ are either very large positive numbers or very large negative numbers or $0$). – anonymous Sep 21 '22 at 15:50
  • Also, my values for $E(Y)$ oscillate between -1.0 and +0.5. – anonymous Sep 21 '22 at 16:04
  • Of course my values for mean and variance of $Y$ include the $X=0$ observations (which I do not want to exclude for some reasons related to my problem at hand) for which the estimation method proposed by @charmd does not apply, correct? So? – anonymous Sep 21 '22 at 16:31
  • Precision: if you work with real data where $0$ comes up "often" among your $X$ values, then I would strongly advise against modelling $X$ as a normal variable, even one centered at $0$. All the more if you are looking at something like $\log(|X|)$ afterwards, which blows up for values of $X$ close to but different from $0$ – charmd Sep 21 '22 at 17:08
  • Thanks for the edit. Would you mind adding details on $log(\sigma_0)$? What does it stand for? How to find the value $0.31$ and the theoretical lower bound for the variance, i.e. $1.5363$? – anonymous Sep 21 '22 at 17:54
  • It's simply where the minimum of the quadratic polynomial in $\log(\sigma)$ is reached, and the numerical value at that point. The minimum of $x^2+bx+c$ is at $-\frac{b}{2}$, that's all – charmd Sep 22 '22 at 05:12
  • Thanks. For completeness, with my values for $Var(Y)$ the two $\sigma$ solutions are very far from each other: one is much larger than the other. – anonymous Sep 22 '22 at 07:36
  • There was a typo with a factor $4$ instead of $2$. Now my numerical tests work, I will add a last value for the estimator and you can test it as well – charmd Sep 22 '22 at 08:05