
Saturday, December 22, 2012

Axiomatic definition of the center of mass

This  was prompted by a nice Mathoverflow question.    \newcommand{\bR}{\mathbb{R}}  \newcommand{\bZ}{\mathbb{Z}} \newcommand{\bp}{{\boldsymbol{p}}} \newcommand{\Div}{\mathrm{Div}} \newcommand{\supp}{\mathrm{supp}} \newcommand{\bm}{\boldsymbol{m}} \newcommand{\eC}{\mathscr{C}} \newcommand{\bc}{\boldsymbol{c}} \newcommand{\bq}{{\boldsymbol{q}}}

We define an effective divisor   on \bR^N to be a   function with finite support \mu:\bR^N\to\bZ_{\geq 0}. Its mass, denoted by \bm(\mu), is the  nonnegative integer

\bm(\mu)=\sum_{\bp\in\bR^N} \mu(\bp).

We denote by \Div_+(\bR^N) the set of effective divisors.  Note that \Div_+(\bR^N) has a natural structure of Abelian semigroup.

For any \bp\in\bR^N we denote by \delta_\bp the Dirac divisor of mass 1 and supported at  \bp.   The  Dirac divisors generate  the  semigroup \Div_+(\bR^N).     We have a natural  topology on  \Div_+(\bR^N) where \mu_n\to \mu if and only if

\bm(\mu_n)\to \bm(\mu),\;\; {\rm dist}\,\bigl(\;\supp(\mu_n),\; \supp(\mu)\;\bigr)\to 0,

where {\rm dist} denotes the Hausdorff distance.

A center of mass   is a map

\eC:\Div_+(\bR^N)\to\Div_+(\bR^N)

satisfying the following conditions.

1. (Localization) For any divisor \mu the support of \eC(\mu) consists of  a single point \bc(\mu).

2.  (Conservation of mass)

\bm(\mu)=\bm\bigl(\;\eC(\mu)\;\bigr),\;\;\forall\mu \in\Div_+(\bR^N),


so that

\eC(\mu)=\bm(\mu)\delta_{\bc(\mu)},\;\;\forall\mu \in\Div_+(\bR^N).


3. (Normalization)

\bc(m\delta_\bp)=\bp,\;\;\bc(\delta_\bp+\delta_\bq)=\frac{1}{2}(\bp+\bq),\;\;\forall \bp,\bq\in\bR^N,\;\;m\in\bZ_{>0}.

4. (Additivity)

\eC(\mu_1+\mu_2)= \eC\bigl(\,\eC(\mu_1)+\eC(\mu_2)\,\bigr),\;\;\forall \mu_1,\mu_2\in \Div_+(\bR^N).  



For example, the   correspondence

\Div_+ \ni \mu\mapsto  \eC_0(\mu)=\bm(\mu)\delta_{\bc_0(\mu)}\in\Div_+,\;\;\bc_0(\mu):=\frac{1}{\bm(\mu)}\sum_\bp \mu(\bp)\bp

is a center-of-mass  map.  I want to show that this is the only center of mass map.
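For the computationally inclined, here is a minimal Python/NumPy sketch (purely illustrative, not part of the argument; the names mass, c0, C0 are mine) of the map \eC_0, with a divisor encoded as a dictionary from points of \bR^N to positive integer multiplicities; it also checks the additivity axiom on one example.

import numpy as np

def mass(mu):
    # total mass of an effective divisor, encoded as {point (tuple): multiplicity}
    return sum(mu.values())

def c0(mu):
    # barycenter c_0(mu) = (1/m(mu)) * sum_p mu(p) * p
    return sum(k * np.array(p, dtype=float) for p, k in mu.items()) / mass(mu)

def C0(mu):
    # the divisor C_0(mu) = m(mu) * delta_{c_0(mu)}
    return {tuple(c0(mu)): mass(mu)}

# additivity check: C0(mu1 + mu2) agrees with C0(C0(mu1) + C0(mu2))
mu1 = {(0.0, 0.0): 2, (1.0, 0.0): 1}
mu2 = {(0.0, 3.0): 3}
total = {(0.0, 0.0): 2, (1.0, 0.0): 1, (0.0, 3.0): 3}   # mu1 + mu2 (disjoint supports)
merged = dict(C0(mu1))
for p, k in C0(mu2).items():
    merged[p] = merged.get(p, 0) + k                     # the divisor C0(mu1) + C0(mu2)
print(C0(total), C0(merged))   # both are 6*delta at the point (1/6, 3/2), up to rounding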

Proposition  If \eC:\Div_+(\bR^N)\to \Div_+(\bR^N) is a  center-of-mass map, then \eC=\eC_0.

Proof.  We carry out the proof in several steps.


Step 1 (Rescaling).   We can write the additivity property as

\bc(\mu_1+\mu_2) =\bc\bigl(\; \bm(\mu_1)\delta_{\bc(\mu_1)} +\bm(\mu_2)\delta_{\bc(\mu_2)}\;\bigr).

In particular, this implies the rescaling property

\bc( k\mu)=\bc(\mu),\;\;\forall\mu \in\Div_+,\;\; k\in\bZ_{>0}. \tag{R}\label{R}



This follows by induction on k. For k=1 it is obviously true.  In general, using the induction hypothesis \bc(\,(k-1)\mu\,)=\bc(\mu), we have

 \bc( k\mu)=\bc\bigl(\;(k-1)\bm(\mu)\delta_{\bc(\;(k-1)\mu)}+\bm(\mu)\delta_{\bc(\mu)}\;\bigr) =\bc\bigl(\; k\bm(\mu)\delta_{\bc(\mu)}\;\bigr)={\bc(\mu)} 



Step 2. (Equidistribution)   For any n>0 and any collinear points \bp_1,\dotsc,\bp_n such that


|\bp_1-\bp_2|=\cdots=|\bp_{n-1}-\bp_n|

we have

\eC\Bigl(\sum_{k=1}^n\delta_{\bp_k}\;\Bigr)=\eC_0\Bigl(\sum_{k=1}^n\delta_{\bp_k}\;\Bigr) \tag{E}\label{E}.

Equivalently, this means that

\bc\Bigl(\sum_{k=1}^n\delta_{\bp_k}\;\Bigr)=\bc_0\bigl(\sum_{k=1}^n\delta_{\bp_k}\;\bigr)={\frac{1}{n}(\bp_1+\cdots+\bp_n)}.  


We  will prove  (\ref{E}) arguing by induction on n. For n=1,2 this follows from the normalization property. Assume that (\ref{E}) is true for any n< m. We want to prove it is true for n=m.

We distinguish two cases.


(a)  m is even, m= 2m_0. We set

\mu_1=\sum_{j=1}^{m_0} \delta_{\bp_j},\;\;\mu_2=\sum_{j=m_0+1}^{2m_0}\delta_{\bp_j}.

Then, using additivity and the rescaling property (\ref{R}),

\bc(\mu_1+\mu_2)= \bc\bigl(\; m_0\delta_{\bc(\mu_1)}+m_0\delta_{\bc(\mu_2)}\;\bigr) =\bc\bigl(\; \delta_{\bc(\mu_1)}+\delta_{\bc(\mu_2)}\;\bigr). \tag{1}\label{2}

By induction

 \bc(\mu_1)=\bc_0(\mu_1),\;\;\bc(\mu_2)=\bc_0(\mu_2).  

The normalization  condition now implies that

 \bc\bigl( \delta_{\bc(\mu_1)}+\delta_{\bc(\mu_2)}\;\bigr)=\bc_0\bigl( \delta_{\bc_0(\mu_1)}+\delta_{\bc_0(\mu_2)}\;\bigr).

Now run the  arguments in (\ref{2}) in reverse, with \bc replaced by \bc_0.

(b) m is odd, m=2m_0+1.  Define


\mu_1=\delta_{\bp_{m_0+1}},\;\;\mu_2'=\sum_{j<m_0+1}\delta_{\bp_j},\;\;\mu_2''=\sum_{j>m_0+1}\delta_{\bp_j},\;\;\mu_2=\mu_2'+\mu_2''.

(Observe that \bp_{m_0+1} is the mid-point in the string of equidistant collinear points \bp_1,\dotsc,\bp_{2m_0+1}. ) We have

\eC(\mu_2'+\mu_2'')=\eC\bigl( \; \eC(\mu_2')+\eC(\mu_2'')\;\bigr).  

By induction

  \eC(\mu_2')+\eC(\mu_2'')=  \eC_0(\mu_2')+\eC_0(\mu_2'') =m_0\delta_{\bc_0(\mu_2')}+m_0\delta_{\bc_0(\mu_2'')}=m_0\bigl(\;\delta_{\bc_0(\mu_2')}+\delta_{\bc_0(\mu_2'')}\;\bigr).  

Observing that

\frac{1}{2}\bigl(\bc_0(\mu_2')+\bc_0(\mu_2'')\;\bigr)=\bp_{m_0+1}

we deduce

\eC(\mu_2)= \eC(\mu_2'+\mu_2'')=m_0\eC\bigl( \delta_{\bc_0(\mu_2')}+\delta_{\bc_0(\mu_2'')}\;\bigr)=m_0\eC_0\bigl( \delta_{\bc_0(\mu_2')}+\delta_{\bc_0(\mu_2'')}\;\bigr)=2m_0\delta_{\bp_{m_0+1}}=2m_0\mu_1.  

Finally  we deduce

\eC(\mu)=\eC\bigl(\;\eC(\mu_1)+\eC(\mu_2)\;\bigr)=\eC\bigl(\;\eC(\mu_1)+2m_0\eC(\mu_1)\;\bigr)= (2m_0+1)\delta_{\bp_{m_0+1}}=\eC_0(\mu).

 Step 3. (Replacement) We will show that for any distinct points \bq_1,\bq_2 and any positive integers m_1,m_2 we  can find  (m_1+m_2) equidistant points  \bp_1,\dotsc,\bp_{m_1+m_2} on the line determined by \bq_1 and \bq_2  such that

m_1\delta_{\bq_1}=\eC_0\Bigl(\sum_{j=1}^{m_1} \delta_{\bp_j}\Bigr)=\eC\Bigl(\sum_{j=1}^{m_1} \delta_{\bp_j}\Bigr),\;\;\;m_2\delta_{\bq_2}=\eC_0\Bigl(\sum_{j=m_1+1}^{m_1+m_2} \delta_{\bp_j}\Bigr)=\eC\Bigl(\sum_{j=m_1+1}^{m_1+m_2} \delta_{\bp_j}\Bigr).

This is elementary. Without restricting the generality we can assume that \bq_1 and \bq_2 lie on an axis (or geodesic) \bR of \bR^N, with \bq_1=0 and \bq_2=q>0.  Clearly we can find real numbers x_0, r, r>0, such that
\frac{1}{m_1}\sum_{j=1}^{m_1}(x_0+j r)=0,\;\;\frac{1}{m_2}\sum_{j=m_1+1}^{m_1+m_2}(x_0+jr)=q.  

Indeed, the above two equalities can be rewritten as

x_0+\frac{m_1+1}{2} r=0,

q=x_0 +m_1 r+\frac{m_2+1}{2}r=x_0+\frac{m_1+1}{2} r+\frac{m_1+m_2}{2} r,

so it suffices to take r=\frac{2q}{m_1+m_2}>0 and x_0=-\frac{m_1+1}{2}r. Now place the points \bp_j at the locations x_0+jr, j=1,\dotsc,m_1+m_2.
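Here is a small Python check of this construction (purely illustrative, with arbitrarily chosen m_1, m_2, q):

import numpy as np

m1, m2, q = 3, 5, 2.0                  # hypothetical choice of parameters
r = 2 * q / (m1 + m2)                  # spacing between consecutive points
x0 = -(m1 + 1) * r / 2                 # offset so that the first m1 points average to 0

pts = np.array([x0 + j * r for j in range(1, m1 + m2 + 1)])
print(pts[:m1].mean())                 # ~ 0.0  (= q_1)
print(pts[m1:].mean())                 # ~ 2.0  (= q_2)
print(np.allclose(np.diff(pts), r))    # True: the points are equidistant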

Step 4. (Conclusion)   We argue by induction on the mass \bm(\mu) that

\eC(\mu)=\eC_0(\mu),\;\;\forall \mu\in \Div_+\tag{2}\label{3}

Clearly, the normalization condition  shows that (\ref{3})  is true if \supp\mu consists of a single point, or if  \bm(\mu)\leq 2.

In general, if \bm(\mu)>2 we write \mu=\mu_1+\mu_2, where both masses m_1=\bm(\mu_1) and m_2=\bm(\mu_2) are positive, so that m_1,m_2<\bm(\mu).

By induction we have

\eC(\mu)= \eC\bigl( \eC(\mu_1)+\eC(\mu_2)\bigr)=\eC\bigl(\;\eC_0(\mu_1)+\eC_0(\mu_2)\;\bigr).

If \bc_0(\mu_1)=\bc_0(\mu_2), the divisors \eC_0(\mu_1) and \eC_0(\mu_2) are supported at the same point and we are done, by the normalization property. Suppose that  \bq_1=\bc_0(\mu_1)\neq\bc_0(\mu_2)=\bq_2. By Step 3, we can  find     equidistant points \bp_1,\dotsc,\bp_{m_1+m_2} such that

m_1\delta_{\bq_1}=\eC\Bigl(\sum_{j=1}^{m_1} \delta_{\bp_j}\Bigr)= \eC_0\Bigl(\sum_{j=1}^{m_1} \delta_{\bp_j}\Bigr)
m_2\delta_{\bq_2}=\eC\Bigl(\sum_{j=m_1+1}^{m_1+m_2} \delta_{\bp_j}\Bigr)=\eC_0\Bigl(\sum_{j=m_1+1}^{m_1+m_2} \delta_{\bp_j}\Bigr).

We deduce that

\eC(\mu)=\eC\Bigl(\sum_{k=1}^{m_1+m_2}\delta_{\bp_k}\Bigr),\;\; \eC_0(\mu)=\eC_0\Bigl(\sum_{k=1}^{m_1+m_2}\delta_{\bp_k}\Bigr).  

The conclusion now follows from  (\ref{E}).  q.e.d



Remark.    The above proof does not really use the linear structure. It uses only the fact that any two points in \bR^N determine a unique geodesic.  The Normalization condition can be replaced by the equivalent one

\bc(\delta_\bp+\delta_\bq)= \mbox{the midpoint of the  geodesic segment  $[\bp,\bq]$}.

If we replace \bR^N with a hyperbolic space the same arguments show that  there exists at most  one center of mass map.











Monday, December 17, 2012

A really nifty linear algebra trick

I've been stuck on this statistics problem for quite a while: I could taste where it was going, but I could never put my intuition into words. Today I discovered how to neatly bypass the obstacle.  Discovered is not the appropriate word, because someone else figured it out long before me.  It's a really, really elementary linear algebra trick that I have never encountered in my travels. Very likely, more experienced statisticians than myself would smile at my ignorance.

The earliest occurrence of this trick I could trace is in a Russian paper by R. N. Belyaev published in Teoriya Veroyatnostei i ee Primeneniya, 1966.    \newcommand{\bR}{\mathbb{R}} \newcommand{\bx}{\boldsymbol{x}} \newcommand{\by}{\boldsymbol{y}}

Suppose that  S is an invertible n\times n matrix and \bx,\by\in\bR^n are vectors which I regard as column vectors, i.e., column matrices.  Denote by (-, -) the  natural inner product in  \bR^n

(\bx,\by)= \bx^\dagger\cdot \by,

where {}^\dagger denotes the transpose of a matrix.

Let r\in\bR. The name of the game is to compute the scalar

r- (\bx, S^{-1} \by)= r-\bx^\dagger\cdot S^{-1}\cdot \by.

Such computations are often required in the neck of the woods where I've spent the best part of the last three years, namely geometric probability. So here is the trick. I'll name it after Belyaev, although I am sure he was not the first to observe it. (He even refers to an old book by H. Cramer on statistics.) \newcommand{\one}{\boldsymbol{1}}


Belyaev's Trick.   


 r-\bx^\dagger\cdot S^{-1}\cdot \by =\frac{\det\left[\begin{array}{cc} S &\by\\ \bx^\dagger & r \end{array}\right]}{\det S}=\frac{\det\left[\begin{array}{cc} r &\bx^\dagger\\ \by & S \end{array}\right]}{\det S}.


Here is the disappointingly simple proof. Note that


 \left[  \begin{array}{cc}\one_n & S^{-1}\by\\ 0 & r-\bx^\dagger S^{-1} \by \end{array} \right]= \left[  \begin{array}{cc} \one_n & 0\\-\bx^\dagger & 1 \end{array} \right]\cdot  \left[  \begin{array}{cc} S^{-1} & 0\\ 0 & 1  \end{array}\right] \cdot  \left[ \begin{array}{cc} S &\by\\ \bx^\dagger & r \end{array}\right].  

Now take the determinants of both sides to obtain the first equality.  The second equality follows from the first by permuting the rows and columns of the matrix in the numerator. \DeclareMathOperator{\Cov}{\boldsymbol{Cov}} \newcommand{\bsE}{\boldsymbol{E}}
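Here is a quick NumPy sanity check of the trick (purely illustrative, on randomly generated data):

import numpy as np

rng = np.random.default_rng(0)
n = 4
S = rng.normal(size=(n, n))                      # generically invertible
x = rng.normal(size=(n, 1))
y = rng.normal(size=(n, 1))
r = rng.normal()

lhs = r - (x.T @ np.linalg.solve(S, y)).item()   # r - x^T S^{-1} y
M = np.block([[S, y], [x.T, np.array([[r]])]])   # the bordered matrix [[S, y], [x^T, r]]
rhs = np.linalg.det(M) / np.linalg.det(S)
print(np.isclose(lhs, rhs))                      # True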


Here is how it works in practice.   Suppose that (X_0, X_1, \dotsc, X_n)\in \bR^{n+1} is a centered Gaussian random vector, with covariance matrix

\Cov(X_0, X_1, \dotsc, X_n)= \Bigl( \;\bsE\bigl( X_i\cdot X_j\,\bigr)\;\Bigr)_{0\leq i,j\leq n}.

Assume  that the Gaussian vector (X_1,\dotsc, X_n) is nondegenerate, i.e., the  symmetric matrix  S=\Cov(X_1,\dotsc, X_n) is invertible.

We can then define in an unambiguous way the conditional random variable \DeclareMathOperator{\var}{\boldsymbol{var}}

(X_0|\; X_1=\cdots =X_n=0).

This is a  centered Gaussian random variable with variance given by the   the regression formula

\var(X_0|\; X_1=\cdots =X_n=0)= \var(X_0) - \bx^\dagger\cdot S^{-1}\cdot \bx,

 where \bx^\dagger is the row vector

\bx^\dagger =\left(\; \bsE(X_0X_1),\cdots ,\bsE(X_0 X_n)\;\right).

If we now use  Belyaev's trick we deduce

  \var(X_0|\; X_1=\cdots =X_n=0)=\frac{\det\Cov(X_0, X_1, \dotsc, X_n)}{\det\Cov(X_1,\dotsc, X_n)}.  

In this form it is used in the related paper of Jack Cuzick (Annals of Probability, 3(1975), 849-858.)
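As a minimal worked case of the last formula (my own illustration, n=1): for a centered Gaussian pair (X_0,X_1) with \var(X_0)=a, \var(X_1)=b>0 and \bsE(X_0X_1)=c, the regression formula gives

\var(X_0|\; X_1=0)= a-\frac{c^2}{b}=\frac{ab-c^2}{b}=\frac{\det\Cov(X_0,X_1)}{\det\Cov(X_1)},

which is exactly the determinant ratio above.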


On a "Car-Talk" problem

While driving back home I heard an interesting math question from, of all places, the Car Talk show on NPR.  This made me think of a generalization of the trick they used and, in particular, to formulate the following problem.  \newcommand{\bR}{\mathbb{R}}

Problem  Determine all reasonably well-behaved compact domains D\subset \bR^2 with the following property: any line through the origin divides D into two regions of equal areas.  We will refer to this as property \boldsymbol{C} (for cut).


I know that "reasonably well behaved" is a rather fuzzy requirement.  At this moment I don't want to think of Cantor-like weirdos.  So let's assume that D is semialgebraic.


We say that a domain D satisfies property \boldsymbol{S} (for symmetry) if it is invariant with respect to the  involution

\bR^2\ni (x,y)\mapsto (-x,-y) \in \bR^2.


It is not hard to see that

\boldsymbol{S}\Rightarrow \boldsymbol{C}.

Is the converse true?

 I describe below one situation when this happens.


A special case.    I'll assume that D is semialgebraic,  star-shaped with respect to the origin and satisfies \boldsymbol{C}.  We can  then describe D in polar coordinates by an inequality of the form

(r,\theta)\in D \Longleftrightarrow  0\leq r\leq f(\theta),\;\;\theta\in[0,2\pi],

where f:[0,2\pi]\to (0,\infty) is a semialgebraic function such that f(0)=f(2\pi). We can extend f by 2\pi-periodicity to a function f:\bR\to (0,\infty) whose restriction to any finite interval is semialgebraic.

For any \phi\in[0,2\pi] denote by \ell_\phi:\bR^2\to \bR the linear function defined by

\ell_\phi (x,y)= x\cos\phi +y\sin\phi.

Denote by  A(\phi) the area of the region D\cap \bigl\{ \ell_\phi\geq 0\bigr\}.    Since D satisfies \boldsymbol{C} we deduce

A(\phi)=\frac{1}{2} {\rm area}\;(D),

so that

A'(\phi)=0,\;\;\forall \phi.

Observe that

A(\phi+\Delta \phi)-A(\phi) =\int_{\phi+\frac{\pi}{2}}^{\phi+\frac{\pi}{2}+\Delta\phi} \left(\int_0^{f(t)} r dr\right) dt -\int_{\phi+\frac{3\pi}{2}}^{\phi+\frac{3\pi}{2}+\Delta\phi} \left(\int_0^{f(t)} r dr\right) dt.

For simplicity we set \theta=\theta(\phi)=\phi+\frac{\pi}{2}, so that \Delta\theta=\Delta\phi. We can then rewrite the above equality as


 A(\phi+\Delta \phi)-A(\phi)=\int_\theta^{\theta+\Delta\theta}\left(\int_0^{f(t)} r dr\right) dt -\int_{\theta+\pi}^{\theta+\pi+\Delta\theta} \left(\int_0^{f(t)} r dr\right) dt. 

Hence

0=A'(\phi)= \frac{1}{2}\Bigl( f(\theta)^2-f(\theta+\pi)^2\Bigr).

Hence f(\theta)= f(\theta+\pi), \forall \theta. This shows that   D satisfies the symmetry condition \boldsymbol{S}.

\ast\ast\ast

Here is a simple instance when \boldsymbol{C}   does not imply \boldsymbol{S}.

Suppose that D is semialgebraic and has the annular description

f(\theta)\leq r\leq  F(\theta). \tag{1}\label{1}.


Using the same notations as above we deduce that

0=A'(\phi)= \frac{1}{2} \Bigl(F^2(\theta)-f^2(\theta)\Bigr)-  \frac{1}{2} \Bigl(F^2(\theta+\pi)-f^2(\theta+\pi)\Bigr).


Thus, the domain (\ref{1}) satisfies \boldsymbol{C} iff the function G(\theta)=F^2(\theta)-f^2(\theta) is \pi-periodic.  Note that


F(\theta)= \sqrt{f^2(\theta)+ G(\theta)}.

If we choose

f(\theta)=  e^{\sin \theta},\;\; G(\theta)=e^{\cos 2\theta},

 then we obtain the domain bounded by the two closed curves in the   figure below. This domain obviously violates the  symmetry condition \boldsymbol{S}.
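Here is a short NumPy check (purely illustrative) that this domain does satisfy \boldsymbol{C} while visibly violating \boldsymbol{S}: the area of D\cap\{\ell_\phi\geq 0\} equals \frac{1}{2}\int_{\phi-\pi/2}^{\phi+\pi/2} G(t)\, dt, which is independent of \phi because G is \pi-periodic.

import numpy as np

f = lambda t: np.exp(np.sin(t))            # inner boundary r = f(theta); not pi-periodic
G = lambda t: np.exp(np.cos(2 * t))        # pi-periodic
F = lambda t: np.sqrt(f(t)**2 + G(t))      # outer boundary r = F(theta)

def half_area(phi, n=100001):
    # area of D on the side { x cos(phi) + y sin(phi) >= 0 }, i.e. theta in [phi-pi/2, phi+pi/2]
    t = np.linspace(phi - np.pi / 2, phi + np.pi / 2, n)
    return 0.5 * np.trapz(F(t)**2 - f(t)**2, t)

t = np.linspace(0, 2 * np.pi, 200001)
total = 0.5 * np.trapz(F(t)**2 - f(t)**2, t)
print([round(half_area(phi) / total, 6) for phi in np.linspace(0, np.pi, 7)])
# every ratio is 0.5 (property C holds), although f(theta) != f(theta + pi) (so S fails)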










Wednesday, December 12, 2012

12.12.12-Once in a century

I had to do this. It is the last time this century one can do this, and I could not pass up this opportunity to immortalize it.

Geometry conference in the memory of Jianguo Cao

It's been a bit over a year and a half now since my dear friend Jianguo Cao unexpectedly passed away. I miss him for many reasons. He was my gentle, wise and always welcoming Riemannian geometry guru.   Our department is organizing a conference in his memory (March 13-17, 2013).  The least I could do is to spread the word. Unfortunately, I cannot attend. In any case here is a picture of Jianguo from 2004.  He is the leftmost person in the row; I am the only bearded guy.

On conditional expectations.

\newcommand{\bR}{\mathbb{R}}  I am still struggling with the idea of conditioning.   Maybe this public confession will help clear things up.   \newcommand{\bsP}{\boldsymbol{P}} \newcommand{\eA}{\mathscr{A}} \newcommand{\si}{\sigma}

Suppose that (\Omega, \eA, \bsP) is a probability space, where \eA is a \si-algebra of subsets of \Omega and  \bsP:\eA\to [0,1] is a probability measure. We assume that \eA is complete with respect to \bsP, i.e.,  subsets of \bsP-negligible subsets are measurable. \newcommand{\bsU}{{\boldsymbol{U}}} \newcommand{\bsV}{{\boldsymbol{V}}} \newcommand{\eB}{\mathscr{B}}

Assume that \bsU and \bsV are two finite dimensional real vector spaces equipped with the \si-algebras of Borel subsets, \eB_{\bsU} and respectively \eB_{\bsV}. Consider two random  variables  X:\Omega\to \bsU and Y:\Omega\to \bsV with probability   measures

p_X=X_*\bsP,\;\;p_Y=Y_*\bsP.

Denote by p_{X,Y} the joint probability measure \newcommand{\bsE}{\boldsymbol{E}}

p_{X,Y}=(X\oplus Y)_*\bsP.

The conditional expectation \bsE(X|Y) is a new \bsU-valued random variable \omega\mapsto \bsE(X|Y)_\omega,  but on a different probability space (\Omega, \eA_Y, \bsP_Y), where \eA_Y=Y^{-1}(\eB_\bsV) and \bsP_Y is the restriction of \bsP to \eA_Y. The events in \eA_Y all have the form \{Y\in B\}, B\in\eB_{\bsV}.

This  \eA_Y-measurable random variable is defined uniquely by the  equality

\int_{Y\in B} \bsE(X|Y)_\omega \bsP_Y(d\omega) = \int_{Y\in B}X(\omega) \bsP(d\omega),\;\;\forall B\in\eB_\bsV.

Warning: The truly subtle thing in the above   equality is  the integral in the left-hand-side which is performed with respect to the restricted measure \bsP_Y.

If we denote by I_B the indicator function of  B\in\eB_\bsV, then we can rewrite the above  equality as

\int_\Omega \bsE(X|Y)_\omega I_B(Y(\omega)) \bsP_Y(d\omega)=\int_\Omega X(\omega) I_B(Y(\omega) )\bsP(d\omega).

In particular  we deduce that for any  step function f: \bsV \to \bR we have

\int_\Omega \bsE(X|Y)_\omega f(Y(\omega)) \bsP_Y(d\omega) =\int_\Omega X(\omega) f(Y(\omega) )\bsP(d\omega). 

The random variable  \bsE(X|Y)  defines  a   \bsU-valued  random variable \bsV\ni y\mapsto \bsE(X|y)\in\bsU on the probability space (\bsV,\eB_\bsV, p_Y)   where

\int_B  \bsE(X| y)\, p_Y(dy)=\int_{(x,y)\in \bsU\times B} x\, p_{X,Y}(dxdy),\;\;\forall B\in\eB_\bsV.

Example 1.   Suppose that A, B\subset  \Omega,  X=I_A, Y=I_B.   Then  \eA_Y is the \si-algebra generated by B. The random variable \bsE(I_A|I_B)  has a constant value x_B on B and a constant value x_{\neg B} on \neg B :=\Omega\setminus B. They are determined by the equality

x_B \bsP(B)= \int_B I_A(\omega)\bsP(d\omega) =\bsP(A\cap B)

so that

x_B=\frac{\bsP(A\cap B)}{\bsP(B)}=\bsP(A|B).

Similarly

 x_{\neg B}= \bsP(A|\neg B).


\ast\ast\ast



Example 2.   Suppose \bsU=\bsV=\bR and that X and  Y are discrete random variables with ranges R_X and R_Y.   The random variable  \bsE(X|Y) has a constant value \bsE(X|y) on the set \{Y=y\}, y\in R_Y. It is determined from the equality

\bsE(X|Y=y)p_Y(y)=\int_{Y=y} \bsE(X|Y)_\omega \bsP_Y(d\omega) =\int_{Y=y} X(\omega) \bsP(d\omega).

Then \bsE(X|Y) can be viewed as a random variable (R_Y, p_Y)\to \bR,  y\mapsto \bsE(X|Y)_y=\bsE(X|Y=y), where

\bsE(X|Y=y) =\frac{1}{p_Y(y)}\int_{Y=y} X(\omega) d\bsP(\omega).

For this reason  one should think of \bsE(X|Y) as a function of Y.    From this point of view, a more appropriate notation would be \bsE_X(Y).


The joint probability  distribution  p_{X,Y} can be viewed as a function

p_{X,Y}: R_X\times R_Y\to \bR_{\geq 0},\;\;\sum_{(x,y)\in R_X\times R_Y} p_{X,Y}(x,y)= 1.

Then

\bsE(X|Y=y)= \sum_{x\in R_X} x\frac{p_{X,Y}(x,y)}{p_Y(y)}.  

We introduce a new R_X-valued random variable  (X|Y=y) with probability distribution p_{X|Y=y}(x)=\frac{p_{X,Y}(x,y)}{p_Y(y)}.

Then \bsE(X|Y=y) is  the expectation of the  random variable (X|Y=y).
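A tiny Python illustration of these formulas (purely illustrative, with an arbitrarily chosen joint distribution):

import numpy as np

R_X = np.array([0.0, 1.0, 2.0])                   # range of X
R_Y = np.array([-1.0, 1.0])                       # range of Y
p_XY = np.array([[0.10, 0.20],                    # joint pmf p_{X,Y}(x, y),
                 [0.30, 0.10],                    # rows indexed by x, columns by y
                 [0.05, 0.25]])                   # (entries sum to 1)

p_Y = p_XY.sum(axis=0)                            # marginal of Y
E_X_given_Y = (R_X[:, None] * p_XY).sum(axis=0) / p_Y   # E(X | Y=y) for each y in R_Y
print(dict(zip(R_Y, E_X_given_Y)))                # {-1.0: 0.888..., 1.0: 1.0909...}

# tower property: E( E(X|Y) ) = E(X)
print(E_X_given_Y @ p_Y, R_X @ p_XY.sum(axis=1))  # both equal 1.0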

\ast\ast\ast

Example 3. Suppose that  \bsU, \bsV are equipped with  Euclidean metrics,  X,Y are centered Gaussian random vectors with  covariance forms A and respectively B.   Assume  that the covariance pairing between X and Y is C so that the covariance  form of (X, Y) is

S=\left[ \begin{array}{cc} A & C\\ C^\dagger & B \end{array} \right].

Assuming that S is nondegenerate, we have \newcommand{\bsW}{\boldsymbol{W}}

p_{X,Y}(dw)  =\underbrace{\frac{1}{\sqrt{\det 2\pi S}}  e^{-\frac{1}{2}(S^{-1}w,w)} }_{=:\gamma_S(w)}dw,\;\;w=x+y\in \bsW:=\bsU\oplus \bsV,

p_Y(dy) = \gamma_B(y)\, dy,\;\;\gamma_B(y)=\frac{1}{\sqrt{\det 2\pi B}}  e^{-\frac{1}{2}(B^{-1}y,y)} .


For any  bounded measurable function f:\bsU\to \bR we have


\int_{\Omega} f(X(\omega)) \bsP(d\omega)=\int_\Omega \bsE(f(X)|Y)_\omega \bsP_Y(d\omega)=\int_\bsV \bsE(f(X)| Y=y)  p_Y(dy).

We deduce

\int_{\bsU\oplus \bsV} f(x) \gamma_S(x,y) dx dy= \int_\bsV \bsE(f(X)| Y=y) \gamma_B(y) dy.

Now observe that

 \int_{\bsV}\left(\int_\bsU f(x) \gamma_S(x,y) dx\right) dy =  \int_\bsV \bsE(f(X)| Y=y) \gamma_B(y) dy.


This implies that

 \bsE(f(X)| Y=y) =\frac{1}{\gamma_B(y)} \left(\int_\bsU f(x) \gamma_S(x,y) dx\right).

We obtain a probability measure  p_{X|Y=y} on the affine plane \bsU\times \{y\} given by

p_{X|Y=y}(dx)= \frac{\gamma_S(x,y)}{\gamma_B(y)} dx.


This is a Gaussian    measure on \bsU. Its   statistics are described by  the regression formula.  More precisely, its mean is

m_{X|Y=y}= CB^{-1}y,

and its covariance form is

S_{X|Y=y}=  A- CB^{-1}C^\dagger.
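Here is a quick numerical check (purely illustrative) in the simplest case \dim\bsU=\dim\bsV=1: the density \gamma_S(x,y)/\gamma_B(y) integrates to 1 in x, and its mean and variance agree with CB^{-1}y and A-CB^{-1}C^\dagger.

import numpy as np

A, B, C = 2.0, 1.5, 0.8                    # scalar covariance data, chosen arbitrarily
S = np.array([[A, C], [C, B]])             # covariance form of (X, Y); positive definite here
y = 0.7                                    # the value of Y we condition on

def gauss(w, Sigma):
    # centered Gaussian density with covariance Sigma, evaluated at the vector w
    w, Sigma = np.atleast_1d(w), np.atleast_2d(Sigma)
    return np.exp(-0.5 * w @ np.linalg.solve(Sigma, w)) / np.sqrt(np.linalg.det(2 * np.pi * Sigma))

x = np.linspace(-12.0, 12.0, 8001)
dens = np.array([gauss([xi, y], S) for xi in x]) / gauss([y], [[B]])

dx = x[1] - x[0]
mean = np.sum(x * dens) * dx
var = np.sum((x - mean) ** 2 * dens) * dx
print(np.sum(dens) * dx, mean, var)        # ~ 1.0,  ~ 0.3733,  ~ 1.5733
print(C * y / B, A - C ** 2 / B)           # 0.3733...,  1.5733...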

\ast\ast\ast

In general, if we think of p_{X,Y} as a density on \bsU\oplus \bsV, of p_Y as a density on \bsV, and we denote by \pi_\bsV the natural projection \bsU\oplus\bsV\to\bsV, then the conditional probability distribution p_{X|Y=y} is a probability density on \pi^{-1}_\bsV(y). More precisely, it is the density p_{X,Y}/\pi^*_\bsV p_Y defined as in Section 9.1.1 of my lectures, especially Proposition 9.1.8, page 350.

Monday, December 3, 2012

Degeneration of Gaussian measures

\newcommand{\bR}{\mathbb{R}} \newcommand{\ve}{{\varepsilon}} \newcommand{\bsV}{\boldsymbol{V}}  Suppose that \bsV is an N-dimensional real Euclidean space   equipped with an orthogonal  direct sum \newcommand{\bsU}{\boldsymbol{U}} \newcommand{\bsW}{\boldsymbol{W}}

\bsV =\bsU\oplus \bsW. \tag{1}\label{1}

Suppose that S_n: \bsU\to\bsU and C_n:\bsW\to \bsW are symmetric positive definite  operators such that

S_n\to 0,\;\;C_n\to C,\;\;\mbox{as}\;\;n\to \infty

where C is a symmetric positive definite operator on \bsW.     We set


A_n=S_n\oplus C_n :\bsV\to \bsV

and we think of A_n as the covariance   matrix  of a  Gaussian measure on \bsV \newcommand{\bv}{\boldsymbol{v}}

\gamma_{A_n}(|d\bv|)=\frac{1}{\sqrt{\det 2\pi A_n}}  e^{-\frac{1}{2}(A_n^{-1} \bv,\bv)} |d\bv|.

Suppose that f:\bsV\to \bR is a  locally Lipschitz function,   positively homogeneous of degree k\geq 1.

 I am interested in the  behavior as n\to \infty of the expectation

E_n(f):=\int_{\bsV} f(\bv)\gamma_{A_n}(|d\bv|).

\newcommand{\bu}{\boldsymbol{u}}  \newcommand{\bw}{\boldsymbol{w}} With respect to the decomposition (\ref{1}), a vector \bv\in\bsV can be written as an orthogonal sum \bv=\bu+\bw.

Define

\bar{f}_n:\bsW\to [0,\infty),\;\; \bar{f}_n(\bw)= \int_{\bsU} f(\bu+\bw) \gamma_{S_n}(|d\bu|),  

where d\gamma_{S_n} denotes the Gaussian measure  on \bsU with  covariance form S_n.  Then


E_n(f)=\int_{\bsW} \bar{f}_n(\bw) \gamma_{C_n}(|d\bw|). \tag{2}\label{2}  

For \bw\in \bsW and r\in (0,1] we set

m(\bw, r) := \sup_{|\bu|\leq r}|f(\bw+\bu)- f(\bw)|.  

Note that

\exists L>0: m(\bw,r)\leq   Lr,\;\;\forall |\bw|= 1,\;\; r\in(0,1]. \tag{3}\label{3}

In general,   we set \bar{\bw}:=\frac{1}{|\bw|} \bw. If |\bu|\leq r, then
\bigl|\;f(\bw+\bu)-f(\bw) \;\bigr|= |\bw|^k \left| f\Bigl(\bar{\bw}+\frac{1}{|\bw|} \bu\Bigr) -f(\bar{\bw})\right| \leq  L |\bw|^{k-1} r,
so that
m(\bw,r) \leq L|\bw|^{k-1} r,\;\;\forall \bw\in\bsW,\;\;r\in (0,1].  \tag{4}\label{4}

To proceed further, we need a vector counterpart of the Chebyshev inequality.

Lemma 1.  Suppose S:\bsU\to \bsU is a  symmetric, positive definite operator. We set R:=S^{-\frac{1}{2}} and  denote by \gamma_{S} the associated   Gaussian measure. Then for any c,\ell>0  we have \newcommand{\bsi}{\boldsymbol{\sigma}}

\int_{ |R \bu|\geq c}  |\bu|^\ell d\gamma_S(|\bu|) \leq \sqrt{2^{\ell+m-\frac{3}{2}} \Gamma\Bigl(\; \ell+m-\frac{1}{2}\;\Bigr)} \frac{\bsi_{m-1}}{(2\pi)^{\frac{m}{2}}} \Vert S\Vert^{\frac{\ell}{2}}c^{-\frac{1}{2}}e^{-\frac{c^2}{4}} , \tag{5}\label{5}

where m=\dim\bsU and \bsi_N denotes the area of the N-dimensional unit sphere.

Proof.   We make the change of variables \newcommand{\bx}{\boldsymbol{x}}  \bx:=R\bu and we  deduce
\int_{ |R \bu|\geq c}  |\bu|^\ell d\gamma_S(|\bu|)\leq \frac{1}{(2\pi)^{\frac{m}{2}}} \int_{|\bx|\geq c} |S^{\frac{1}{2}} \bx|^\ell  e^{-\frac{1}{2}|\bx|^2} |d\bx|

\leq \frac{\Vert S\Vert^{\frac{\ell}{2}}}{(2\pi)^{\frac{m}{2}}} \int_{|\bx|\geq c} |\bx|^\ell  e^{-\frac{1}{2}|\bx|^2} |d\bx|=\frac{\bsi_{m-1}\Vert S\Vert^{\frac{\ell}{2}}}{(2\pi)^{\frac{m}{2}}}\int_{t>c} t^{\ell+m-1} e^{-\frac{1}{2} t^2} dt
\leq  \frac{\bsi_{m-1}\Vert S\Vert^{\frac{\ell}{2}}}{(2\pi)^{\frac{m}{2}}} \left(\int_{t>c}  e^{-\frac{1}{2} t^2} dt\right)^{\frac{1}{2}}\left(\int_{t>0} t^{2\ell+2m-2} e^{-\frac{1}{2} t^2} dt\right)^{\frac{1}{2}}
Now observe that we have
\int_{t>c}  e^{-\frac{1}{2} t^2} dt \leq \frac{1}{c} e^{-\frac{c^2}{2}},
and  using the change of variables s=\frac{t^2}{2} we  deduce

 \int_{t>0} t^{2\ell+2m-2} e^{-\frac{1}{2} t^2} dt =2^{\ell+m-\frac{3}{2}}\int_0^\infty s^{\ell+m-\frac{1}{2}-1} e^{-s} ds= 2^{\ell+m-\frac{3}{2}} \Gamma( \ell+m-\frac{1}{2}).

This proves the lemma. q.e.d
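Here is a crude Monte Carlo sanity check of (\ref{5}) (purely illustrative, with arbitrarily chosen m, \ell, c and a diagonal S):

import numpy as np
from math import gamma, pi, sqrt, exp

rng = np.random.default_rng(1)
m, ell, c = 3, 2.0, 2.5
S = np.diag([0.5, 0.3, 0.2])                               # symmetric positive definite
u = rng.multivariate_normal(np.zeros(m), S, size=1_000_000)
Ru = u / np.sqrt(np.diag(S))                               # R u with R = S^{-1/2} (diagonal case)
lhs = np.mean(np.linalg.norm(u, axis=1) ** ell * (np.linalg.norm(Ru, axis=1) >= c))

sigma_m1 = 2 * pi ** (m / 2) / gamma(m / 2)                # area of the unit sphere in R^m
rhs = (sqrt(2 ** (ell + m - 1.5) * gamma(ell + m - 0.5))
       * sigma_m1 / (2 * pi) ** (m / 2)
       * np.linalg.norm(S, 2) ** (ell / 2) * c ** (-0.5) * exp(-c ** 2 / 4))
print(lhs, rhs, lhs <= rhs)                                # the bound holds, with room to spare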




We now want to compare \bar{f}_n(\bw) and f(\bw) for \bw\in\bsW.  We plan to use Lemma  1.   Set R_n:=S_n^{-\frac{1}{2}} and m:=\dim\bsU.   Observe that

|\bu|= |S_n^{\frac{1}{2}}R_n\bu|\leq \Vert S_n^{\frac{1}{2}}\Vert\cdot |R_n\bu|.

For simplicity   set s_n:=  \Vert S_n^{\frac{1}{2}}\Vert.   Choose a sequence of positive numbers  c_n such that c_n\to\infty and   s_n c_n\to 0.  Later  we will add several requirements to this sequence.

\bigl|\;\bar{f}_n(\bw)-f(\bw)\;\bigr|=\left| \int_{\bsU} (\; f(\bw+\bu)- f(\bw)\; ) \gamma_{S_n}(|d\bu|)\right|
\leq \left| \int_{|R_n\bu|\leq c_n}  (\; f(\bw+\bu)- f(\bw)\; ) \gamma_{S_n}(|d\bu|)\right|+\left|\int_{|R_n\bu|\geq c_n} (\; f(\bw+\bu)- f(\bw)\; ) \gamma_{S_n}(|d\bu|)\right|
\stackrel{(\ref{4})}{\leq} L|\bw|^{k-1}s_n c_n +C \int_{|R_n\bu|\geq c_n}(|\bw|^k+|\bu|^k) \gamma_{S_n}(|d\bu|)

\stackrel{(\ref{5})}{\leq}  L|\bw|^{k-1}s_n c_n + Z(k, m)c_n^{-\frac{1}{2}} e^{-\frac{c_n^2}{4}}(1+s_n^k),
where Z(k,m) is a constant that depends only on   k and m.


We deduce that there exists a constant C>0 independent of   n,w  such that  for any sequence c_n\to \infty such that s_nc_n\to 0, s_n:=\Vert S_n\Vert^{\frac{1}{2}} we have

\bigl|\;\bar{f}_n(\bw)-f(\bw)\;\bigr| \leq C\bigl(\; |\bw|^{k-1}s_nc_n + e^{-\frac{c_n^2}{4}}\;\bigr). \tag{6}\label{6}
We deduce that

\Bigl|\; E_n(f) -\int_{\bsW} f(\bw) \gamma_{C_n}(|d\bw|)\;\Bigr|  \leq   C\left(s_nc_n\int_{\bsW} |\bw|^{k-1} \gamma_{C_n}(|d\bw|) + e^{-\frac{c_n^2}{4}}\;\right).\tag{7}\label{7}

Finally let us estimate

D_n:=\int_{\bsW} f(\bw) \gamma_{C_n}(|d\bw|)-\int_{\bsW} f(\bw) \gamma_{C}(|d\bw|).

We have  \newcommand{\one}{\boldsymbol{1}}
D_n= \int_{\bsW} \left( f\bigl( C_n^{\frac{1}{2}}\bw\;\bigr)-f\bigl( C^{\frac{1}{2}}\bw\;\bigr) \;\right)\gamma_{\one}(|d\bw|)
and we conclude that
\left|\; \int_{\bsW} f(\bw) \gamma_{C_n}(|d\bw|)-\int_{\bsW} f(\bw) \gamma_{C}(|d\bw|)\;\right| \leq  L \Bigl\Vert \;C_n^{\frac{1}{2}}-C^\frac{1}{2}\;\Bigr\Vert \int_{\bsW}|\bw|^k \gamma_{\one}(|d\bw|). \tag{8}\label{8}

In (\ref{7}) we let c_n:=s_n^{-\ve}. If we denote by A_\infty the limit of the covariance matrices A_n, A_\infty=\lim_{n\to\infty} A_n =0\oplus C, then we deduce from the above computations that for any \ve>0 there exists a constant C_\ve>0 such that
\left|\; \int_{\bsV} f(\bv) \gamma_{A_n} (|d\bv|) -\int_{\bsV} f(\bv) \gamma_{A_\infty} (|d\bv|) \;\right|\leq C_\ve \left(s_n^{1-\ve}+ \Bigl\Vert \;C_n^{\frac{1}{2}}-C^\frac{1}{2}\;\Bigr\Vert\right)\leq C_\ve \Bigl\Vert A_n^{\frac{1}{2}}-A_\infty^{\frac{1}{2}}\Bigr\Vert^{1-\ve}.\tag{9}\label{9}
This can be generalized a bit. Suppose that T_n:\bsV\to \bsV is a sequence of orthogonal operators such that  T_n\to \one_{\bsV}.

Using (\ref{7})  we deduce

\left|\;\int_{\bsV} T^*_nf(\bv) \gamma_{A_n}(|d\bv|)-\int_{\bsV}  f(\bv) \gamma_{A_n}(|d\bv|)\right|= \left| \int_{\bsV}\Bigl( f(T_n A_n^{\frac{1}{2}}\bx)-  f( A_n^{\frac{1}{2}}\bx)\Bigr)\gamma_{\one}(|d\bx|) \right| \leq L \Bigl\Vert A_n^{\frac{1}{2}}\Bigr\Vert \Vert T_n-\one\Vert.
Observe that
\int_{\bsV} T^*_nf(\bv) \gamma_{A_n}(|d\bv|)=\int_{\bsV} f(\bv) \gamma_{B_n}(|d\bv|),
where
B_n= T_nA_nT_n^*.

Suppose that we are in the fortunate case when f|_{\bsW}=0.    Then

\int_{\bsW} f(\bw) \gamma_{C_n}(|d\bw|)=\int_{\bsW} f(\bw) \gamma_{C}(|d\bw|)=0

and (\ref{9})  can be improved to

\left|\; \int_{\bsV} f(\bv) \gamma_{A_n} (|d\bv|)\right|\leq C_\ve s_n^{1-\ve}.
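Here is a small Monte Carlo sketch (my own, not part of the estimates above) of the convergence in (\ref{9}), in the simplest case \dim\bsU=\dim\bsW=1, with f(\bv)=|\bv| (homogeneous of degree k=1) and C_n=C fixed:

import numpy as np

rng = np.random.default_rng(2)
f = lambda v: np.linalg.norm(v, axis=-1)        # positively homogeneous of degree 1, Lipschitz

c = 1.0                                          # the limiting variance on W
w = rng.normal(size=(1_000_000, 1)) * np.sqrt(c)
limit = np.mean(f(w))                            # the integral of f over W against gamma_C

for s in [1.0, 0.1, 0.01]:                       # variances S_n -> 0 on U
    v = rng.normal(size=(1_000_000, 2)) * np.sqrt([s, c])   # samples of gamma_{A_n}, A_n = diag(s, c)
    print(s, np.mean(f(v)) - limit)              # the gap shrinks as S_n -> 0, as (9) predicts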




On the 11/8-conjecture

Stefan Bauer has just posted a proof of the 11/8-conjecture for simply connected 4-manifolds.