
Saturday, December 22, 2012

Axiomatic definition of the center of mass

This  was prompted by a nice Mathoverflow question.    \newcommand{\bR}{\mathbb{R}}  \newcommand{\bZ}{\mathbb{Z}} \newcommand{\bp}{{\boldsymbol{p}}} \newcommand{\Div}{\mathrm{Div}} \newcommand{\supp}{\mathrm{supp}} \newcommand{\bm}{\boldsymbol{m}} \newcommand{\eC}{\mathscr{C}} \newcommand{\bc}{\boldsymbol{c}} \newcommand{\bq}{{\boldsymbol{q}}}

We define an effective divisor   on \bR^N to be a   function with finite support \mu:\bR^N\to\bZ_{\geq 0}. Its mass, denoted by \bm(\mu), is the  nonnegative integer

\bm(\mu)=\sum_{\bp\in\bR^N} \mu(\bp).

We denote by \Div_+(\bR^N) the set of effective divisors.  Note that \Div_+(\bR^N) has a natural structure of Abelian semigroup.

For any \bp\in\bR^N we denote by \delta_\bp the Dirac divisor of mass 1 and supported at  \bp.   The  Dirac divisors generate  the  semigroup \Div_+(\bR^N).     We have a natural  topology on  \Div_+(\bR^N) where \mu_n\to \mu if and only if

\bm(\mu_n)\to \bm(\mu),\;\; {\rm dist}\,\bigl(\;\supp(\mu_n),\; \supp(\mu)\;\bigr)\to 0,

where {\rm dist} denotes the Hausdorff distance.

A center of mass   is a map

\eC:\Div_+(\bR^N)\to\Div_+(\bR^N)

satisfying the following conditions.

1. (Localization) For any divisor \mu the support of \eC(\mu) consists of  a single point \bc(\mu).

2.  (Conservation of mass)

\bm(\mu)=\bm\bigl(\;\eC(\mu)\;\bigr),\;\;\forall\mu \in\Div_+(\bR^N),


so that

\eC(\mu)=\bm(\mu)\delta_{\bc(\mu)},\;\;\forall\mu \in\Div_+(\bR^N).


3. (Normalization)

\bc(m\delta_\bp)=\bp,\;\;\bc(\delta_\bp+\delta_\bq)=\frac{1}{2}(\bp+\bq),\;\;\forall \bp,\bq\in\bR^N,\;\;m\in\bZ_{>0}.

4. (Additivity)

\eC(\mu_1+\mu_2)= \eC\bigl(\,\eC(\mu_1)+\eC(\mu_2)\,\bigr),\;\;\forall \mu_1,\mu_2\in \Div_+(\bR^N).  



For example, the   correspondence

\Div_+ \ni \mu\mapsto  \eC_0(\mu)=\bm(\mu)\delta_{\bc_0(\mu)}\in\Div_+,\;\;\bc_0(\mu):=\frac{1}{\bm(\mu)}\sum_\bp \mu(\bp)\bp

is a center-of-mass  map.  I want to show that this is the only center of mass map.
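For the computationally inclined, here is a minimal Python/NumPy sketch (purely illustrative, not part of the argument; the names mass, c0, C0 are mine) of the map \eC_0, with a divisor encoded as a dictionary from points of \bR^N to positive integer multiplicities; it also checks the additivity axiom on one example.

import numpy as np

def mass(mu):
    # total mass of an effective divisor, encoded as {point (tuple): multiplicity}
    return sum(mu.values())

def c0(mu):
    # barycenter c_0(mu) = (1/m(mu)) * sum_p mu(p) * p
    return sum(k * np.array(p, dtype=float) for p, k in mu.items()) / mass(mu)

def C0(mu):
    # the divisor C_0(mu) = m(mu) * delta_{c_0(mu)}
    return {tuple(c0(mu)): mass(mu)}

# additivity check: C0(mu1 + mu2) agrees with C0(C0(mu1) + C0(mu2))
mu1 = {(0.0, 0.0): 2, (1.0, 0.0): 1}
mu2 = {(0.0, 3.0): 3}
total = {(0.0, 0.0): 2, (1.0, 0.0): 1, (0.0, 3.0): 3}   # mu1 + mu2 (disjoint supports)
merged = dict(C0(mu1))
for p, k in C0(mu2).items():
    merged[p] = merged.get(p, 0) + k                     # the divisor C0(mu1) + C0(mu2)
print(C0(total), C0(merged))   # both are 6*delta at the point (1/6, 3/2), up to rounding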

Proposition  If \eC:\Div_+(\bR^N)\to \Div_+(\bR^N) is a  center-of-mass map, then \eC=\eC_0.

Proof.  We carry out the proof in several steps.


Step 1 (Rescaling).   We can write the additivity property as

\bc(\mu_1+\mu_2) =\bc\bigl(\; \bm(\mu_1)\delta_{\bc(\mu_1)} +\bm(\mu_2)\delta_{\bc(\mu_2)}\;\bigr).

In particular, this implies the rescaling property

\bc( k\mu)=\bc(\mu),\;\;\forall\mu \in\Div_+,\;\; k\in\bZ_{>0}. \tag{R}\label{R}



This follows by induction on k. For k=1 it is obviously true.  In general, using the induction hypothesis \bc(\,(k-1)\mu\,)=\bc(\mu), we have

 \bc( k\mu)=\bc\bigl(\;(k-1)\bm(\mu)\delta_{\bc(\;(k-1)\mu)}+\bm(\mu)\delta_{\bc(\mu)}\;\bigr) =\bc\bigl(\; k\bm(\mu)\delta_{\bc(\mu)}\;\bigr)={\bc(\mu)} 



Step 2. (Equidistribution)   For any n>0 and any collinear points \bp_1,\dotsc,\bp_n such that


|\bp_1-\bp_2|=\cdots=|\bp_{n-1}-\bp_n|

we have

\eC\Bigl(\sum_{k=1}^n\delta_{\bp_k}\;\Bigr)=\eC_0\Bigl(\sum_{k=1}^n\delta_{\bp_k}\;\Bigr) \tag{E}\label{E}.

Equivalently, this means that

\bc\Bigl(\sum_{k=1}^n\delta_{\bp_k}\;\Bigr)=\bc_0\bigl(\sum_{k=1}^n\delta_{\bp_k}\;\bigr)={\frac{1}{n}(\bp_1+\cdots+\bp_n)}.  


We  will prove  (\ref{E}) arguing by induction on n. For n=1,2 this follows from the normalization property. Assume that (\ref{E}) is true for any n< m. We want to prove it is true for n=m.

We distinguish two cases.


(a)  m is even, m= 2m_0. We set

\mu_1=\sum_{j=1}^{m_0} \delta_{\bp_j},\;\;\mu_2=\sum_{j=m_0+1}^{2m_0}\delta_{\bp_j}.

Then, using additivity and the rescaling property (\ref{R}),

\bc(\mu_1+\mu_2)= \bc\bigl(\; m_0\delta_{\bc(\mu_1)}+m_0\delta_{\bc(\mu_2)}\;\bigr) =\bc\bigl(\; \delta_{\bc(\mu_1)}+\delta_{\bc(\mu_2)}\;\bigr). \tag{1}\label{2}

By induction

 \bc(\mu_1)=\bc_0(\mu_1),\;\;\bc(\mu_2)=\bc_0(\mu_2).  

The normalization  condition now implies that

 \bc\bigl( \delta_{\bc(\mu_1)}+\delta_{\bc(\mu_2)}\;\bigr)=\bc_0\bigl( \delta_{\bc_0(\mu_1)}+\delta_{\bc_0(\mu_2)}\;\bigr).

Now run the  arguments in (\ref{2}) in reverse, with \bc replaced by \bc_0.

(b) m is odd, m=2m_0+1.  Define


\mu_1=\delta_{\bp_{m_0+1}},\;\;\mu_2'=\sum_{j<m_0+1}\delta_{\bp_j},\;\;\mu_2''=\sum_{j>m_0+1}\delta_{\bp_j},\;\;\mu_2=\mu_2'+\mu_2''.

(Observe that \bp_{m_0+1} is the mid-point in the string of equidistant collinear points \bp_1,\dotsc,\bp_{2m_0+1}. ) We have

\eC(\mu_2'+\mu_2'')=\eC\bigl( \; \eC(\mu_2')+\eC(\mu_2'')\;\bigr).  

By induction

  \eC(\mu_2')+\eC(\mu_2'')=  \eC_0(\mu_2')+\eC_0(\mu_2'') =m_0\delta_{\bc_0(\mu_2')}+m_0\delta_{\bc_0(\mu_2'')}=m_0\bigl(\;\delta_{\bc_0(\mu_2')}+\delta_{\bc_0(\mu_2'')}\;\bigr).  

Observing that

\frac{1}{2}\bigl(\bc_0(\mu_2')+\bc_0(\mu_2'')\;\bigr)=\bp_{m_0+1}

we deduce

\eC(\mu_2)= \eC(\mu_2'+\mu_2'')=m_0\eC\bigl( \delta_{\bc_0(\mu_2')}+\delta_{\bc_0(\mu_2'')}\;\bigr)=m_0\eC_0\bigl( \delta_{\bc_0(\mu_2')}+\delta_{\bc_0(\mu_2'')}\;\bigr)=2m_0\delta_{\bp_{m_0+1}}=2m_0\mu_1.  

Finally  we deduce

\eC(\mu)=\eC\bigl(\;\eC(\mu_1)+\eC(\mu_2)\;\bigr)=\eC\bigl(\;\eC(\mu_1)+2m_0\eC(\mu_1)\;\bigr)= (2m_0+1)\delta_{\bp_{m_0+1}}=\eC_0(\mu).

 Step 3. (Replacement) We will show that for any distinct points \bq_1,\bq_2 and any positive integers m_1,m_2 we  can find  (m_1+m_2) equidistant points  \bp_1,\dotsc,\bp_{m_1+m_2} on the line determined by \bq_1 and \bq_2  such that

m_1\delta_{\bq_1}=\eC_0\Bigl(\sum_{j=1}^{m_1} \delta_{\bp_j}\Bigr)=\eC\Bigl(\sum_{j=1}^{m_1} \delta_{\bp_j}\Bigr),\;\;\;m_2\delta_{\bq_2}=\eC_0\Bigl(\sum_{j=m_1+1}^{m_1+m_2} \delta_{\bp_j}\Bigr)=\eC\Bigl(\sum_{j=m_1+1}^{m_1+m_2} \delta_{\bp_j}\Bigr).

This is elementary. Without restricting the generality we can assume that \bq_1 and \bq_2 lie on an axis (or geodesic) \bR of \bR^N, with \bq_1=0 and \bq_2=q>0.  Clearly we can find real numbers x_0, r, r>0, such that
\frac{1}{m_1}\sum_{j=1}^{m_1}(x_0+j r)=0,\;\;\frac{1}{m_2}\sum_{j=m_1+1}^{m_1+m_2}(x_0+jr)=q.  

Indeed, the above two equalities can be rewritten as

x_0+\frac{m_1+1}{2} r=0,

q=x_0 +m_1 r+\frac{m_2+1}{2}r=x_0+\frac{m_1+1}{2} r+\frac{m_1+m_2}{2} r,

so it suffices to take r=\frac{2q}{m_1+m_2}>0 and x_0=-\frac{m_1+1}{2}r. Now place the points \bp_j at the locations x_0+jr, j=1,\dotsc,m_1+m_2.
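Here is a small Python check of this construction (purely illustrative, with arbitrarily chosen m_1, m_2, q):

import numpy as np

m1, m2, q = 3, 5, 2.0                  # hypothetical choice of parameters
r = 2 * q / (m1 + m2)                  # spacing between consecutive points
x0 = -(m1 + 1) * r / 2                 # offset so that the first m1 points average to 0

pts = np.array([x0 + j * r for j in range(1, m1 + m2 + 1)])
print(pts[:m1].mean())                 # ~ 0.0  (= q_1)
print(pts[m1:].mean())                 # ~ 2.0  (= q_2)
print(np.allclose(np.diff(pts), r))    # True: the points are equidistant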

Step 4. (Conclusion)   We argue by induction on the mass \bm(\mu) that

\eC(\mu)=\eC_0(\mu),\;\;\forall \mu\in \Div_+\tag{2}\label{3}

Clearly, the normalization condition  shows that (\ref{3})  is true if \supp\mu consists of a single point, or if  \bm(\mu)\leq 2.

In general, if \bm(\mu)>2 we write \mu=\mu_1+\mu_2, where both masses m_1=\bm(\mu_1) and m_2=\bm(\mu_2) are positive, so that m_1,m_2<\bm(\mu).

By induction we have

\eC(\mu)= \eC\bigl( \eC(\mu_1)+\eC(\mu_2)\bigr)=\eC\bigl(\;\eC_0(\mu_1)+\eC_0(\mu_2)\;\bigr).

If \bc_0(\mu_1)=\bc_0(\mu_2), the divisors \eC_0(\mu_1) and \eC_0(\mu_2) are supported at the same point and we are done, by the normalization property. Suppose that  \bq_1=\bc_0(\mu_1)\neq\bc_0(\mu_2)=\bq_2. By Step 3, we can  find     equidistant points \bp_1,\dotsc,\bp_{m_1+m_2} such that

m_1\delta_{\bq_1}=\eC\Bigl(\sum_{j=1}^{m_1} \delta_{\bp_j}\Bigr)= \eC_0\Bigl(\sum_{j=1}^{m_1} \delta_{\bp_j}\Bigr)
m_2\delta_{\bq_2}=\eC\Bigl(\sum_{j=m_1+1}^{m_1+m_2} \delta_{\bp_j}\Bigr)=\eC_0\Bigl(\sum_{j=m_1+1}^{m_1+m_2} \delta_{\bp_j}\Bigr).

We deduce that

\eC(\mu)=\eC\Bigl(\sum_{k=1}^{m_1+m_2}\delta_{\bp_k}\Bigr),\;\; \eC_0(\mu)=\eC_0\Bigl(\sum_{k=1}^{m_1+m_2}\delta_{\bp_k}\Bigr).  

The conclusion now follows from  (\ref{E}).  q.e.d



Remark.    The above proof does not really use the linear structure. It uses only the fact that any two points in \bR^N determine a unique geodesic.  The Normalization condition can be replaced by the equivalent one

\bc(\delta_\bp+\delta_\bq)= \mbox{the midpoint of the  geodesic segment  $[\bp,\bq]$}.

If we replace \bR^N with a hyperbolic space the same arguments show that  there exists at most  one center of mass map.











Monday, December 17, 2012

A really nifty linear algebra trick

I've been stuck on this statistics problem for quite a while: I could taste where it was going, but I could never put my intuition into words. Today I discovered how to neatly bypass the obstacle.  Discovered is not the appropriate word, because someone else figured it out long before me.  It's a really, really elementary linear algebra trick that I have never encountered in my travels. Very likely, more experienced statisticians than myself would smile at my ignorance.

The earliest occurrence of this trick I could trace is in a Russian paper by R. N. Belyaev published in Teoriya Veroyatnostei i ee Primeneniya, 1966.    \newcommand{\bR}{\mathbb{R}} \newcommand{\bx}{\boldsymbol{x}} \newcommand{\by}{\boldsymbol{y}}

Suppose that  S is an invertible n\times n matrix and \bx,\by\in\bR^n are vectors which I regard as column vectors, i.e., column matrices.  Denote by (-, -) the  natural inner product in  \bR^n

(\bx,\by)= \bx^\dagger\cdot \by,

where {}^\dagger denotes the transpose of a matrix.

Let r\in\bR. The name of the game is to compute the scalar

r- (\bx, S^{-1} \by)= r-\bx^\dagger\cdot S^{-1}\cdot \by.

Such computations are often required in the neck of the woods where I've spent the best part of the last three years, namely geometric probability. So here is the trick. I'll name it after Belyaev, although I am sure he was not the first to observe it. (He even refers to an old book by H. Cramer on statistics.) \newcommand{\one}{\boldsymbol{1}}


Belyaev's Trick.   


 r-\bx^\dagger\cdot S^{-1}\cdot \by =\frac{\det\left[\begin{array}{cc} S &\by\\ \bx^\dagger & r \end{array}\right]}{\det S}=\frac{\det\left[\begin{array}{cc} r &\bx^\dagger\\ \by & S \end{array}\right]}{\det S}.


Here is the disappointingly simple proof. Note that


 \left[  \begin{array}{cc}\one_n & S^{-1}\by\\ 0 & r-\bx^\dagger S^{-1} \by \end{array} \right]= \left[  \begin{array}{cc} \one_n & 0\\-\bx^\dagger & 1 \end{array} \right]\cdot  \left[  \begin{array}{cc} S^{-1} & 0\\ 0 & 1  \end{array}\right] \cdot  \left[ \begin{array}{cc} S &\by\\ \bx^\dagger & r \end{array}\right].  

Now take the determinants of both sides to obtain the first equality.  The second equality follows from the first by permuting the rows and columns of the matrix in the numerator. \DeclareMathOperator{\Cov}{\boldsymbol{Cov}} \newcommand{\bsE}{\boldsymbol{E}}
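Here is a quick NumPy sanity check of the trick (purely illustrative, on randomly generated data):

import numpy as np

rng = np.random.default_rng(0)
n = 4
S = rng.normal(size=(n, n))                      # generically invertible
x = rng.normal(size=(n, 1))
y = rng.normal(size=(n, 1))
r = rng.normal()

lhs = r - (x.T @ np.linalg.solve(S, y)).item()   # r - x^T S^{-1} y
M = np.block([[S, y], [x.T, np.array([[r]])]])   # the bordered matrix [[S, y], [x^T, r]]
rhs = np.linalg.det(M) / np.linalg.det(S)
print(np.isclose(lhs, rhs))                      # True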


Here is how it works in practice.   Suppose that (X_0, X_1, \dotsc, X_n)\in \bR^{n+1} is a centered Gaussian random vector, with covariance matrix

\Cov(X_0, X_1, \dotsc, X_n)= \Bigl( \;\bsE\bigl( X_i\cdot X_j\,\bigr)\;\Bigr)_{0\leq i,j\leq n}.

Assume  that the Gaussian vector (X_1,\dotsc, X_n) is nondegenerate, i.e., the  symmetric matrix  S=\Cov(X_1,\dotsc, X_n) is invertible.

We can then define in an unambiguous way the conditional random variable \DeclareMathOperator{\var}{\boldsymbol{var}}

(X_0|\; X_1=\cdots =X_n=0).

This is a  centered Gaussian random variable with variance given by the   the regression formula

\var(X_0|\; X_1=\cdots =X_n=0)= \var(X_0) - \bx^\dagger\cdot S^{-1}\cdot \bx,

 where \bx^\dagger is the row vector

\bx^\dagger =\left(\; \bsE(X_0X_1),\cdots ,\bsE(X_0 X_n)\;\right).

If we now use  Belyaev's trick we deduce

  \var(X_0|\; X_1=\cdots =X_n=0)=\frac{\det\Cov(X_0, X_1, \dotsc, X_n)}{\det\Cov(X_1,\dotsc, X_n)}.  

In this form it is used in the related paper of Jack Cuzick (Annals of Probability, 3(1975), 849-858.)
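As a minimal worked case of the last formula (my own illustration, n=1): for a centered Gaussian pair (X_0,X_1) with \var(X_0)=a, \var(X_1)=b>0 and \bsE(X_0X_1)=c, the regression formula gives

\var(X_0|\; X_1=0)= a-\frac{c^2}{b}=\frac{ab-c^2}{b}=\frac{\det\Cov(X_0,X_1)}{\det\Cov(X_1)},

which is exactly the determinant ratio above.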


On a "Car-Talk" problem

While driving back home I heard an interesting math question from, of all places, the Car Talk show on NPR.  This made me think of a generalization of the trick they used and, in particular, to formulate the following problem.  \newcommand{\bR}{\mathbb{R}}

Problem  Determine all reasonably well-behaved compact domains D\subset \bR^2 with the following property: any line through the origin divides D into two regions of equal areas.  We will refer to this as property \boldsymbol{C} (for cut).


I know that "reasonably well behaved" is a rather fuzzy requirement.  At this moment I don't want to think of Cantor-like weirdos.  So let's assume that D is semialgebraic.


We say that a domain D satisfies property \boldsymbol{S} (for symmetry) if it is invariant with respect to the  involution

\bR^2\ni (x,y)\mapsto (-x,-y) \in \bR^2.


It is not hard to see that

\boldsymbol{S}\Rightarrow \boldsymbol{C}.

Is the converse true?

 I describe below one situation when this happens.


A special case.    I'll assume that D is semialgebraic,  star-shaped with respect to the origin and satisfies \boldsymbol{C}.  We can  then describe D in polar coordinates by an inequality of the form

(r,\theta)\in D \Longleftrightarrow  0\leq r\leq f(\theta),\;\;\theta\in[0,2\pi],

where f:[0,2\pi]\to (0,\infty) is a semialgebraic function such that f(0)=f(2\pi). We can extend f by 2\pi-periodicity to a function f:\bR\to (0,\infty) whose restriction to any finite interval is semialgebraic.

For any \phi\in[0,2\pi] denote by \ell_\phi:\bR^2\to \bR the linear function defined by

\ell_\phi (x,y)= x\cos\phi +y\sin\phi.

Denote by  A(\phi) the area of the region D\cap \bigl\{ \ell_\phi\geq 0\bigr\}.    Since D satisfies \boldsymbol{C} we deduce

A(\phi)=\frac{1}{2} {\rm area}\;(D),

so that

A'(\phi)=0,\;\;\forall \phi.

Observe that

A(\phi+\Delta \phi)-A(\phi) =\int_{\phi+\frac{\pi}{2}}^{\phi+\frac{\pi}{2}+\Delta\phi} \left(\int_0^{f(t)} r dr\right) dt -\int_{\phi+\frac{3\pi}{2}}^{\phi+\frac{3\pi}{2}+\Delta\phi} \left(\int_0^{f(t)} r dr\right) dt.

For simplicity we set \theta=\theta(\phi)=\phi+\frac{\pi}{2}, so that \Delta\theta=\Delta\phi. We can then rewrite the above equality as


 A(\phi+\Delta \phi)-A(\phi)=\int_\theta^{\theta+\Delta\theta}\left(\int_0^{f(t)} r dr\right) dt -\int_{\theta+\pi}^{\theta+\pi+\Delta\theta} \left(\int_0^{f(t)} r dr\right) dt. 

Hence

0=A'(\phi)= \frac{1}{2}\Bigl( f(\theta)^2-f(\theta+\pi)^2\Bigr).

Hence f(\theta)= f(\theta+\pi), \forall \theta. This shows that   D satisfies the symmetry condition \boldsymbol{S}.

\ast\ast\ast

Here is a simple instance when \boldsymbol{C}   does not imply \boldsymbol{S}.

Suppose that D is semialgebraic and has the annular description

f(\theta)\leq r\leq  F(\theta). \tag{1}\label{1}.


Using the same notations as above we deduce that

0=A'(\phi)= \frac{1}{2} \Bigl(F^2(\theta)-f^2(\theta)\Bigr)-  \frac{1}{2} \Bigl(F^2(\theta+\pi)-f^2(\theta+\pi)\Bigr).


Thus, the domain (\ref{1}) satisfies \boldsymbol{C} iff the function G(\theta)=F^2(\theta)-f^2(\theta) is \pi-periodic.  Note that


F(\theta)= \sqrt{f^2(\theta)+ G(\theta)}.

If we choose

f(\theta)=  e^{\sin \theta},\;\; G(\theta)=e^{\cos 2\theta},

 then we obtain the domain bounded by the two closed curves in the   figure below. This domain obviously violates the  symmetry condition \boldsymbol{S}.
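Here is a short NumPy check (purely illustrative) that this domain does satisfy \boldsymbol{C} while visibly violating \boldsymbol{S}: the area of D\cap\{\ell_\phi\geq 0\} equals \frac{1}{2}\int_{\phi-\pi/2}^{\phi+\pi/2} G(t)\, dt, which is independent of \phi because G is \pi-periodic.

import numpy as np

f = lambda t: np.exp(np.sin(t))            # inner boundary r = f(theta); not pi-periodic
G = lambda t: np.exp(np.cos(2 * t))        # pi-periodic
F = lambda t: np.sqrt(f(t)**2 + G(t))      # outer boundary r = F(theta)

def half_area(phi, n=100001):
    # area of D on the side { x cos(phi) + y sin(phi) >= 0 }, i.e. theta in [phi-pi/2, phi+pi/2]
    t = np.linspace(phi - np.pi / 2, phi + np.pi / 2, n)
    return 0.5 * np.trapz(F(t)**2 - f(t)**2, t)

t = np.linspace(0, 2 * np.pi, 200001)
total = 0.5 * np.trapz(F(t)**2 - f(t)**2, t)
print([round(half_area(phi) / total, 6) for phi in np.linspace(0, np.pi, 7)])
# every ratio is 0.5 (property C holds), although f(theta) != f(theta + pi) (so S fails)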










Wednesday, December 12, 2012

12.12.12-Once in a century

I had to do this. It is the last time this century one can do this, and I could not pass up this opportunity to immortalize it.

Geometry conference in the memory of Jianguo Cao

It's been a bit over a year and a half now since my dear friend Jianguo Cao unexpectedly passed away. I miss him for many reasons. He was my gentle, wise and always welcoming Riemannian geometry guru.   Our department is organizing a conference in his memory (March 13-17, 2013).  The least I could do is to spread the word. Unfortunately, I cannot attend. In any case here is a picture of Jianguo from 2004.  He is the leftmost person in the row; I am the only bearded guy.

On conditional expectations.

\newcommand{\bR}{\mathbb{R}}  I am still struggling with the idea of conditioning.   Maybe this public confession will help clear things up.   \newcommand{\bsP}{\boldsymbol{P}} \newcommand{\eA}{\mathscr{A}} \newcommand{\si}{\sigma}

Suppose that (\Omega, \eA, \bsP) is a probability space, where \eA is a \si-algebra of subsets of \Omega and  \bsP:\eA\to [0,1] is a probability measure. We assume that \eA is complete with respect to \bsP, i.e.,  subsets of \bsP-negligible subsets are measurable. \newcommand{\bsU}{{\boldsymbol{U}}} \newcommand{\bsV}{{\boldsymbol{V}}} \newcommand{\eB}{\mathscr{B}}

Assume that \bsU and \bsV are two finite dimensional real vector spaces equipped with the \si-algebras of Borel subsets, \eB_{\bsU} and respectively \eB_{\bsV}. Consider two random  variables  X:\Omega\to \bsU and Y:\Omega\to \bsV with probability   measures

p_X=X_*\bsP,\;\;p_Y=Y_*\bsP.

Denote by p_{X,Y} the joint probability measure \newcommand{\bsE}{\boldsymbol{E}}

p_{X,Y}=(X\oplus Y)_*\bsP.

The conditional expectation \bsE(X|Y) is a new \bsU-valued random variable \omega\mapsto \bsE(X|Y)_\omega,  but on a different probability space (\Omega, \eA_Y, \bsP_Y), where \eA_Y=Y^{-1}(\eB_\bsV) and \bsP_Y is the restriction of \bsP to \eA_Y. The events in \eA_Y all have the form \{Y\in B\}, B\in\eB_{\bsV}.

This  \eA_Y-measurable random variable is defined uniquely by the  equality

\int_{Y\in B} \bsE(X|Y)_\omega \bsP_Y(d\omega) = \int_{Y\in B}X(\omega) \bsP(d\omega),\;\;\forall B\in\eB_\bsV.

Warning: The truly subtle thing in the above   equality is  the integral in the left-hand-side which is performed with respect to the restricted measure \bsP_Y.

If we denote by I_B the indicator function of  B\in\eB_\bsV, then we can rewrite the above  equality as

\int_\Omega \bsE(X|Y)_\omega I_B(Y(\omega)) \bsP_Y(d\omega)=\int_\Omega X(\omega) I_B(Y(\omega) )\bsP(d\omega).

In particular  we deduce that for any  step function f: \bsV \to \bR we have

\int_\Omega \bsE(X|Y)_\omega f(Y(\omega)) \bsP_Y(d\omega) =\int_\Omega X(\omega) f(Y(\omega) )\bsP(d\omega). 

The random variable  \bsE(X|Y)  defines  a   \bsU-valued  random variable \bsV\ni y\mapsto \bsE(X|y)\in\bsU on the probability space (\bsV,\eB_\bsV, p_Y)   where

\int_B  \bsE(X| y)\, p_Y(dy)=\int_{(x,y)\in \bsU\times B} x\, p_{X,Y}(dxdy),\;\;\forall B\in\eB_\bsV.

Example 1.   Suppose that A, B\subset  \Omega,  X=I_A, Y=I_B.   Then  \eA_Y is the \si-algebra generated by B. The random variable \bsE(I_A|I_B)  has a constant value x_B on B and a constant value x_{\neg B} on \neg B :=\Omega\setminus B. They are determined by the equality

x_B \bsP(B)= \int_B I_A(\omega)\bsP(d\omega) =\bsP(A\cap B)

so that

x_B=\frac{\bsP(A\cap B)}{\bsP(B)}=\bsP(A|B).

Similarly

 x_{\neg B}= \bsP(A|\neg B).


\ast\ast\ast



Example 2.   Suppose \bsU=\bsV=\bR and that X and  Y are discrete random variables with ranges R_X and R_Y.   The random variable  \bsE(X|Y) has a constant value \bsE(X|y) on the set \{Y=y\}, y\in R_Y. It is determined from the equality

\bsE(X|Y=y)p_Y(y)=\int_{Y=y} \bsE(X|Y)_\omega \bsP_Y(d\omega) =\int_{Y=y} X(\omega) \bsP(d\omega).

Then \bsE(X|Y) can be viewed as a random variable (R_Y, p_Y)\to \bR,  y\mapsto \bsE(X|Y)_y=\bsE(X|Y=y), where

\bsE(X|Y=y) =\frac{1}{p_Y(y)}\int_{Y=y} X(\omega) d\bsP(\omega).

For this reason  one should think of \bsE(X|Y) as a function of Y.    From this point of view, a more appropriate notation would be \bsE_X(Y).


The joint probability  distribution  p_{X,Y} can be viewed as a function

p_{X,Y}: R_X\times R_Y\to \bR_{\geq 0},\;\;\sum_{(x,y)\in R_X\times R_Y} p_{X,Y}(x,y)= 1.

Then

\bsE(X|Y=y)= \sum_{x\in R_X} x\frac{p_{X,Y}(x,y)}{p_Y(y)}.  

We introduce a new R_X-valued random variable  (X|Y=y) with probability distribution p_{X|Y=y}(x)=\frac{p_{X,Y}(x,y)}{p_Y(y)}.

Then \bsE(X|Y=y) is  the expectation of the  random variable (X|Y=y).
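A tiny Python illustration of these formulas (purely illustrative, with an arbitrarily chosen joint distribution):

import numpy as np

R_X = np.array([0.0, 1.0, 2.0])                   # range of X
R_Y = np.array([-1.0, 1.0])                       # range of Y
p_XY = np.array([[0.10, 0.20],                    # joint pmf p_{X,Y}(x, y),
                 [0.30, 0.10],                    # rows indexed by x, columns by y
                 [0.05, 0.25]])                   # (entries sum to 1)

p_Y = p_XY.sum(axis=0)                            # marginal of Y
E_X_given_Y = (R_X[:, None] * p_XY).sum(axis=0) / p_Y   # E(X | Y=y) for each y in R_Y
print(dict(zip(R_Y, E_X_given_Y)))                # {-1.0: 0.888..., 1.0: 1.0909...}

# tower property: E( E(X|Y) ) = E(X)
print(E_X_given_Y @ p_Y, R_X @ p_XY.sum(axis=1))  # both equal 1.0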

\ast\ast\ast

Example 3. Suppose that  \bsU, \bsV are equipped with  Euclidean metrics,  X,Y are centered Gaussian random vectors with  covariance forms A and respectively B.   Assume  that the covariance pairing between X and Y is C so that the covariance  form of (X, Y) is

S=\left[ \begin{array}{cc} A & C\\ C^\dagger & B \end{array} \right].

Assuming that S is nondegenerate, we have \newcommand{\bsW}{\boldsymbol{W}}

p_{X,Y}(dw)  =\underbrace{\frac{1}{\sqrt{\det 2\pi S}}  e^{-\frac{1}{2}(S^{-1}w,w)} }_{=:\gamma_S(w)}dw,\;\;w=x+y\in \bsW:=\bsU\oplus \bsV,

p_Y(dy) = \gamma_B(y)\, dy,\;\;\gamma_B(y)=\frac{1}{\sqrt{\det 2\pi B}}  e^{-\frac{1}{2}(B^{-1}y,y)} .


For any  bounded measurable function f:\bsU\to \bR we have


\int_{\Omega} f(X(\omega)) \bsP(d\omega)=\int_\Omega \bsE(f(X)|Y)_\omega \bsP_Y(d\omega)=\int_\bsV \bsE(f(X)| Y=y)  p_Y(dy).

We deduce

\int_{\bsU\oplus \bsV} f(x) \gamma_S(x,y) dx dy= \int_\bsV \bsE(f(X)| Y=y) \gamma_B(y) dy.

Now observe that

 \int_{\bsV}\left(\int_\bsU f(x) \gamma_S(x,y) dx\right) dy =  \int_\bsV \bsE(f(X)| Y=y) \gamma_B(y) dy.


This implies that

 \bsE(f(X)| Y=y) =\frac{1}{\gamma_B(y)} \left(\int_\bsU f(x) \gamma_S(x,y) dx\right).

We obtain a probability measure  p_{X|Y=y} on the affine plane \bsU\times \{y\} given by

p_{X|Y=y}(dx)= \frac{\gamma_S(x,y)}{\gamma_B(y)} dx.


This is a Gaussian    measure on \bsU. Its   statistics are described by  the regression formula.  More precisely, its mean is

m_{X|Y=y}= CB^{-1}y,

and its covariance form is

S_{X|Y=y}=  A- CB^{-1}C^\dagger.
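Here is a quick numerical check (purely illustrative) in the simplest case \dim\bsU=\dim\bsV=1: the density \gamma_S(x,y)/\gamma_B(y) integrates to 1 in x, and its mean and variance agree with CB^{-1}y and A-CB^{-1}C^\dagger.

import numpy as np

A, B, C = 2.0, 1.5, 0.8                    # scalar covariance data, chosen arbitrarily
S = np.array([[A, C], [C, B]])             # covariance form of (X, Y); positive definite here
y = 0.7                                    # the value of Y we condition on

def gauss(w, Sigma):
    # centered Gaussian density with covariance Sigma, evaluated at the vector w
    w, Sigma = np.atleast_1d(w), np.atleast_2d(Sigma)
    return np.exp(-0.5 * w @ np.linalg.solve(Sigma, w)) / np.sqrt(np.linalg.det(2 * np.pi * Sigma))

x = np.linspace(-12.0, 12.0, 8001)
dens = np.array([gauss([xi, y], S) for xi in x]) / gauss([y], [[B]])

dx = x[1] - x[0]
mean = np.sum(x * dens) * dx
var = np.sum((x - mean) ** 2 * dens) * dx
print(np.sum(dens) * dx, mean, var)        # ~ 1.0,  ~ 0.3733,  ~ 1.5733
print(C * y / B, A - C ** 2 / B)           # 0.3733...,  1.5733...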

\ast\ast\ast

In general, if we think of p_{X,Y} as a density on \bsU\oplus \bsV, of p_Y as a density on \bsV, and we denote by \pi_\bsV the natural projection \bsU\oplus\bsV\to\bsV, then the conditional probability distribution p_{X|Y=y} is a probability density on \pi^{-1}_\bsV(y). More precisely, it is the density p_{X,Y}/\pi^*_\bsV p_Y defined as in Section 9.1.1 of my lectures, especially Proposition 9.1.8, page 350.

Monday, December 3, 2012

Degeneration of Gaussian measures

\newcommand{\bR}{\mathbb{R}} \newcommand{\ve}{{\varepsilon}} \newcommand{\bsV}{\boldsymbol{V}}  Suppose that \bsV is an N-dimensional real Euclidean space   equipped with an orthogonal  direct sum \newcommand{\bsU}{\boldsymbol{U}} \newcommand{\bsW}{\boldsymbol{W}}

\bsV =\bsU\oplus \bsW. \tag{1}\label{1}

Suppose that S_n: \bsU\to\bsU and C_n:\bsW\to \bsW are symmetric positive definite  operators such that

S_n\to 0,\;\;C_n\to C,\;\;\mbox{as}\;\;n\to \infty

where C is a symmetric positive definite operator on \bsW.     We set


A_n=S_n\oplus C_n :\bsV\to \bsV

and we think of A_n as the covariance   matrix  of a  Gaussian measure on \bsV \newcommand{\bv}{\boldsymbol{v}}

\gamma_{A_n}(|d\bv|)=\frac{1}{\sqrt{\det 2\pi A_n}}  e^{-\frac{1}{2}(A_n^{-1} \bv,\bv)} |d\bv|.

Suppose that f:\bsV\to \bR is a  locally Lipschitz function,   positively homogeneous of degree k\geq 1.

 I am interested in the  behavior as n\to \infty of the expectation

E_n(f):=\int_{\bsV} f(\bv)\gamma_{A_n}(|d\bv|).

\newcommand{\bu}{\boldsymbol{u}}  \newcommand{\bw}{\boldsymbol{w}} With respect to the decomposition (\ref{1}), a vector \bv\in\bsV can be written as an orthogonal sum \bv=\bu+\bw.

Define

\bar{f}_n:\bsW\to [0,\infty),\;\; \bar{f}_n(\bw)= \int_{\bsU} f(\bu+\bw) \gamma_{S_n}(|d\bu|),  

where d\gamma_{S_n} denotes the Gaussian measure  on \bsU with  covariance form S_n.  Then


E_n(f)=\int_{\bsW} \bar{f}_n(\bw) \gamma_{C_n}(|d\bw|). \tag{2}\label{2}  

For \bw\in \bsW and r\in (0,1] we set

m(\bw, r) := \sup_{|\bu|\leq r}|f(\bw+\bu)- f(\bw)|.  

Note that

\exists L>0: m(\bw,r)\leq   Lr,\;\;\forall |\bw|= 1,\;\; r\in(0,1]. \tag{3}\label{3}

In general,   we set \bar{\bw}:=\frac{1}{|\bw|} \bw. If |\bu|\leq r, then
\bigl|\;f(\bw+\bu)-f(\bw) \;\bigr|= |\bw|^k \left| f\Bigl(\bar{\bw}+\frac{1}{|\bw|} \bu\Bigr) -f(\bar{\bw})\right| \leq  L |\bw|^{k-1} r,
so that
m(\bw,r) \leq L|\bw|^{k-1} r,\;\;\forall \bw\in\bsW,\;\;r\in (0,1].  \tag{4}\label{4}

To proceed further, we need a vector counterpart of the Chebyshev inequality.

Lemma 1.  Suppose S:\bsU\to \bsU is a  symmetric, positive definite operator. We set R:=S^{-\frac{1}{2}} and  denote by \gamma_{S} the associated   Gaussian measure. Then for any c,\ell>0  we have \newcommand{\bsi}{\boldsymbol{\sigma}}

\int_{ |R \bu|\geq c}  |\bu|^\ell d\gamma_S(|\bu|) \leq \sqrt{2^{\ell+m-\frac{3}{2}} \Gamma\Bigl(\; \ell+m-\frac{1}{2}\;\Bigr)} \frac{\bsi_{m-1}}{(2\pi)^{\frac{m}{2}}} \Vert S\Vert^{\frac{\ell}{2}}c^{-\frac{1}{2}}e^{-\frac{c^2}{4}} , \tag{5}\label{5}

where m=\dim\bsU and \bsi_N denotes the area of the N-dimensional unit sphere.

Proof.   We make the change of variables \newcommand{\bx}{\boldsymbol{x}}  \bx:=R\bu and we  deduce
\int_{ |R \bu|\geq c}  |\bu|^\ell d\gamma_S(|\bu|)\leq \frac{1}{(2\pi)^{\frac{m}{2}}} \int_{|\bx|\geq c} |S^{\frac{1}{2}} \bx|^\ell  e^{-\frac{1}{2}|\bx|^2} |d\bx|

\leq \frac{\Vert S\Vert^{\frac{\ell}{2}}}{(2\pi)^{\frac{m}{2}}} \int_{|\bx|\geq c} |\bx|^\ell  e^{-\frac{1}{2}|\bx|^2} |d\bx|=\frac{\bsi_{m-1}\Vert S\Vert^{\frac{\ell}{2}}}{(2\pi)^{\frac{m}{2}}}\int_{t>c} t^{\ell+m-1} e^{-\frac{1}{2} t^2} dt
\leq  \frac{\bsi_{m-1}\Vert S\Vert^{\frac{\ell}{2}}}{(2\pi)^{\frac{m}{2}}} \left(\int_{t>c}  e^{-\frac{1}{2} t^2} dt\right)^{\frac{1}{2}}\left(\int_{t>0} t^{2\ell+2m-2} e^{-\frac{1}{2} t^2} dt\right)^{\frac{1}{2}}
Now observe that we have
\int_{t>c}  e^{-\frac{1}{2} t^2} dt \leq \frac{1}{c} e^{-\frac{c^2}{2}},
and  using the change of variables s=\frac{t^2}{2} we  deduce

 \int_{t>0} t^{2\ell+2m-2} e^{-\frac{1}{2} t^2} dt =2^{\ell+m-\frac{3}{2}}\int_0^\infty s^{\ell+m-\frac{1}{2}-1} e^{-s} ds= 2^{\ell+m-\frac{3}{2}} \Gamma( \ell+m-\frac{1}{2}).

This proves the lemma. q.e.d
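Here is a crude Monte Carlo sanity check of (\ref{5}) (purely illustrative, with arbitrarily chosen m, \ell, c and a diagonal S):

import numpy as np
from math import gamma, pi, sqrt, exp

rng = np.random.default_rng(1)
m, ell, c = 3, 2.0, 2.5
S = np.diag([0.5, 0.3, 0.2])                               # symmetric positive definite
u = rng.multivariate_normal(np.zeros(m), S, size=1_000_000)
Ru = u / np.sqrt(np.diag(S))                               # R u with R = S^{-1/2} (diagonal case)
lhs = np.mean(np.linalg.norm(u, axis=1) ** ell * (np.linalg.norm(Ru, axis=1) >= c))

sigma_m1 = 2 * pi ** (m / 2) / gamma(m / 2)                # area of the unit sphere in R^m
rhs = (sqrt(2 ** (ell + m - 1.5) * gamma(ell + m - 0.5))
       * sigma_m1 / (2 * pi) ** (m / 2)
       * np.linalg.norm(S, 2) ** (ell / 2) * c ** (-0.5) * exp(-c ** 2 / 4))
print(lhs, rhs, lhs <= rhs)                                # the bound holds, with room to spare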




We now want to compare \bar{f}_n(\bw) and f(\bw) for \bw\in\bsW.  We plan to use Lemma  1.   Set R_n:=S_n^{-\frac{1}{2}} and m:=\dim\bsU.   Observe that

|\bu|= |S_n^{\frac{1}{2}}R_n\bu|\leq \Vert S_n^{\frac{1}{2}}\Vert\cdot |R_n\bu|.

For simplicity   set s_n:=  \Vert S_n^{\frac{1}{2}}\Vert.   Choose a sequence of positive numbers  c_n such that c_n\to\infty and   s_n c_n\to 0.  Later  we will add several requirements to this sequence.

\bigl|\;\bar{f}_n(\bw)-f(\bw)\;\bigr|=\left| \int_{\bsU} (\; f(\bw+\bu)- f(\bw)\; ) \gamma_{S_n}(|d\bu|)\right|
\leq \left| \int_{|R_n\bu|\leq c_n}  (\; f(\bw+\bu)- f(\bw)\; ) \gamma_{S_n}(|d\bu|)\right|+\left|\int_{|R_n\bu|\geq c_n} (\; f(\bw+\bu)- f(\bw)\; ) \gamma_{S_n}(|d\bu|)\right|
\stackrel{(\ref{4})}{\leq} L|\bw|^{k-1}s_n c_n +C \int_{|R_n\bu|\geq c_n}(|\bw|^k+|\bu|^k) \gamma_{S_n}(|d\bu|)

\stackrel{(\ref{5})}{\leq}  L|\bw|^{k-1}s_n c_n + Z(k, m)c_n^{-\frac{1}{2}} e^{-\frac{c_n^2}{4}}(1+s_n^k),
where Z(k,m) is a constant that depends only on   k and m.


We deduce that there exists a constant C>0 independent of   n,w  such that  for any sequence c_n\to \infty such that s_nc_n\to 0, s_n:=\Vert S_n\Vert^{\frac{1}{2}} we have

\bigl|\;\bar{f}_n(\bw)-f(\bw)\;\bigr| \leq C\bigl(\; |\bw|^{k-1}s_nc_n + e^{-\frac{c_n^2}{4}}\;\bigr). \tag{6}\label{6}
We deduce that

\Bigl|\; E_n(f) -\int_{\bsW} f(\bw) \gamma_{C_n}(|d\bw|)\;\Bigr|  \leq   C\left(s_nc_n\int_{\bsW} |\bw|^{k-1} \gamma_{C_n}(|d\bw|) + e^{-\frac{c_n^2}{4}}\;\right).\tag{7}\label{7}

Finally let us estimate

D_n:=\int_{\bsW} f(\bw) \gamma_{C_n}(|d\bw|)-\int_{\bsW} f(\bw) \gamma_{C}(|d\bw|).

We have  \newcommand{\one}{\boldsymbol{1}}
D_n= \int_{\bsW} \left( f\bigl( C_n^{\frac{1}{2}}\bw\;\bigr)-f\bigl( C^{\frac{1}{2}}\bw\;\bigr) \;\right)\gamma_{\one}(|d\bw|)
and we conclude that
\left|\; \int_{\bsW} f(\bw) \gamma_{C_n}(|d\bw|)-\int_{\bsW} f(\bw) \gamma_{C}(|d\bw|)\;\right| \leq  L \Bigl\Vert \;C_n^{\frac{1}{2}}-C^\frac{1}{2}\;\Bigr\Vert \int_{\bsW}|\bw|^k \gamma_{\one}(|d\bw|). \tag{8}\label{8}

In (\ref{7}) we let c_n:=s_n^{-\ve}. If we denote by A_\infty the limit of the covariance matrices A_n, A_\infty=\lim_{n\to\infty} A_n =0\oplus C, then we deduce from the above computations that for any \ve>0 there exists a constant C_\ve>0 such that
\left|\; \int_{\bsV} f(\bv) \gamma_{A_n} (|d\bv|) -\int_{\bsV} f(\bv) \gamma_{A_\infty} (|d\bv|) \;\right|\leq C_\ve \left(s_n^{1-\ve}+ \Bigl\Vert \;C_n^{\frac{1}{2}}-C^\frac{1}{2}\;\Bigr\Vert\right)\leq C_\ve \Bigl\Vert A_n^{\frac{1}{2}}-A_\infty^{\frac{1}{2}}\Bigr\Vert^{1-\ve}.\tag{9}\label{9}
This can be generalized a bit. Suppose that T_n:\bsV\to \bsV is a sequence of orthogonal operators such that  T_n\to \one_{\bsV}.

Using (\ref{7})  we deduce

\left|\;\int_{\bsV} T^*_nf(\bv) \gamma_{A_n}(|d\bv|)-\int_{\bsV}  f(\bv) \gamma_{A_n}(|d\bv|)\right|= \left| \int_{\bsV}\Bigl( f(T_n A_n^{\frac{1}{2}}\bx)-  f( A_n^{\frac{1}{2}}\bx)\Bigr)\gamma_{\one}(|d\bx|) \right| \leq L \Bigl\Vert A_n^{\frac{1}{2}}\Bigr\Vert \Vert T_n-\one\Vert.
Observe that
\int_{\bsV} T^*_nf(\bv) \gamma_{A_n}(|d\bv|)=\int_{\bsV} f(\bv) \gamma_{B_n}(|d\bv|),
where
B_n= T_nA_nT_n^*.

Suppose that we are in the fortunate case when f|_{\bsW}=0.    Then

\int_{\bsW} f(\bw) \gamma_{C_n}(|d\bw|)=\int_{\bsW} f(\bw) \gamma_{C}(|d\bw|)=0

and (\ref{9})  can be improved to

\left|\; \int_{\bsV} f(\bv) \gamma_{A_n} (|d\bv|)\right|\leq C_\ve s_n^{1-\ve}.
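Here is a small Monte Carlo sketch (my own, not part of the estimates above) of the convergence in (\ref{9}), in the simplest case \dim\bsU=\dim\bsW=1, with f(\bv)=|\bv| (homogeneous of degree k=1) and C_n=C fixed:

import numpy as np

rng = np.random.default_rng(2)
f = lambda v: np.linalg.norm(v, axis=-1)        # positively homogeneous of degree 1, Lipschitz

c = 1.0                                          # the limiting variance on W
w = rng.normal(size=(1_000_000, 1)) * np.sqrt(c)
limit = np.mean(f(w))                            # the integral of f over W against gamma_C

for s in [1.0, 0.1, 0.01]:                       # variances S_n -> 0 on U
    v = rng.normal(size=(1_000_000, 2)) * np.sqrt([s, c])   # samples of gamma_{A_n}, A_n = diag(s, c)
    print(s, np.mean(f(v)) - limit)              # the gap shrinks as S_n -> 0, as (9) predicts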




On the 11/8-conjecture

Stefan Bauer has just posted a proof of the 11/8-conjecture for simply connected 4-manifolds.