## Wednesday, December 12, 2012

### On conditional expectations.

$\newcommand{\bR}{\mathbb{R}}$ I am still struggling with the idea of conditioning. Maybe this public confession will help clear things up. $\newcommand{\bsP}{\boldsymbol{P}}$ $\newcommand{\eA}{\mathscr{A}}$ $\newcommand{\si}{\sigma}$

Suppose that $(\Omega, \eA, \bsP)$ is a probability space, where $\eA$ is a $\si$-algebra of subsets of $\Omega$ and  $\bsP:\eA\to [0,1]$ is a probability measure. We assume that $\eA$ is complete with respect to $\bsP$, i.e.,  subsets of $\bsP$-negligible subsets are measurable. $\newcommand{\bsU}{{\boldsymbol{U}}}$ $\newcommand{\bsV}{{\boldsymbol{V}}}$ $\newcommand{\eB}{\mathscr{B}}$

Assume that $\bsU$ and $\bsV$ are two finite dimensional real vector spaces equipped with the $\si$-algebras of Borel subsets, $\eB_{\bsU}$ and $\eB_{\bsV}$ respectively. Consider two random variables $X:\Omega\to \bsU$ and $Y:\Omega\to \bsV$, with $X$ integrable, and denote their distributions by

$$p_X=X_*\bsP,\;\;p_Y=Y_*\bsP.$$

Denote by $p_{X,Y}$ the joint probability measure $\newcommand{\bsE}{\boldsymbol{E}}$

$$p_{X,Y}=(X\oplus Y)_*\bsP.$$

The conditional expectation $\bsE(X|Y)$ is a new $\bsU$-valued random variable $\omega\mapsto \bsE(X|Y)_\omega$, but on a different probability space $(\Omega, \eA_Y, \bsP_Y)$, where $\eA_Y=Y^{-1}(\eB_\bsV)$ and $\bsP_Y$ is the restriction of $\bsP$ to $\eA_Y$. The events in $\eA_Y$ all have the form $\{Y\in B\}$, $B\in\eB_{\bsV}$.

This $\eA_Y$-measurable random variable is determined uniquely, up to a $\bsP_Y$-negligible set, by the equality

$$\int_{Y\in B} \bsE(X|Y)_\omega \bsP_Y(d\omega) = \int_{Y\in B}X(\omega) \bsP(d\omega),\;\;\forall B\in\eB_\bsV.$$

Warning: The truly subtle thing in the above equality is the integral on the left-hand side, which is performed with respect to the restricted measure $\bsP_Y$.

If we denote by $I_B$ the indicator function of  $B\in\eB_\bsV$, then we can rewrite the above  equality as

$$\int_\Omega \bsE(X|Y)_\omega I_B(Y(\omega)) \bsP_Y(d\omega)=\int_\Omega X(\omega) I_B(Y(\omega) )\bsP(d\omega).$$

In particular  we deduce that for any  step function $f: \bsV \to \bR$ we have

$$\int_\Omega \bsE(X|Y)_\omega f(Y(\omega)) \bsP_Y(d\omega) =\int_\Omega X(\omega) f(Y(\omega) )\bsP(d\omega).$$
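To spell out this step: a step function is a finite linear combination of indicator functions, $f=\sum_{k=1}^n c_k I_{B_k}$ with $c_k\in\bR$ and $B_k\in\eB_\bsV$ (the names $c_k$, $B_k$ and $n$ are introduced here only for this illustration). Applying the indicator identity to each $B_k$ and using the linearity of the integral gives

$$\int_\Omega \bsE(X|Y)_\omega f(Y(\omega))\bsP_Y(d\omega)=\sum_{k=1}^n c_k\int_\Omega \bsE(X|Y)_\omega I_{B_k}(Y(\omega))\bsP_Y(d\omega)=\sum_{k=1}^n c_k\int_\Omega X(\omega) I_{B_k}(Y(\omega))\bsP(d\omega)=\int_\Omega X(\omega) f(Y(\omega))\bsP(d\omega).$$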

The random variable  $\bsE(X|Y)$  defines  a   $\bsU$-valued  random variable $\bsV\ni y\mapsto \bsE(X|y)\in\bsU$ on the probability space $(\bsV,\eB_\bsV, p_Y)$   where

$$\int_B \bsE(X| y) p_Y(dy)=\int_{(x,y)\in \bsU\times B} x\, p_{X,Y}(dxdy),\;\;\forall B\in\eB_\bsV.$$

Example 1. Suppose that $A, B\in \eA$, $0<\bsP(B)<1$, $X=I_A$, $Y=I_B$. Then $\eA_Y$ is the $\si$-algebra generated by $B$. The random variable $\bsE(I_A|I_B)$ has a constant value $x_B$ on $B$ and a constant value $x_{\neg B}$ on $\neg B :=\Omega\setminus B$. The value $x_B$ is determined by the equality

$$x_B \bsP(B)= \int_B I_A(\omega)\bsP(d\omega) =\bsP(A\cap B)$$

so that

$$x_B=\frac{\bsP(A\cap B)}{\bsP(B)}=\bsP(A|B).$$

Similarly

$$x_{\neg B}= \bsP(A|\neg B).$$
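As a sanity check, here is a minimal sketch of Example 1 on a finite sample space. The fair-die space and the particular events `A` and `B` are assumptions chosen only for illustration. The code solves the defining equality for the two constant values and then checks that integrating $\bsE(I_A|I_B)$ over $\Omega$ recovers $\bsP(A)$.

```python
from fractions import Fraction

# Hypothetical finite probability space: a fair die, Omega = {1,...,6}.
omega = range(1, 7)
P = {w: Fraction(1, 6) for w in omega}

A = {2, 4, 6}   # illustrative event A (even outcome)
B = {1, 2, 3}   # illustrative event B (outcome at most 3)
not_B = set(omega) - B

def prob(event):
    """P(event) for a subset of Omega."""
    return sum(P[w] for w in event)

def constant_value(A, atom):
    """Solve x * P(atom) = integral over atom of I_A dP for x."""
    integral = sum(P[w] for w in atom if w in A)
    return integral / prob(atom)

x_B = constant_value(A, B)          # value of E(I_A | I_B) on B
x_not_B = constant_value(A, not_B)  # value of E(I_A | I_B) on the complement

def cond_exp(w):
    """E(I_A | I_B) as a random variable on Omega."""
    return x_B if w in B else x_not_B

# Integrating E(I_A | I_B) over Omega should recover P(A).
total = sum(cond_exp(w) * P[w] for w in omega)
print(x_B, x_not_B, total)  # prints: 1/3 2/3 1/2
```

Exact rational arithmetic is used so that the two sides of the defining equality can be compared without rounding.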

$$\ast\ast\ast$$

Example 2. Suppose $\bsU=\bsV=\bR$ and that $X$ and $Y$ are discrete random variables with ranges $R_X$ and $R_Y$. The random variable $\bsE(X|Y)$ has a constant value $\bsE(X|Y=y)$ on the set $\{Y=y\}$, $y\in R_Y$. It is determined from the equality

$$\bsE(X|Y=y)p_Y(y)=\int_{Y=y} \bsE(X|Y)_\omega \bsP_Y(d\omega) =\int_{Y=y} X(\omega) \bsP(d\omega).$$

Then $\bsE(X|Y)$ can be viewed as a random variable $(R_Y, p_Y)\to \bR$,  $y\mapsto \bsE(X|Y)_y=\bsE(X|Y=y)$, where

$$\bsE(X|Y=y) =\frac{1}{p_Y(y)}\int_{Y=y} X(\omega) \bsP(d\omega).$$

For this reason  one should think of $\bsE(X|Y)$ as a function of $Y$.    From this point of view, a more appropriate notation would be $\bsE_X(Y)$.

The joint probability  distribution  $p_{X,Y}$ can be viewed as a function

$$p_{X,Y}: R_X\times R_Y\to \bR_{\geq 0},\;\;\sum_{(x,y)\in R_X\times R_Y} p_{X,Y}(x,y)= 1.$$

Then

$$\bsE(X|Y=y)= \sum_{x\in R_X} x\frac{p_{X,Y}(x,y)}{p_Y(y)}.$$

We introduce a new $R_X$-valued random variable $(X|Y=y)$ with probability distribution $p_{X|Y=y}(x)=\frac{p_{X,Y}(x,y)}{p_Y(y)}$.

Then $\bsE(X|Y=y)$ is  the expectation of the  random variable $(X|Y=y)$.
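The discrete formulas above can be sketched in a few lines. The joint table `p_XY` below is a made-up distribution used only for illustration. The code builds the marginal $p_Y$, the conditional distribution $p_{X|Y=y}$ and the function $y\mapsto\bsE(X|Y=y)$, and then checks that averaging this function against $p_Y$ gives back $\bsE(X)$.

```python
from fractions import Fraction

# Hypothetical joint distribution p_{X,Y} on R_X x R_Y (illustration only).
F = Fraction
p_XY = {
    (0, 0): F(1, 8), (1, 0): F(1, 8), (2, 0): F(1, 4),
    (0, 1): F(1, 4), (1, 1): F(1, 8), (2, 1): F(1, 8),
}
R_X = sorted({x for x, _ in p_XY})
R_Y = sorted({y for _, y in p_XY})
assert sum(p_XY.values()) == 1

# Marginal p_Y(y) = sum over x of p_{X,Y}(x, y).
p_Y = {y: sum(p_XY[(x, y)] for x in R_X) for y in R_Y}

def cond_dist(y):
    """Distribution of (X | Y=y): x -> p_{X,Y}(x, y) / p_Y(y)."""
    return {x: p_XY[(x, y)] / p_Y[y] for x in R_X}

def cond_exp(y):
    """E(X | Y=y), the expectation of the random variable (X | Y=y)."""
    return sum(x * q for x, q in cond_dist(y).items())

# Averaging y -> E(X | Y=y) against p_Y should recover E(X).
E_X = sum(x * p for (x, _), p in p_XY.items())
tower = sum(cond_exp(y) * p_Y[y] for y in R_Y)
for y in R_Y:
    print(y, cond_exp(y))
print(E_X, tower)
```

Here `cond_exp` is literally a function of the value of $Y$, which is the point of view suggested by the notation $\bsE_X(Y)$.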

$$\ast\ast\ast$$

Example 3. Suppose that $\bsU, \bsV$ are equipped with Euclidean metrics and that $X,Y$ are centered, jointly Gaussian random vectors with covariance forms $A$ and $B$ respectively. Assume that the covariance pairing between $X$ and $Y$ is $C$, so that the covariance form of $(X, Y)$ is the (nondegenerate) form

$$S=\left[ \begin{array}{cc} A & C\\ C^\dagger & B \end{array} \right].$$

We have $\newcommand{\bsW}{\boldsymbol{W}}$

$$p_{X,Y}(dw) =\underbrace{\frac{1}{\sqrt{\det 2\pi S}} e^{-\frac{1}{2}(S^{-1}w,w)} }_{=:\gamma_S(w)}dw,\;\;w=x+y\in \bsW:=\bsU\oplus \bsV,$$

$$p_Y(dy) = \gamma_B(y) dy,\;\;\gamma_B(y)=\frac{1}{\sqrt{\det 2\pi B}} e^{-\frac{1}{2}(B^{-1}y,y)} .$$

For any  bounded measurable function $f:\bsU\to \bR$ we have

$$\int_{\Omega} f(X(\omega)) \bsP(d\omega)=\int_\Omega \bsE(f(X)|Y)_\omega \bsP_Y(d\omega)=\int_\bsV \bsE(f(X)| Y=y) p_Y(dy).$$

We deduce

$$\int_{\bsU\oplus \bsV} f(x) \gamma_S(x,y) dx dy= \int_\bsV \bsE(f(X)| Y=y) \gamma_B(y) dy.$$

By Fubini's theorem, the left-hand side can be written as an iterated integral, so that

$$\int_{\bsV}\left(\int_\bsU f(x) \gamma_S(x,y) dx\right) dy = \int_\bsV \bsE(f(X)| Y=y) \gamma_B(y) dy.$$

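Repeating the argument with $f(X)I_{B'}(Y)$ in place of $f(X)$, where $B'\in\eB_\bsV$ is arbitrary, shows that the equality persists when the outer integrals are taken over $B'$ instead of $\bsV$. This implies that, for almost every $y$,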

$$\bsE(f(X)| Y=y) =\frac{1}{\gamma_B(y)} \left(\int_\bsU f(x) \gamma_S(x,y) dx\right).$$

We obtain a probability measure $p_{X|Y=y}$ on the affine subspace $\bsU\times \{y\}$ given by

$$p_{X|Y=y}(dx)= \frac{\gamma_S(x,y)}{\gamma_B(y)} dx.$$

This is a Gaussian measure on $\bsU$. Its statistics are described by the regression formula. More precisely, its mean is

$$m_{X|Y=y}= CB^{-1}y,$$

and its covariance form is

$$S_{X|Y=y}= A- CB^{-1}C^\dagger.$$
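The regression formula can be checked numerically in the simplest case $\bsU=\bsV=\bR$. The sketch below is only an illustration: the scalars `a`, `b`, `c` stand in for $A$, $B$, $C$ and their values are arbitrary assumptions. It compares the ratio $\gamma_S(x,y)/\gamma_B(y)$ with the Gaussian density in $x$ that has mean $CB^{-1}y$ (which reduces to $Cy$ when $B$ is the identity) and variance $A-CB^{-1}C^\dagger$.

```python
import math

# Hypothetical scalar data (U = V = R): Var X = a, Var Y = b, Cov(X, Y) = c.
a, b, c = 2.0, 3.0, 1.5
det_S = a * b - c * c   # determinant of S = [[a, c], [c, b]]
assert det_S > 0        # S must be positive definite

def gamma_S(x, y):
    """Joint centered Gaussian density with covariance matrix S."""
    # (S^{-1} w, w) for w = (x, y), using the explicit 2x2 inverse.
    q = (b * x * x - 2.0 * c * x * y + a * y * y) / det_S
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det_S))

def gamma_B(y):
    """Marginal density of Y, a centered Gaussian with variance b."""
    return math.exp(-0.5 * y * y / b) / math.sqrt(2.0 * math.pi * b)

def regression_density(x, y):
    """Gaussian density in x with mean c*y/b and variance a - c*c/b."""
    mean = c * y / b
    var = a - c * c / b
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

# Compare gamma_S(x, y) / gamma_B(y) with the regression density on a grid.
max_err = max(
    abs(gamma_S(x, y) / gamma_B(y) - regression_density(x, y))
    for x in (-2.0, -0.5, 0.0, 1.0, 3.0)
    for y in (-1.0, 0.0, 0.7, 2.5)
)
print(max_err)  # expected to be at the level of floating-point rounding
```

The agreement is an instance of completing the square: $(S^{-1}w,w)-(B^{-1}y,y)$ is a quadratic form in $x-CB^{-1}y$.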

$$\ast\ast\ast$$

In general, if we think of $p_{X,Y}$ as a density on $\bsU\oplus \bsV$, of $p_Y$ as a density on $\bsV$, and we denote by $\pi_\bsV$ the natural projection $\bsU\oplus\bsV\to\bsV$, then the conditional probability distribution $p_{X|Y=y}$ is a probability density on $\pi^{-1}_\bsV(y)$. More precisely, it is the density $p_{X,Y}/\pi^*_\bsV p_Y$ defined as in Section 9.1.1 of my lectures; see especially Proposition 9.1.8 on page 350.