$\newcommand{\bR}{\mathbb{R}}$ I am still struggling with the idea of conditioning. Maybe this public confession will help clear things up. $\newcommand{\bsP}{\boldsymbol{P}}$ $\newcommand{\eA}{\mathscr{A}}$ $\newcommand{\si}{\sigma}$
Suppose that $(\Omega, \eA, \bsP)$ is a probability space, where $\eA$ is a $\si$-algebra of subsets of $\Omega$ and $\bsP:\eA\to [0,1]$ is a probability measure. We assume that $\eA$ is complete with respect to $\bsP$, i.e., subsets of $\bsP$-negligible subsets are measurable. $\newcommand{\bsU}{{\boldsymbol{U}}}$ $\newcommand{\bsV}{{\boldsymbol{V}}}$ $\newcommand{\eB}{\mathscr{B}}$
Assume that $\bsU$ and $\bsV$ are two finite dimensional real vector spaces equipped with the $\si$-algebras of Borel subsets, $\eB_{\bsU}$ and $\eB_{\bsV}$ respectively. Consider two random variables $X:\Omega\to \bsU$ and $Y:\Omega\to \bsV$ with probability measures
$$p_X=X_*\bsP,\;\;p_Y=Y_*\bsP. $$
Denote by $p_{X,Y}$ the joint probability measure $\newcommand{\bsE}{\boldsymbol{E}}$
$$ p_{X,Y}=(X\oplus Y)_*\bsP. $$
The conditional expectation $\bsE(X|Y)$ is a new $\bsU$-valued random variable $\omega\mapsto \bsE(X|Y)_\omega$, but on a different probability space $(\Omega, \eA_Y, \bsP_Y)$, where $\eA_Y=Y^{-1}(\eB_\bsV)$ and $\bsP_Y$ is the restriction of $\bsP$ to $\eA_Y$. The events in $\eA_Y$ all have the form $\{Y\in B\}$, $B\in\eB_{\bsV}$.
This $\eA_Y$-measurable random variable is defined uniquely by the equality
$$ \int_{Y\in B} \bsE(X|Y)_\omega\, \bsP_Y(d\omega) = \int_{Y\in B}X(\omega)\, \bsP(d\omega),\;\;\forall B\in\eB_\bsV. $$
Warning: The truly subtle point in the above equality is that the integral on the left-hand side is performed with respect to the restricted measure $\bsP_Y$.
If we denote by $I_B$ the indicator function of $B\in\eB_\bsV$, then we can rewrite the above equality as
$$\int_\Omega \bsE(X|Y)_\omega I_B(Y(\omega)) \bsP_Y(d\omega)=\int_\Omega X(\omega) I_B(Y(\omega) )\bsP(d\omega). $$
In particular, we deduce that for any step function $f: \bsV \to \bR$, and hence, by a standard limiting argument, for any bounded measurable function $f:\bsV\to\bR$, we have
$$\int_\Omega \bsE(X|Y)_\omega f(Y(\omega)) \bsP_Y(d\omega) =\int_\Omega X(\omega) f(Y(\omega) )\bsP(d\omega). $$
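Here is a quick numerical sanity check of this identity, a minimal Monte Carlo sketch in Python. The model is a hypothetical choice made up for illustration: $Y$ uniform on $\{0,1,2\}$, $X=Y+{}$Gaussian noise (so that $\bsE(X|Y=y)=y$), and $f$ a simple step function of $y$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model (for illustration only): Y uniform on {0,1,2},
# X = Y + standard Gaussian noise, so that E(X|Y=y) = y exactly.
n = 1_000_000
Y = rng.integers(0, 3, size=n)
X = Y + rng.standard_normal(n)

# Estimate E(X|Y) empirically, as a function of Y evaluated along the sample.
cond_mean = np.array([X[Y == y].mean() for y in range(3)])
E_X_given_Y = cond_mean[Y]

f = (Y >= 1).astype(float)         # a step function of Y
print((E_X_given_Y * f).mean())    # left-hand side:  E[ E(X|Y) f(Y) ]
print((X * f).mean())              # right-hand side: E[ X f(Y) ]; should agree
```

The two printed numbers should agree up to Monte Carlo error.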
Since $\bsE(X|Y)$ is $\eA_Y$-measurable, it factors through $Y$ (Doob–Dynkin), so it defines a $\bsU$-valued random variable $\bsV\ni y\mapsto \bsE(X|Y=y)\in\bsU$ on the probability space $(\bsV,\eB_\bsV, p_Y)$, where
$$ \int_B \bsE(X| Y=y)\, p_Y(dy)=\int_{(x,y)\in \bsU\times B} x\, p_{X,Y}(dx\,dy),\;\;\forall B\in\eB_\bsV. $$
Example 1. Suppose that $A, B\in\eA$, $X=I_A$, $Y=I_B$. Then $\eA_Y$ is the $\si$-algebra generated by $B$. The random variable $\bsE(I_A|I_B)$ has a constant value $x_B$ on $B$ and a constant value $x_{\neg B}$ on $\neg B :=\Omega\setminus B$. The value $x_B$ is determined by the equality
$$ x_B \bsP(B)= \int_B I_A(\omega)\bsP(d\omega) =\bsP(A\cap B) $$
so that
$$ x_B=\frac{\bsP(A\cap B)}{\bsP(B)}=\bsP(A|B). $$
Similarly
$$ x_{\neg B}= \bsP(A|\neg B). $$
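This is easy to see numerically. In the sketch below the sample space is $\Omega=[0,1)$ with Lebesgue measure, and the events $A$, $B$ are hypothetical intervals chosen for illustration; the empirical averages of $I_A$ over $B$ and over $\neg B$ approximate $\bsP(A|B)$ and $\bsP(A|\neg B)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: Omega = [0,1) with Lebesgue measure,
# A = [0, 0.5), B = [0.3, 0.9).
omega = rng.random(1_000_000)
I_A = (omega < 0.5).astype(float)
I_B = ((omega >= 0.3) & (omega < 0.9)).astype(float)

# E(I_A | I_B) is constant on B and on its complement:
x_B    = I_A[I_B == 1].mean()   # estimates P(A|B)  = 0.2/0.6 = 1/3
x_notB = I_A[I_B == 0].mean()   # estimates P(A|¬B) = 0.3/0.4 = 3/4

print(x_B, x_notB)              # ≈ 0.3333, 0.75
```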
$$ \ast\ast\ast $$
Example 2. Suppose $\bsU=\bsV=\bR$ and that $X$ and $Y$ are discrete random variables with ranges $R_X$ and $R_Y$. The random variable $\bsE(X|Y)$ has a constant value $\bsE(X|Y=y)$ on the set $\{Y=y\}$, $y\in R_Y$. It is determined by the equality
$$ \bsE(X|Y=y)\,p_Y(y)=\int_{Y=y} \bsE(X|Y)_\omega\, \bsP_Y(d\omega) =\int_{Y=y} X(\omega)\, \bsP(d\omega). $$
Then $\bsE(X|Y)$ can be viewed as a random variable $ (R_Y, p_Y)\to \bR$, $y\mapsto \bsE(X|Y)_y=\bsE(X|Y=y)$, where
$$ \bsE(X|Y=y) =\frac{1}{p_Y(y)}\int_{Y=y} X(\omega)\, \bsP(d\omega). $$
For this reason, one should think of $\bsE(X|Y)$ as a function of $Y$. From this point of view, a more appropriate notation would be $\bsE_X(Y)$.
The joint probability distribution $p_{X,Y}$ can be viewed as a function
$$ p_{X,Y}: R_X\times R_Y\to \bR_{\geq 0},\;\;\sum_{(x,y)\in R_X\times R_Y} p_{X,Y}(x,y)= 1. $$
Then
$$ \bsE(X|Y=y)= \sum_{x\in R_X} x\,\frac{p_{X,Y}(x,y)}{p_Y(y)}. $$
We introduce a new $R_X$-valued random variable $(X|Y=y)$ with probability distribution $p_{X|Y=y}(x)=\frac{p_{X,Y}(x,y)}{p_Y(y)}$.
Then $\bsE(X|Y=y)$ is the expectation of the random variable $(X|Y=y)$.
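The discrete recipe is straightforward to implement. The sketch below uses a made-up joint distribution $p_{X,Y}$ (the ranges and the probability table are hypothetical); it marginalizes to get $p_Y$ and then computes $\bsE(X|Y=y)$ from the formula above.

```python
import numpy as np

# Hypothetical joint pmf p_{X,Y} on R_X x R_Y (rows indexed by x, columns by y).
R_X = np.array([0.0, 1.0, 2.0])
R_Y = np.array([-1.0, 1.0])
p_XY = np.array([[0.10, 0.20],
                 [0.30, 0.05],
                 [0.15, 0.20]])
assert np.isclose(p_XY.sum(), 1.0)

p_Y = p_XY.sum(axis=0)                                  # marginal of Y
E_X_given_Y = (R_X[:, None] * p_XY).sum(axis=0) / p_Y   # sum_x x p_{X,Y}(x,y)/p_Y(y)

for y, e in zip(R_Y, E_X_given_Y):
    print(f"E(X | Y={y:+.0f}) = {e:.4f}")
```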
$$ \ast\ast\ast $$
Example 3. Suppose that $\bsU, \bsV$ are equipped with Euclidean metrics and that $X,Y$ are centered Gaussian random vectors with covariance forms $A$ and $B$, respectively. Assume that the covariance pairing between $X$ and $Y$ is $C$, so that the covariance form of $(X, Y)$ is
$$ S=\left[ \begin{array}{cc} A & C\\ C^\dagger & B \end{array} \right]. $$
We have $\newcommand{\bsW}{\boldsymbol{W}}$
$$p_{X,Y}(dw) =\underbrace{\frac{1}{\sqrt{\det 2\pi S}} e^{-\frac{1}{2}(S^{-1}w,w)} }_{=:\gamma_S(w)}dw,\;\;w=x+y\in \bsW:=\bsU\oplus \bsV, $$
$$ p_Y(dy) = \gamma_B(y)\, dy,\;\;\gamma_B(y)=\frac{1}{\sqrt{\det 2\pi B}} e^{-\frac{1}{2}(B^{-1}y,y)}. $$
For any bounded measurable function $f:\bsU\to \bR$ we have
$$\int_{\Omega} f(X(\omega))\, \bsP(d\omega)=\int_\Omega \bsE(f(X)|Y)_\omega\, \bsP_Y(d\omega)=\int_\bsV \bsE(f(X)| Y=y)\, p_Y(dy). $$
We deduce
$$\int_{\bsU\oplus \bsV} f(x) \gamma_S(x,y) dx dy= \int_\bsV \bsE(f(X)| Y=y) \gamma_B(y) dy. $$
Now observe that, by Fubini's theorem,
$$ \int_{\bsV}\left(\int_\bsU f(x) \gamma_S(x,y) dx\right) dy = \int_\bsV \bsE(f(X)| Y=y) \gamma_B(y) dy. $$
Since $f$ is an arbitrary bounded measurable function, this implies that, for $p_Y$-almost every $y$,
$$ \bsE(f(X)| Y=y) =\frac{1}{\gamma_B(y)} \left(\int_\bsU f(x) \gamma_S(x,y)\, dx\right). $$
We obtain a probability measure $p_{X|Y=y}$ on the affine plane $\bsU\times \{y\}$ given by
$$p_{X|Y=y}(dx)= \frac{\gamma_S(x,y)}{\gamma_B(y)} dx. $$
This is a Gaussian measure on $\bsU$. Its statistics are described by the regression formula. More precisely, its mean is
$$m_{X|Y=y}= CB^{-1}y, $$
and its covariance form is
$$S_{X|Y=y}= A- CB^{-1}C^\dagger. $$
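The regression formula is easy to test numerically. In the sketch below the covariance blocks $A$, $B$, $C$ are hypothetical (any choice making $S$ positive definite will do); the formulas $CB^{-1}y$ and $A-CB^{-1}C^\dagger$ are compared with the empirical mean and covariance of the samples whose $Y$-component lands in a small window around a fixed $y_0$, a crude approximation of conditioning on $\{Y=y_0\}$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical covariance data with U = V = R^2 (not from the text above).
A = np.array([[2.0, 0.5], [0.5, 1.0]])   # covariance form of X
B = np.array([[1.5, 0.3], [0.3, 1.0]])   # covariance form of Y
C = np.array([[0.4, 0.1], [0.2, 0.3]])   # covariance pairing Cov(X, Y)
S = np.block([[A, C], [C.T, B]])         # joint covariance form; positive definite

# Regression formula: conditional mean C B^{-1} y, covariance A - C B^{-1} C^T.
Binv = np.linalg.inv(B)
y0 = np.array([0.7, -0.4])
m_reg = C @ Binv @ y0
S_reg = A - C @ Binv @ C.T

# Monte Carlo check: sample (X, Y) jointly and keep samples with Y near y0.
w = rng.multivariate_normal(np.zeros(4), S, size=2_000_000)
X, Y = w[:, :2], w[:, 2:]
mask = np.linalg.norm(Y - y0, axis=1) < 0.1
print(m_reg, X[mask].mean(axis=0))       # conditional means should roughly agree
print(S_reg)
print(np.cov(X[mask].T))                 # conditional covariances likewise
```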
$$ \ast\ast\ast $$
In general, if we think of $p_{X,Y}$ as a density on $\bsU\oplus \bsV$ and of $p_Y$ as a density on $\bsV$, and we denote by $\pi_\bsV$ the natural projection $\bsU\oplus\bsV\to\bsV$, then the conditional probability distribution $p_{X|Y=y}$ is a probability density on the fiber $\pi^{-1}_\bsV(y)$. More precisely, it is the density $p_{X,Y}/\pi^*_{\bsV}p_Y$ defined as in Section 9.1.1 of my lectures, especially Proposition 9.1.8, page 350.