Suppose that $(\Omega, \eA, \bsP)$ is a probability space, where $\eA$ is a $\si$-algebra of subsets of $\Omega$ and $\bsP:\eA\to [0,1]$ is a probability measure. We assume that $\eA$ is complete with respect to $\bsP$, i.e., subsets of $\bsP$-negligible subsets are measurable. $\newcommand{\bsU}{{\boldsymbol{U}}}$ $\newcommand{\bsV}{{\boldsymbol{V}}}$ $\newcommand{\eB}{\mathscr{B}}$

Assume that $\bsU$ and $\bsV$ are two finite dimensional real vector spaces equipped with the $\si$-algebras of Borel subsets, $\eB_{\bsU}$ and $\eB_{\bsV}$ respectively. Consider two random variables $X:\Omega\to \bsU$ and $Y:\Omega\to \bsV$ with probability measures

$$p_X=X_*\bsP,\;\;p_Y=Y_*\bsP. $$

Denote by $p_{X,Y}$ the joint probability measure $\newcommand{\bsE}{\boldsymbol{E}}$

$$ p_{X,Y}=(X\oplus Y)_*\bsP. $$

The conditional expectation $\bsE(X|Y)$ is a new $\bsU$-valued random variable $\omega\mapsto \bsE(X|Y)_\omega$, but on a different probability space $(\Omega, \eA_Y, \bsP_Y)$, where $\eA_Y=Y^{-1}(\eB_\bsV)$ and $\bsP_Y$ is the restriction of $\bsP$ to $\eA_Y$. The events in $\eA_Y$ all have the form $\{Y\in B\}$, $B\in\eB_{\bsV}$.

This $\eA_Y$-measurable random variable is defined uniquely by the equality

$$ \int_{Y\in B} \bsE(X|Y)_\omega \bsP_Y(d\omega) = \int_{Y\in B}X(\omega) \bsP(d\omega),\;\;\forall B\in\eB_\bsV. $$

*Warning:* **The truly subtle** thing in the above equality is the integral in the left-hand side, which is performed with respect to the restricted measure $\bsP_Y$.

If we denote by $I_B$ the indicator function of $B\in\eB_\bsV$, then we can rewrite the above equality as

$$\int_\Omega \bsE(X|Y)_\omega I_B(Y(\omega)) \bsP_Y(d\omega)=\int_\Omega X(\omega) I_B(Y(\omega) )\bsP(d\omega). $$

In particular, we deduce that for any step function $f: \bsV \to \bR$ we have

$$\int_\Omega \bsE(X|Y)_\omega f(Y(\omega)) \bsP_Y(d\omega) =\int_\Omega X(\omega) f(Y(\omega) )\bsP(d\omega). $$

The random variable $\bsE(X|Y)$ defines a $\bsU$-valued random variable $\bsV\ni y\mapsto \bsE(X|y)\in\bsU$ on the probability space $(\bsV,\eB_\bsV, p_Y)$ where

$$ \int_B \bsE(X| y) p_Y(dy)=\int_{(x,y)\in \bsU\times B} x\, p_{X,Y}(dxdy). $$
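The two descriptions of $\bsE(X|Y)$ are linked by the change-of-variables formula for the pushforward measures $p_Y=Y_*\bsP$ and $p_{X,Y}=(X\oplus Y)_*\bsP$. A sketch of the chain, using the defining equality above:

```latex
\int_B \bsE(X|y)\, p_Y(dy)
  = \int_{\{Y\in B\}} \bsE(X|Y)_\omega\, \bsP_Y(d\omega)
  = \int_{\{Y\in B\}} X(\omega)\, \bsP(d\omega)
  = \int_{(x,y)\in \bsU\times B} x\, p_{X,Y}(dx\,dy).
```

The first equality is the change of variables for $p_Y=Y_*\bsP$, the second is the defining equality of $\bsE(X|Y)$, and the last is the change of variables for $p_{X,Y}$.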

*Example 1.* Suppose that $A, B\subset \Omega$, $X=I_A$, $Y=I_B$. Then $\eA_Y$ is the $\si$-algebra generated by $B$. The random variable $\bsE(I_A|I_B)$ has a constant value $x_B$ on $B$ and a constant value $x_{\neg B}$ on $\neg B :=\Omega\setminus B$. They are determined by the equality

$$ x_B \bsP(B)= \int_B I_A(\omega)\bsP(d\omega) =\bsP(A\cap B) $$

so that

$$ x_B=\frac{\bsP(A\cap B)}{\bsP(B)}=\bsP(A|B). $$

Similarly

$$ x_{\neg B}= \bsP(A|\neg B). $$
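The computation in Example 1 can be checked on a toy finite sample space. Everything below (the six-point $\Omega$, the events $A$ and $B$) is an assumed illustration, not data from the text:

```python
# Toy check of Example 1 on a finite sample space (the sets below are
# assumed for illustration): E(I_A | I_B) equals P(A|B) on B and P(A|notB) on notB.
from fractions import Fraction

Omega = set(range(6))
P = {w: Fraction(1, 6) for w in Omega}   # uniform probability measure

A = {0, 1, 2}          # event A, X = I_A
B = {2, 3, 4, 5}       # event B, Y = I_B
notB = Omega - B

def prob(E):
    return sum(P[w] for w in E)

x_B = prob(A & B) / prob(B)           # constant value of E(I_A|I_B) on B
x_negB = prob(A & notB) / prob(notB)  # constant value on notB = Omega \ B

# Defining equality: the integral of E(I_A|I_B) over B w.r.t. P_Y equals P(A and B)
assert x_B * prob(B) == prob(A & B)
print(x_B, x_negB)   # 1/4 and 1
```

The exact rational arithmetic (via `fractions.Fraction`) makes the defining equality an identity rather than a floating-point approximation.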

$$ \ast\ast\ast $$

*Example 2.* Suppose $\bsU=\bsV=\bR$ and that $X$ and $Y$ are discrete random variables with ranges $R_X$ and $R_Y$. The random variable $\bsE(X|Y)$ has the constant value $\bsE(X|Y=y)$ on the set $\{Y=y\}$, $y\in R_Y$. It is determined from the equality

$$ \bsE(X|Y=y)p_Y(y)=\int_{Y=y} \bsE(X|Y)_\omega \bsP_Y(d\omega) =\int_{Y=y} X(\omega) \bsP(d\omega). $$

Then $\bsE(X|Y)$ can be viewed as a random variable $ (R_Y, p_Y)\to \bR$, $y\mapsto \bsE(X|Y)_y=\bsE(X|Y=y)$, where

$$ \bsE(X|Y=y) =\frac{1}{p_Y(y)}\int_{Y=y} X(\omega) \bsP(d\omega). $$

For this reason

**one should think of** $\bsE(X|Y)$ as a function of $Y$. From this point of view, a more appropriate notation would be $\bsE_X(Y)$.

The joint probability distribution $p_{X,Y}$ can be viewed as a function

$$ p_{X,Y}: R_X\times R_Y\to \bR_{\geq 0},\;\;\sum_{(x,y)\in R_X\times R_Y} p_{X,Y}(x,y)= 1. $$

Then

$$ \bsE(X|Y=y)= \sum_{x\in R_X} x\frac{p_{X,Y}(x,y)}{p_Y(y)}. $$

We introduce a new $R_X$-valued random variable $(X|Y=y)$ with probability distribution $p_{X|Y=y}(x)=\frac{p_{X,Y}(x,y)}{p_Y(y)}$.

Then $\bsE(X|Y=y)$ is the expectation of the random variable $(X|Y=y)$.
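The discrete formula above can be sketched numerically. The joint distribution below is an assumed toy example on $R_X\times R_Y=\{0,1\}\times\{0,1\}$:

```python
# Discrete-case sketch of Example 2 (the joint distribution is assumed):
# E(X|Y=y) = sum_x x * p_{X,Y}(x,y) / p_Y(y), i.e. the mean of (X|Y=y).
from fractions import Fraction

p_XY = {   # joint distribution p_{X,Y} on R_X x R_Y = {0,1} x {0,1}
    (0, 0): Fraction(1, 4), (1, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 8), (1, 1): Fraction(3, 8),
}

def p_Y(y):
    # marginal of Y: sum the joint distribution over x
    return sum(p for (_, yy), p in p_XY.items() if yy == y)

def cond_exp(y):
    # E(X|Y=y): mean of the conditional distribution p_{X|Y=y}
    return sum(x * p for (x, yy), p in p_XY.items() if yy == y) / p_Y(y)

assert cond_exp(0) == Fraction(1, 2)   # (1/4) / (1/2)
assert cond_exp(1) == Fraction(3, 4)   # (3/8) / (1/2)

# Tower property: averaging E(X|Y) over p_Y recovers E(X)
EX = sum(x * p for (x, _), p in p_XY.items())
assert sum(cond_exp(y) * p_Y(y) for y in (0, 1)) == EX
```

The final assertion is the discrete form of the equality $\int_\Omega \bsE(X|Y)_\omega \bsP_Y(d\omega)=\int_\Omega X(\omega)\bsP(d\omega)$ obtained by taking $f\equiv 1$.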

$$ \ast\ast\ast $$

*Example 3.* Suppose that $\bsU, \bsV$ are equipped with Euclidean metrics, and $X,Y$ are centered Gaussian random vectors with covariance forms $A$ and $B$ respectively. Assume that the covariance pairing between $X$ and $Y$ is $C$, so that the covariance form of $(X, Y)$ is

$$ S=\left[\begin{array}{cc} A & C\\ C^\dagger & B \end{array}\right]. $$

We have $\newcommand{\bsW}{\boldsymbol{W}}$

$$p_{X,Y}(dw) =\underbrace{\frac{1}{\sqrt{\det 2\pi S}} e^{-\frac{1}{2}(S^{-1}w,w)} }_{=:\gamma_S(w)}dw,\;\;w=x+y\in \bsW:=\bsU\oplus \bsV, $$

$$ p_Y(dy) = \gamma_B(y) dy,\;\;\gamma_B(y)=\frac{1}{\sqrt{\det 2\pi B}} e^{-\frac{1}{2}(B^{-1}y,y)} . $$

For any bounded measurable function $f:\bsU\to \bR$ we have

$$\int_{\Omega} f(X(\omega)) \bsP(d\omega)=\int_\Omega \bsE(f(X)|Y)_\omega \bsP_Y(d\omega)=\int_\bsV \bsE(f(X)| Y=y) p_Y(dy). $$

We deduce

$$\int_{\bsU\oplus \bsV} f(x) \gamma_S(x,y) dx dy= \int_\bsV \bsE(f(X)| Y=y) \gamma_B(y) dy. $$

Now observe that

$$ \int_{\bsV}\left(\int_\bsU f(x) \gamma_S(x,y) dx\right) dy = \int_\bsV \bsE(f(X)| Y=y) \gamma_B(y) dy. $$

This implies that

$$ \bsE(f(X)| Y=y) =\frac{1}{\gamma_B(y)} \left(\int_\bsU f(x) \gamma_S(x,y) dx\right). $$

We obtain a probability measure $p_{X|Y=y}$ on the affine plane $\bsU\times \{y\}$ given by

$$p_{X|Y=y}(dx)= \frac{\gamma_S(x,y)}{\gamma_B(y)} dx. $$

This is a Gaussian measure on $\bsU$. Its statistics are described by the *regression formula*. More precisely, its mean is

$$m_{X|Y=y}= CB^{-1}y, $$

and its covariance form is

$$S_{X|Y=y}= A- CB^{-1}C^\dagger. $$
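The regression formula can be verified numerically by comparing the density ratio $\gamma_S(x,y)/\gamma_B(y)$ against the Gaussian density with mean $CB^{-1}y$ and covariance $A-CB^{-1}C^\dagger$. The matrices below are an assumed one-dimensional example ($\dim\bsU=\dim\bsV=1$), not from the text:

```python
# Numerical check of the regression formula: the conditional density
# gamma_S(x,y)/gamma_B(y) is Gaussian with mean C B^{-1} y and
# covariance A - C B^{-1} C^T.
import numpy as np

def gauss_density(w, S):
    """Centered Gaussian density gamma_S(w) = exp(-(S^{-1}w,w)/2) / sqrt(det 2*pi*S)."""
    S = np.atleast_2d(S)
    w = np.atleast_1d(w)
    return np.exp(-0.5 * w @ np.linalg.solve(S, w)) / np.sqrt(np.linalg.det(2 * np.pi * S))

# Assumed toy block covariance (must make S positive definite)
A = np.array([[2.0]])
B = np.array([[1.5]])
C = np.array([[0.8]])
S = np.block([[A, C], [C.T, B]])

y = np.array([0.7])                       # the conditioning value Y = y
m = C @ np.linalg.solve(B, y)             # regression mean C B^{-1} y
Sigma = A - C @ np.linalg.solve(B, C.T)   # conditional covariance

# gamma_S(x,y)/gamma_B(y) should equal the centered Gaussian with covariance
# Sigma evaluated at x - m, for every x
for x in (-1.0, 0.0, 0.5, 2.0):
    lhs = gauss_density(np.array([x, y[0]]), S) / gauss_density(y, B)
    rhs = gauss_density(np.array([x]) - m, Sigma)
    assert abs(lhs - rhs) < 1e-10
```

The same check runs unchanged in higher dimensions, since `np.block` and `np.linalg.solve` handle arbitrary block sizes.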

$$ \ast\ast\ast $$

In general, if we think of $p_{X,Y}$ as a density on $\bsU\oplus \bsV$, of $p_Y$ as a density on $\bsV$, and we denote by $\pi_\bsV$ the natural projection $\bsU\oplus\bsV\to\bsV$, then the conditional probability distribution $p_{X|Y=y}$ is a probability density on $\pi^{-1}_\bsV(y)$. More precisely, it is the density $p_{X,Y}/\pi^*_{\bsV}p_Y$ defined as in Section 9.1.1 of my lectures, especially Proposition 9.1.8, page 350 of the lectures.