Concept
Random Vectors
In earlier sections, we saw that if \(X\) and \(Y\) are two random variables, then their joint distribution can be represented as \(f_{X,Y}(x,y)\). Similarly, if \(X_1, X_2, \ldots, X_N\) are \(N\) random variables, then their joint distribution can be represented as

\[
f_{X_1, X_2, \ldots, X_N}(x_1, x_2, \ldots, x_N),
\]
where \(x_1, x_2, \ldots, x_N\) are the realizations of the random variables \(X_1, X_2, \ldots, X_N\) respectively.
This notation is cumbersome, so when dealing with high-dimensional data we often package the random variables into vectors and matrices.
(Random Vectors)
Let \(X_1, X_2, \ldots, X_N\) be \(N\) random variables. Then we denote the random vector \(\boldsymbol{X}\) as follows:

\[
\boldsymbol{X} = \begin{bmatrix} X_1 & X_2 & \cdots & X_N \end{bmatrix}^{\intercal}_{N \times 1}, \qquad \boldsymbol{x} = \begin{bmatrix} x_1 & x_2 & \cdots & x_N \end{bmatrix}^{\intercal}_{N \times 1},
\]
where \(\boldsymbol{x}\) is the realization of \(\boldsymbol{X}\).
The sample space of \(\boldsymbol{X}\) is usually denoted \(\mathcal{X} = \mathbb{R}^N\).
Note that the bolded symbol \(\boldsymbol{X}\) represents a vector; it is not to be confused with the design matrix \(\mathbf{X}\). To avoid confusion, we may redefine the design matrix as \(\mathbf{A}\) or \(\mathbf{M}\).
(Random Vectors)
Linking back to the ImageNet example in From Single Variable to Joint Distributions, we can treat each image as a random vector \(\boldsymbol{X}\) with sample space \(\mathcal{X} = \mathbb{R}^{3 \times 224 \times 224} = \mathbb{R}^{150528}\). Suppose we now have an image representing a Ferrari, and we want to ask: what is the probability of drawing a Ferrari from the sample space? This is equivalent to asking: what is the probability of drawing \(x_1, x_2, x_3, \ldots, x_{150528}\) simultaneously from the sample space \(\mathcal{X}\)?
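To make the dimensions concrete, here is a minimal NumPy sketch (using a synthetic image rather than an actual ImageNet sample) that flattens one \(3 \times 224 \times 224\) image into a single realization \(\boldsymbol{x} \in \mathbb{R}^{150528}\):

```python
import numpy as np

# A minimal sketch: treat one RGB image of shape (3, 224, 224) as a single
# realization x of the random vector X living in R^150528.
# The image here is synthetic (uniform noise); in practice it would come
# from a dataset such as ImageNet.
rng = np.random.default_rng(42)
image = rng.uniform(0.0, 1.0, size=(3, 224, 224))   # one "draw" from the sample space

x = image.reshape(-1)            # flatten to a vector in R^(3*224*224)
print(x.shape)                   # (150528,)
```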
PDF of Random Vectors
(PDF of Random Vectors)
Let \(\boldsymbol{X} = \begin{bmatrix} X_1 & X_2 & \cdots & X_N \end{bmatrix}^{\intercal}_{N \times 1}\) be a random vector with sample space \(\mathcal{X}\).
Let \(\eventA\) be an event in \(\mathcal{X}\). Then the probability of \(\eventA\) is defined as

\[
\P \lsq \boldsymbol{X} \in \eventA \rsq = \int_{\eventA} f_{\boldsymbol{X}}(\boldsymbol{x}) \, \mathrm{d} \boldsymbol{x},
\]
where \(f_{\boldsymbol{X}}(\boldsymbol{x})\) is the PDF of \(\boldsymbol{X}\).
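As an illustration of this definition, the following sketch estimates \(\P \lsq \boldsymbol{X} \in \eventA \rsq\) by Monte Carlo, assuming (purely for illustration) that \(\boldsymbol{X}\) is a two-dimensional standard Gaussian and \(\eventA\) is the unit disk:

```python
import numpy as np

# Monte Carlo sketch of P[X in A] = integral over A of f_X(x) dx, assuming
# X is a 2-D standard Gaussian and A is the unit disk (both choices made up).
rng = np.random.default_rng(2)
num_samples = 1_000_000

samples = rng.normal(size=(num_samples, 2))       # draws from f_X
in_A = (samples ** 2).sum(axis=1) <= 1.0          # indicator of the event A

prob_estimate = in_A.mean()
print(prob_estimate)   # approximately 1 - exp(-1/2) ~ 0.3935
```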
(Marginal PDF of Random Vectors)
Continuing from Definition 82, we can find the marginal PDF of \(X_n\) by integrating out the other random variables:

\[
f_{X_n}(x_n) = \int_{\mathbb{R}^{N-1}} f_{\boldsymbol{X}}(\boldsymbol{x}) \, \mathrm{d} \boldsymbol{x}_{\backslash n},
\]
where \(f_{X_n}\) is the marginal PDF of \(X_n\).
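Numerically, integrating out the other variables can be done on a grid. The sketch below assumes a bivariate Gaussian joint PDF with correlation \(\rho = 0.5\) and recovers the marginal of \(X_1\) by one-dimensional integration:

```python
import numpy as np

# A small numerical sketch of marginalization, assuming a 2-dimensional
# standard bivariate Gaussian with correlation rho as the joint PDF.
rho = 0.5

def joint_pdf(x1, x2):
    # f_{X1,X2}(x1, x2) for a bivariate normal with zero means, unit variances.
    norm = 1.0 / (2 * np.pi * np.sqrt(1 - rho**2))
    quad = (x1**2 - 2 * rho * x1 * x2 + x2**2) / (1 - rho**2)
    return norm * np.exp(-0.5 * quad)

# Integrate out x2 on a grid to obtain the marginal f_{X1}(x1).
x2_grid = np.linspace(-8, 8, 4001)
x1 = 0.7
marginal = np.trapz(joint_pdf(x1, x2_grid), x2_grid)

# The marginal of X1 should be the standard normal density at x1.
analytic = np.exp(-0.5 * x1**2) / np.sqrt(2 * np.pi)
print(marginal, analytic)   # both approximately 0.312
```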
(Joint CDF of Random Vectors)
Continuing from Definition 82, we can find the joint CDF as

\[
F_{X_1, X_2, \ldots, X_N}(x_1, x_2, \ldots, x_N) = \P \lsq X_1 \leq x_1, X_2 \leq x_2, \ldots, X_N \leq x_N \rsq,
\]
where \(F_{X_1, X_2, \ldots, X_N}(x_1, x_2, \ldots, x_N)\) is the joint CDF of \(X_1, X_2, \ldots, X_N\).
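The joint CDF can also be estimated empirically as the fraction of samples that fall component-wise below the evaluation point; the sketch below assumes two independent standard normal components:

```python
import numpy as np

# A small sketch of evaluating the joint CDF empirically:
# F(x1, ..., xN) = P[X1 <= x1, ..., XN <= xN], estimated as the fraction of
# samples lying below the point component-wise (2-D standard Gaussian assumed).
rng = np.random.default_rng(4)
samples = rng.normal(size=(500_000, 2))

point = np.array([0.0, 1.0])          # evaluate F at this (made-up) point
below = np.all(samples <= point, axis=1)
print(below.mean())                   # approximately 0.5 * 0.8413 ~ 0.42
```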
Independence
Integration becomes difficult in high dimensions. One simplification arises when the \(N\) random variables are independent: the joint PDF can then be written as the product of the marginal PDFs.
(Independent Random Vectors)
Let \(\boldsymbol{X} = \begin{bmatrix} X_1 & X_2 & \cdots & X_N \end{bmatrix}^{\intercal}_{N \times 1}\) be a random vector with sample space \(\mathcal{X} = \mathbb{R}^N\).
Then \(\boldsymbol{X}\) is said to be independent if

\[
f_{\boldsymbol{X}}(\boldsymbol{x}) = \prod_{n=1}^{N} f_{X_n}(x_n),
\]
where \(f_{X_n}\) is the marginal PDF of \(X_n\).
(PDF of Independent Random Vectors)
Let \(\boldsymbol{X} = \begin{bmatrix} X_1 & X_2 & \cdots & X_N \end{bmatrix}^{\intercal}_{N \times 1}\) be an independent random vector with sample space \(\mathcal{X} = \mathbb{R}^N\).
Then the PDF of \(\boldsymbol{X}\) can be written as a product of \(N\) individual marginal PDFs:

\[
f_{\boldsymbol{X}}(\boldsymbol{x}) = \prod_{n=1}^{N} f_{X_n}(x_n),
\]

and therefore, given an event \(\eventA\), the probability of \(\eventA\) is

\[
\P \lsq \boldsymbol{X} \in \eventA \rsq = \int_{\eventA} \prod_{n=1}^{N} f_{X_n}(x_n) \, \mathrm{d} \boldsymbol{x} \overset{(a)}{=} \prod_{n=1}^{N} \int_{\eventA_n} f_{X_n}(x_n) \, \mathrm{d} x_n,
\]
where \(\eventA_n\) is the projection of \(\eventA\) onto the \(n\)th axis.
The last equality \((a)\) only holds if we further assume that \(\eventA\) is separable [Chan, 2021], i.e. \(\eventA = \lsq a_1, b_1 \rsq \times \lsq a_2, b_2 \rsq \times \cdots \times \lsq a_N, b_N \rsq\). In that case, the probability of \(\eventA\) is

\[
\P \lsq \boldsymbol{X} \in \eventA \rsq = \prod_{n=1}^{N} \int_{a_n}^{b_n} f_{X_n}(x_n) \, \mathrm{d} x_n.
\]
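For a separable event, the probability thus reduces to a product of one-dimensional integrals. The sketch below assumes \(N = 3\) independent standard normal components and made-up interval limits \(a_n, b_n\):

```python
import numpy as np
from scipy.stats import norm

# A hedged sketch: for N independent standard normal components, the
# probability of a separable event A = [a_1, b_1] x ... x [a_N, b_N]
# factorizes into a product of one-dimensional integrals.
a = np.array([-1.0, 0.0, -0.5])   # hypothetical lower limits a_n
b = np.array([1.0, 2.0, 0.5])     # hypothetical upper limits b_n

# P[a_n <= X_n <= b_n] for each component, then multiply.
per_axis = norm.cdf(b) - norm.cdf(a)
prob_A = np.prod(per_axis)
print(per_axis, prob_A)
```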
(Joint Expectation of Independent Random Vectors)
Let \(\boldsymbol{X} = \begin{bmatrix} X_1 & X_2 & \cdots & X_N \end{bmatrix}^{\intercal}_{N \times 1}\) be an independent random vector with sample space \(\mathcal{X} = \mathbb{R}^N\).
Then the joint expectation of \(\boldsymbol{X}\) is

\[
\mathbb{E}\left[\prod_{n=1}^{N} X_n\right] = \prod_{n=1}^{N} \mathbb{E}\left[X_n\right].
\]
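A quick Monte Carlo check of this factorization, with made-up means and scales for three independent Gaussian components:

```python
import numpy as np

# Monte Carlo sketch checking E[X1 * X2 * X3] = E[X1] * E[X2] * E[X3]
# for independent components (the means and scales below are made up).
rng = np.random.default_rng(0)
num_samples = 1_000_000

x1 = rng.normal(loc=1.0, scale=0.5, size=num_samples)
x2 = rng.normal(loc=2.0, scale=1.0, size=num_samples)
x3 = rng.normal(loc=-0.5, scale=2.0, size=num_samples)

lhs = np.mean(x1 * x2 * x3)                       # E[X1 X2 X3] (estimate)
rhs = np.mean(x1) * np.mean(x2) * np.mean(x3)     # product of individual means
print(lhs, rhs)   # both close to 1.0 * 2.0 * (-0.5) = -1.0
```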
Due to the importance of the \(\iid\) assumption, we restate it here:
(Independent and Identically Distributed (IID))
Let \(\boldsymbol{X} = \begin{bmatrix} X_1 & X_2 & \cdots & X_N \end{bmatrix}^{\intercal}_{N \times 1}\) be a random vector with sample space \(\mathcal{X} = \mathbb{R}^N\).
Then \(X_1, X_2, \ldots, X_N\) are said to be independent and identically distributed (i.i.d.) if the following two conditions hold:
The random variables are independent of each other. That is, \(\P \lsq X_i = x_i | X_j = x_j, j \neq i \rsq = \P \lsq X_i = x_i \rsq\) for all \(i, j\).
The random variables have the same distribution. That is, \(\P \lsq X_1 = x \rsq = \P \lsq X_2 = x \rsq = \ldots = \P \lsq X_N = x \rsq\) for all \(x\).
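A minimal sampling sketch of the i.i.d. assumption: every component is drawn independently from the same distribution, so each component's empirical mean and standard deviation should agree:

```python
import numpy as np

# A minimal sketch of i.i.d. sampling: every component of the random vector
# is drawn independently from the same distribution (standard normal here).
rng = np.random.default_rng(7)
N = 5                    # dimension of the random vector
num_samples = 100_000    # number of independent realizations

X = rng.normal(loc=0.0, scale=1.0, size=(num_samples, N))

# "Identically distributed": each column has (approximately) the same
# mean and standard deviation.
print(X.mean(axis=0))    # all approximately 0
print(X.std(axis=0))     # all approximately 1
```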
Expectation of Random Vectors
We now define the expectation of a random vector \(\boldsymbol{X}\). Note that this is not the same as Definition 87, which deals with the joint expectation of a random vector and returns a scalar.
(Expectation of Random Vectors)
Let \(\boldsymbol{X} = \begin{bmatrix} X_1 & X_2 & \cdots & X_N \end{bmatrix}^{\intercal}_{N \times 1}\) be a random vector with sample space \(\mathcal{X} = \mathbb{R}^N\).
Then the expectation of \(\boldsymbol{X}\) is defined as

\[
\mathbb{E}[\boldsymbol{X}] = \boldsymbol{\mu} = \begin{bmatrix} \mathbb{E}[X_1] \\ \mathbb{E}[X_2] \\ \vdots \\ \mathbb{E}[X_N] \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_N \end{bmatrix},
\]
where \(\mu_n\) is the expectation of \(X_n\).
We also call this the mean vector.
The following remarks are from [Chan, 2021].
Since the mean vector is a vector of individual elements, we need to compute the marginal PDFs before computing the expectations:

\[
\mathbb{E}[X_n] = \mu_n = \int_{\mathbb{R}} x_n f_{X_n}(x_n) \, \mathrm{d} x_n,
\]

where the marginal PDF is determined by

\[
f_{X_n}(x_n) = \int_{\mathbb{R}^{N-1}} f_{\boldsymbol{X}}(\boldsymbol{x}) \, \mathrm{d} \boldsymbol{x}_{\backslash n}.
\]
In the equations above, \(\boldsymbol{x}_{\backslash n}=\left[x_1, \ldots, x_{n-1}, x_{n+1}, \ldots, x_N\right]^T\) is the vector containing all the elements except \(x_n\). For example, if the PDF is \(f_{X_1, X_2, X_3}\left(x_1, x_2, x_3\right)\), then

\[
f_{X_2}(x_2) = \int_{\mathbb{R}} \int_{\mathbb{R}} f_{X_1, X_2, X_3}(x_1, x_2, x_3) \, \mathrm{d} x_1 \, \mathrm{d} x_3.
\]
Again, this will become tedious when there are many variables.
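In practice, rather than integrating the marginals analytically, the mean vector is often estimated from samples by averaging over realizations. A small sketch with a made-up true mean vector:

```python
import numpy as np

# A hedged sketch: instead of integrating marginals analytically, the mean
# vector can be estimated from samples by averaging over realizations.
rng = np.random.default_rng(1)
true_mu = np.array([1.0, -2.0, 0.5])          # hypothetical true mean vector

# Draw many realizations of a 3-dimensional random vector.
samples = rng.normal(loc=true_mu, scale=1.0, size=(50_000, 3))

mu_hat = samples.mean(axis=0)                 # sample mean vector
print(mu_hat)                                 # approximately [1.0, -2.0, 0.5]
```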
Covariance of Random Vectors (Covariance Matrix)
We now define the covariance of a random vector \(\boldsymbol{X}\).
(Covariance Matrix)
Let \(\boldsymbol{X} = \begin{bmatrix} X_1 & X_2 & \cdots & X_N \end{bmatrix}^{\intercal}_{N \times 1}\) be a random vector with sample space \(\mathcal{X} = \mathbb{R}^N\).
Then the covariance matrix of \(\boldsymbol{X}\) is defined as

\[
\operatorname{Cov}(\boldsymbol{X}) = \boldsymbol{\Sigma} = \begin{bmatrix} \operatorname{Var}(X_1) & \operatorname{Cov}(X_1, X_2) & \cdots & \operatorname{Cov}(X_1, X_N) \\ \operatorname{Cov}(X_2, X_1) & \operatorname{Var}(X_2) & \cdots & \operatorname{Cov}(X_2, X_N) \\ \vdots & \vdots & \ddots & \vdots \\ \operatorname{Cov}(X_N, X_1) & \operatorname{Cov}(X_N, X_2) & \cdots & \operatorname{Var}(X_N) \end{bmatrix}.
\]
Another compact way of defining it is

\[
\operatorname{Cov}(\boldsymbol{X}) = \boldsymbol{\Sigma} = \mathbb{E}\left[(\boldsymbol{X} - \boldsymbol{\mu})(\boldsymbol{X} - \boldsymbol{\mu})^T\right],
\]

where \(\boldsymbol{\mu}=\mathbb{E}[\boldsymbol{X}]\) is the mean vector. The notation \(\boldsymbol{a} \boldsymbol{b}^T\) means the outer product [Chan, 2021], defined as

\[
\boldsymbol{a} \boldsymbol{b}^T = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{bmatrix} \begin{bmatrix} b_1 & b_2 & \cdots & b_N \end{bmatrix} = \begin{bmatrix} a_1 b_1 & a_1 b_2 & \cdots & a_1 b_N \\ a_2 b_1 & a_2 b_2 & \cdots & a_2 b_N \\ \vdots & \vdots & \ddots & \vdots \\ a_N b_1 & a_N b_2 & \cdots & a_N b_N \end{bmatrix}.
\]
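The compact definition translates directly into code: center the samples and average the outer products \((\boldsymbol{x} - \boldsymbol{\mu})(\boldsymbol{x} - \boldsymbol{\mu})^T\). The sketch below uses synthetic correlated two-dimensional Gaussian data:

```python
import numpy as np

# A sketch of the covariance matrix as an average of outer products
# (x - mu)(x - mu)^T, using synthetic correlated 2-D Gaussian data.
rng = np.random.default_rng(3)
num_samples = 100_000

# Hypothetical ground-truth covariance with correlation between components.
true_sigma = np.array([[1.0, 0.8],
                       [0.8, 2.0]])
samples = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_sigma, size=num_samples)

mu_hat = samples.mean(axis=0)
centered = samples - mu_hat
# Average of outer products over all samples.
sigma_hat = centered.T @ centered / num_samples

print(sigma_hat)                      # approximately true_sigma
print(np.cov(samples, rowvar=False))  # NumPy's (unbiased) estimate for comparison
```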
(Covariance Matrix of Independent Random Variables)
If the random variables \(X_1, \ldots, X_N\) of the random vector \(\boldsymbol{X}\) are independent, then the covariance matrix \(\operatorname{Cov}(\boldsymbol{X})=\boldsymbol{\Sigma}\) is a diagonal matrix:

\[
\boldsymbol{\Sigma} = \begin{bmatrix} \operatorname{Var}(X_1) & 0 & \cdots & 0 \\ 0 & \operatorname{Var}(X_2) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \operatorname{Var}(X_N) \end{bmatrix}.
\]
This is in line with the fact that the covariance of two independent random variables is zero (Property 23).
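A quick empirical confirmation: drawing independent components with different (made-up) variances yields an estimated covariance matrix that is approximately diagonal:

```python
import numpy as np

# Empirical check: for independent components, the off-diagonal entries of
# the covariance matrix are (approximately) zero.
rng = np.random.default_rng(5)
num_samples = 200_000

# Three independent components with different (made-up) variances.
X = np.column_stack([
    rng.normal(0.0, 1.0, num_samples),   # Var = 1
    rng.normal(0.0, 2.0, num_samples),   # Var = 4
    rng.normal(0.0, 0.5, num_samples),   # Var = 0.25
])

print(np.cov(X, rowvar=False).round(3))  # close to diag(1, 4, 0.25)
```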