Application: Plots and Transformations#

from scipy.stats import multivariate_normal
import matplotlib.pyplot as plt
import numpy as np 

import sys
from pathlib import Path
parent_dir = str(Path().resolve().parents[2])
print(parent_dir)
sys.path.append(parent_dir)

from src.utils.plot import plot_scatter, plot_contour, plot_surface, make_meshgrid
from src.utils.general import seed_all

seed_all(42)
/home/runner/work/gaohn-probability-stats/gaohn-probability-stats
Using Seed Number 42

2D Multivariate Gaussian#

We first visualize some 2D multivariate Gaussian distributions with different mean vectors and covariance matrices.

fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharex=True, sharey=True)
X1 = multivariate_normal.rvs(mean=[0, 2], cov=[[5, 0], [0, 0.5]], size=200)
X2 = multivariate_normal.rvs(mean=[1, 2], cov=[[1, -0.5], [-0.5, 1]], size=200)
X3 = multivariate_normal.rvs(mean=[0, 0], cov=[[2, 1.9], [1.9, 2]], size=200)

scatter_1 = plot_scatter(axes[0], X1[:, 0], X1[:, 1], color="red")
scatter_2 = plot_scatter(axes[1], X2[:, 0], X2[:, 1], color="blue")
scatter_3 = plot_scatter(axes[2], X3[:, 0], X3[:, 1], color="green")

axes[0].set_xlabel(r"$\mu = [0, 2], \Sigma = [[5, 0], [0, 0.5]]$")
axes[1].set_xlabel(r"$\mu = [1, 2], \Sigma = [[1, -0.5], [-0.5, 1]]$")
axes[2].set_xlabel(r"$\mu = [0, 0], \Sigma = [[2, 1.9], [1.9, 2]]$")

plt.suptitle("2D Multivariate Gaussian Distributions with different means and covariances.")

plt.show()

Notice that the mean vector sets the center of the distribution, while the covariance matrix determines its shape and orientation.

For example, in the first plot the covariance matrix is diagonal, so the elliptical spread of the data is aligned with the \(x\)- and \(y\)-axes. In this case the covariance matrix is

\[\begin{split} \boldsymbol{\Sigma} = \begin{bmatrix} 5 & 0 \\ 0 & 0.5 \end{bmatrix} \end{split}\]

so the data is more spread out along the \(x\)-axis (variance \(5\)) than the \(y\)-axis (variance \(0.5\)). Furthermore, since the off-diagonal entries are zero, \(X_1\) and \(X_2\) are uncorrelated, and because the vector is Gaussian, uncorrelated components are also independent. Indeed, you see no trend in the scatter.

For the second plot, the covariance matrix is not diagonal,

\[\begin{split} \boldsymbol{\Sigma} = \begin{bmatrix} 1 & -0.5 \\ -0.5 & 1 \end{bmatrix} \end{split}\]

indicating that the covariance between \(X_1\) and \(X_2\) is \(-0.5\),

\[ \operatorname{Cov}[X_1, X_2] = -0.5 \]

and indeed the points trend downward; since \(\operatorname{Var}[X_1] = 1\), the line of best fit of \(X_2\) on \(X_1\) has slope \(\operatorname{Cov}[X_1, X_2] / \operatorname{Var}[X_1] = -0.5\).

The same reasoning applies to the third plot, except that the covariance between \(X_1\) and \(X_2\) is positive, and strongly so: the correlation is \(1.9 / \sqrt{2 \cdot 2} = 0.95\), so the points fall almost on a line with positive slope.
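As a quick sanity check, we can estimate the covariance matrices back from the samples drawn above. A minimal sketch using np.cov (with rowvar=False, since each row of the sample array is one observation); with only 200 samples the estimates will only roughly match the true matrices:

# Empirical covariance estimates from the 200 samples drawn above.
# rowvar=False tells np.cov that each column is a variable.
print(np.cov(X2, rowvar=False))  # should be close to [[1, -0.5], [-0.5, 1]]
print(np.cov(X3, rowvar=False))  # should be close to [[2, 1.9], [1.9, 2]]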

Contour Plots#

Contour plots have been mentioned in the chapter on Contours.

# Parameters to set
mu_x = 0
variance_x = 3

mu_y = 0
variance_y = 15

mean_vector = np.array([mu_x, mu_y])
covariance_matrix = np.array([[variance_x, 0], [0, variance_y]])  # independence
low, high, num_samples = -10, 10, 100
# Create meshgrid
X, Y = make_meshgrid(low, high, num_samples)
X_input_space = np.array([X.ravel(), Y.ravel()]).T  # N x 2 matrix

rv = multivariate_normal(mean_vector, covariance_matrix)
Z = rv.pdf(X_input_space)
Z = Z.reshape(X.shape)
# Make a 3D plot
# set up a figure wider than it is tall
fig = plt.figure(figsize=(12, 4))

ax = fig.add_subplot(1, 2, 1)
contour = plot_contour(ax, X, Y, Z, cmap="viridis")
fig.colorbar(contour, ax=ax)
plt.scatter(mean_vector[0], mean_vector[1], marker="x", color="red", label="mean (0, 0)")
plt.vlines(0, -7.5, 7.5, linestyles="dashed", colors="black", label="variance in X2")
plt.hlines(0, -2.5, 2.5, linestyles="dashed", colors="black", label="variance in X1")
plt.clabel(contour, contour.levels, inline=True, fmt="%1.3f", fontsize=8)
plt.xlabel("X1")
plt.ylabel("X2")
plt.legend()

ax = fig.add_subplot(1, 2, 2, projection="3d")
surface = plot_surface(ax, X, Y, Z, cmap="viridis", linewidth=0)
plt.show()

Now if the covariance matrix is not diagonal, the contours become slanted (tilted) ellipses, as sketched below.
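For instance, here is a minimal standalone sketch, using matplotlib directly rather than the plot_contour helper above, with an arbitrary positive definite covariance matrix chosen for illustration:

# Non-diagonal covariance: the contours are ellipses tilted away from the axes.
mean_vector = np.array([0, 0])
covariance_matrix = np.array([[3, 2], [2, 3]])  # positive definite, off-diagonal != 0

x = np.linspace(-10, 10, 100)
X, Y = np.meshgrid(x, x)
pos = np.dstack((X, Y))  # shape (100, 100, 2); pdf evaluates along the last axis

rv = multivariate_normal(mean_vector, covariance_matrix)
Z = rv.pdf(pos)

fig, ax = plt.subplots(figsize=(6, 4))
contour = ax.contour(X, Y, Z, cmap="viridis")
ax.clabel(contour, inline=True, fmt="%1.3f", fontsize=8)
ax.set_xlabel("X1")
ax.set_ylabel("X2")
plt.show()

Because the off-diagonal entries are positive, the major axis of the ellipses is tilted along the direction \(y = x\).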

Linear Transformation of Mean and Covariance#

Let’s quote the example given in [Chan, 2021], section 5.7.1.

Suppose we have a random vector \(\boldsymbol{X}\) such that

\[\begin{split} \boldsymbol{X}= \begin{bmatrix} X_1 \\ \vdots \\ X_N \end{bmatrix}_{N \times 1} \end{split}\]

and \(\boldsymbol{X}\) need not be Gaussian. However, its mean vector \(\boldsymbol{\mu}_{\boldsymbol{X}}\) and covariance matrix \(\boldsymbol{\Sigma}_{\boldsymbol{X}}\) are still well-defined.

Let \(T: \mathbb{R}^N \rightarrow \mathbb{R}^N\) be a linear transformation, and let \(\boldsymbol{Y}=T(\boldsymbol{X})\). In other words, let \(\boldsymbol{A} \in \mathbb{R}^{N \times N}\) be the \(N \times N\) matrix representing \(T\), so that \(\boldsymbol{Y}=\boldsymbol{A} \boldsymbol{X}\). That is,

\[\begin{split} \boldsymbol{Y}=\left[\begin{array}{c} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{array}\right]=\left[\begin{array}{cccc} a_{11} & a_{12} & \cdots & a_{1 N} \\ a_{21} & a_{22} & \cdots & a_{2 N} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N 1} & a_{N 2} & \cdots & a_{N N} \end{array}\right]\left[\begin{array}{c} X_1 \\ X_2 \\ \vdots \\ X_N \end{array}\right]=\boldsymbol{A} \boldsymbol{X} . \end{split}\]

Then we can show the following result.

Theorem 34 (Mean and Covariance of Linear Transformation)

Let \(\boldsymbol{X}\) be a random vector with mean vector \(\boldsymbol{\mu}_{\boldsymbol{X}}\) and covariance matrix \(\boldsymbol{\Sigma}_{\boldsymbol{X}}\). Let \(\boldsymbol{A} \in \mathbb{R}^{N \times N}\) be an \(N \times N\) matrix, and let \(\boldsymbol{Y}=\boldsymbol{A} \boldsymbol{X}\). Then

(39)#\[ \boldsymbol{\mu}_{\boldsymbol{Y}} = \boldsymbol{A} \boldsymbol{\mu}_{\boldsymbol{X}} \quad \text{and} \quad \boldsymbol{\Sigma}_{\boldsymbol{Y}} = \boldsymbol{A} \boldsymbol{\Sigma}_{\boldsymbol{X}} \boldsymbol{A}^T \]
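Both identities follow from linearity of expectation: since \(\boldsymbol{Y}=\boldsymbol{A} \boldsymbol{X}\),

\[ \boldsymbol{\mu}_{\boldsymbol{Y}} = \mathbb{E}[\boldsymbol{A} \boldsymbol{X}] = \boldsymbol{A} \mathbb{E}[\boldsymbol{X}] = \boldsymbol{A} \boldsymbol{\mu}_{\boldsymbol{X}}, \]

and

\[ \boldsymbol{\Sigma}_{\boldsymbol{Y}} = \mathbb{E}\left[(\boldsymbol{Y}-\boldsymbol{\mu}_{\boldsymbol{Y}})(\boldsymbol{Y}-\boldsymbol{\mu}_{\boldsymbol{Y}})^T\right] = \mathbb{E}\left[\boldsymbol{A}(\boldsymbol{X}-\boldsymbol{\mu}_{\boldsymbol{X}})(\boldsymbol{X}-\boldsymbol{\mu}_{\boldsymbol{X}})^T \boldsymbol{A}^T\right] = \boldsymbol{A} \boldsymbol{\Sigma}_{\boldsymbol{X}} \boldsymbol{A}^T. \]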

Theorem 35 (Mean and Covariance of Shifting)

Let \(\boldsymbol{X}\) be a random vector with mean vector \(\boldsymbol{\mu}_{\boldsymbol{X}}\) and covariance matrix \(\boldsymbol{\Sigma}_{\boldsymbol{X}}\). Let \(\boldsymbol{b} \in \mathbb{R}^N\) be an \(N \times 1\) vector, and let \(\boldsymbol{Y}=\boldsymbol{X} + \boldsymbol{b}\). Then

(40)#\[ \boldsymbol{\mu}_{\boldsymbol{Y}} = \boldsymbol{\mu}_{\boldsymbol{X}} + \boldsymbol{b} \quad \text{and} \quad \boldsymbol{\Sigma}_{\boldsymbol{Y}} = \boldsymbol{\Sigma}_{\boldsymbol{X}} \]
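This is again a direct computation: the constant shift passes through the expectation and cancels when the mean is subtracted,

\[ \boldsymbol{\mu}_{\boldsymbol{Y}} = \mathbb{E}[\boldsymbol{X} + \boldsymbol{b}] = \boldsymbol{\mu}_{\boldsymbol{X}} + \boldsymbol{b}, \quad \text{while} \quad \boldsymbol{Y} - \boldsymbol{\mu}_{\boldsymbol{Y}} = \boldsymbol{X} - \boldsymbol{\mu}_{\boldsymbol{X}}, \]

so the covariance matrix is unchanged.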

It follows that if \(\boldsymbol{X}\) is indeed a Gaussian random vector, then \(\boldsymbol{Y}\) is also a Gaussian random vector, since an affine transformation of a Gaussian is again Gaussian. Its mean vector and covariance matrix are obtained from those of \(\boldsymbol{X}\) by the formulas above.

Geometrically, shifting \(\boldsymbol{X}\) by \(\boldsymbol{b}\) shifts the mean vector of \(\boldsymbol{Y}\) by \(\boldsymbol{b}\) while leaving the covariance matrix unchanged: only the center of the distribution moves; its shape does not change.

If we instead apply a linear transformation \(\boldsymbol{A}\) to \(\boldsymbol{X}\), then the mean vector of \(\boldsymbol{Y}\) becomes \(\boldsymbol{A} \boldsymbol{\mu}_{\boldsymbol{X}}\) and the covariance matrix becomes \(\boldsymbol{A} \boldsymbol{\Sigma}_{\boldsymbol{X}} \boldsymbol{A}^T\): the Gaussian distribution is stretched and rotated.
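We can check both theorems numerically by applying an affine map \(\boldsymbol{Y}=\boldsymbol{A}\boldsymbol{X}+\boldsymbol{b}\) to a large sample; the matrix A and shift b below are arbitrary choices for illustration:

# Verify Theorems 34 and 35 empirically with a large Gaussian sample.
mu_X = np.array([0.0, 2.0])
Sigma_X = np.array([[1.0, -0.5], [-0.5, 1.0]])
X = multivariate_normal.rvs(mean=mu_X, cov=Sigma_X, size=100_000)

A = np.array([[2.0, 1.0], [0.0, 1.0]])  # arbitrary linear map
b = np.array([3.0, -1.0])               # arbitrary shift

Y = X @ A.T + b  # applies Y = A X + b to each sample (row)

print(Y.mean(axis=0), A @ mu_X + b)                          # empirical vs. theoretical mean
print(np.cov(Y, rowvar=False), A @ Sigma_X @ A.T, sep="\n")  # empirical vs. theoretical covariance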

../../_images/chan_fig5.17.png

Fig. 11 Transforming a Gaussian distribution. The left is translation by a vector \(\boldsymbol{b}\), and the right is a linear transformation by a matrix \(\boldsymbol{A}\). Image Credit: [Chan, 2021].#