Concept#

Loss#

Definition 3 (Loss Function)

Formally, a loss function is a map [Jung, 2023]

\[\begin{split} \begin{aligned} \mathcal{L}: \mathcal{X} \times \mathcal{Y} \times \mathcal{H} &\to \mathbb{R}_{+} \\ \left((\mathbf{x}, y), h\right) &\mapsto \mathcal{L}\left(\left(\mathbf{x}, y\right), h\right) \end{aligned} \end{split}\]

which maps a pair of data point \(\mathbf{x}\) and label \(y\) together with a hypothesis \(h\) to a non-negative real number \(\mathcal{L}\left(\left(\mathbf{x}, y\right), h\right)\).

For example, if \(h\) is an element from the linear map taking on the form \(h(\mathbf{x})=\mathbf{w}^T\mathbf{x}\), then the loss is a function of the parameters \(\mathbf{w}\) of the hypothesis \(h\). This means we seek to find \(\hat{\mathbf{w}}\) that minimizes the loss function \(\mathcal{L}\left(\left(\mathbf{x}, y\right), h\right)\).

We sometimes abuse notation by writing \(\mathcal{L}\left(\left(\mathbf{x}, y\right), h\right)\) as \(\mathcal{L}\left(y, \hat{y}\right)\), where \(\hat{y}=h(\mathbf{x})\) is the predicted label of the hypothesis \(h\) for the data point \(\mathbf{x}\).

Remark 1 (Loss Function is a function of the parameters)

Important is that the loss function in machine learning is a function of the parameters \(\boldsymbol{\theta}\) of the hypothesis \(h\) and not of the data points \(\mathbf{x}\) and labels \(y\)!

Definition 4 (Loss Function as a Random Variable)

In our random variables chapter, we defined a random variable as a map of sample space to real numbers.

The definition of a loss function satisfies the requirement to be a random variable. Indeed, the loss function is usually expressed as a function of some other random variables.

Furthermore, each \(\mathcal{L}(y, \hat{y})\) is a random variable associated with a specific data point \(\mathbf{x}\) and label \(y\).

Consequently, the sample space of the loss function is \(\Omega_{\mathcal{X}} \times \Omega_{\mathcal{Y}} \times \Omega_{\mathcal{H}}\), where \(\Omega_{\mathcal{X}}\), \(\Omega_{\mathcal{Y}}\), and \(\Omega_{\mathcal{H}}\) are the sample spaces of the data points, labels, and hypotheses, respectively. Thus, by the definition of the random variable, the realization of the loss function is a non-negative real number \(\mathcal{L}\left(\left(\mathbf{x}, y\right), h\right)\), this is also the state of the random variable.

Empirical Loss#

The empirical loss is defined as:

\[\begin{split} \begin{aligned} \widehat{\mathcal{L}}: \mathcal{X} \times \mathcal{Y} \times \mathcal{H} &\to \mathbb{R}_{+} \\ \left((\mathbf{x}, y), h\right) &\mapsto \widehat{\mathcal{L}}\left(\left(\mathbf{x}, y\right), h\right) \end{aligned} \end{split}\]

which is

\[ \widehat{\mathcal{L}}\left(h \mid \mathcal{S} \right) \]

Further Readings#