Probability Density Function

Definition

As mentioned in Definition 40, a continuous random variable \(X\) is defined by its probability density function (PDF) \(f(x)\) or its cumulative distribution function (CDF) \(F(x)\).

Definition 41 (Probability Density Function)

The probability density function (PDF) of a random variable \(X\) is a mapping \(f_X(x)\)

\[\begin{split} \begin{align} \pdf: X(\S) &\to \R \\ X(\S) \ni x &\mapsto \pdf(x) \end{align} \end{split}\]

which satisfies the following properties [Chan, 2021]:

  • Non-negativity: \(\pdf(x) \geq 0\) for all \(x \in X(\S)\).

  • Unity: \(\int_{X(\S)} \pdf(x) \ dx = 1\).

  • Measure of a set: \(\P \lsq \lset x \in A \rset \rsq = \int_A \pdf(x) \ dx\) for all \(A \subseteq X(\S)\).
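The three properties can be checked numerically. The following is a minimal sketch, assuming an exponential distribution with rate 2 as an illustrative density (it does not come from the source); the helper names `exponential_pdf` and `integrate` are hypothetical, and the integral is approximated with a simple midpoint rule.

```python
import math


def exponential_pdf(x, rate=2.0):
    """Illustrative density: rate * exp(-rate * x) for x >= 0, and 0 otherwise."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0


def integrate(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# Non-negativity: the density is never negative on a grid of test points.
non_negative = all(exponential_pdf(x / 10) >= 0 for x in range(-50, 500))

# Unity: the density integrates to (approximately) 1.
# The support [0, inf) is truncated at 50, where the remaining tail mass is negligible.
total_mass = integrate(exponential_pdf, 0.0, 50.0)

# Measure of a set: for A = [0.5, 1.5], P[X in A] is the integral of the density over A.
prob_A = integrate(exponential_pdf, 0.5, 1.5)

# Closed form for the exponential distribution, used only as a cross-check.
exact_A = math.exp(-2.0 * 0.5) - math.exp(-2.0 * 1.5)

print(non_negative, round(total_mass, 6), round(prob_A, 6), round(exact_A, 6))
```

The numerical and closed-form values of the probability of \(A\) should agree up to the integration error.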

Remark 1 (Probability Density Function)

The probability density function \(f(x)\) of a continuous random variable is similar to the probability mass function \(p(x)\) of a discrete random variable, but with two key differences:

  1. Unlike the PMF \(\pmf(x)\), the PDF \(\pdf(x)\) is not a probability. It is a density, that is, probability per unit length: where \(\pdf\) is continuous at \(x\) and \(\delta > 0\) is small, \(\P \lsq x \leq X \leq x + \delta \rsq \approx \pdf(x) \ \delta\). The higher the density at \(x\), the more likely it is that \(X\) takes on a value close to \(x\).

  2. Consequently, the PDF \(\pdf(x)\) is not bounded above by 1: it can take values greater than 1, as long as it still integrates to 1.
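To make both points concrete, here is a small sketch assuming a uniform distribution on \([0, 0.25]\) (an illustrative choice, not from the source; `uniform_pdf` is a hypothetical helper). Its density is 4 everywhere on the support, yet every probability computed from it stays between 0 and 1.

```python
def uniform_pdf(x, a=0.0, b=0.25):
    """Density of a uniform distribution on [a, b]: 1 / (b - a) inside, 0 outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


# The density at a point is a height, not a probability, and here it exceeds 1.
density_at_point = uniform_pdf(0.1)

# Probability of a small interval [0.1, 0.1 + delta] is approximately density * delta.
delta = 0.01
prob_small_interval = uniform_pdf(0.1) * delta

# Total probability is height * width of the support, which is still 1.
total = uniform_pdf(0.1) * (0.25 - 0.0)

print(density_at_point, prob_small_interval, total)
```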

Notice that the definition of the PDF above does not “explicitly” mention the probability of the random variable \(X\); instead, it only specifies the measure of a set, \(\P \lsq \lset x \in A \rset \rsq = \int_A \pdf(x) \ dx\).

The author [Chan, 2021] further notes that if we are dealing with 1-dimensional data (on the real line), we can give a more intuitive definition of the PDF.

Definition 42 (Probability Density Function (1-dimensional))

Let \(X\) be a continuous random variable on the real line \(\R\). The probability density function (PDF) of \(X\) is a mapping \(f_X(x)\)

\[\begin{split} \begin{align} \pdf: X(\S) &\to \R \\ X(\S) \ni x &\mapsto \pdf(x) \end{align} \end{split}\]

that, when integrated over an interval \([a, b] \subseteq \R\), yields the probability of observing a value of \(X\) in the interval \([a, b]\) (i.e. \(a \leq X \leq b\)):

\[ \P \lsq a \leq X \leq b \rsq = \int_a^b \pdf(x) \ dx \]

Notice that we have replaced \(\int_{A}\) with \(\int_a^b\) where \(A = [a, b]\).
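As a small worked example (the uniform distribution on \([0, 2]\) is an illustrative assumption, not taken from the source), suppose \(\pdf(x) = \frac{1}{2}\) for \(0 \leq x \leq 2\) and \(\pdf(x) = 0\) otherwise. Then, with \(a = 0.5\) and \(b = 1.5\),

\[ \P \lsq 0.5 \leq X \leq 1.5 \rsq = \int_{0.5}^{1.5} \frac{1}{2} \ dx = \left. \frac{x}{2} \right|_{0.5}^{1.5} = 0.75 - 0.25 = 0.5 \]

so half of the probability mass lies in the middle half of the support, as one would expect for a uniform density.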

Definition 43 (Zero Measure)

The probability of a continuous random variable \(X\) taking on any single value \(x\) is zero, since integrating the PDF over the degenerate interval \([x, x]\) gives zero:

\[ \P \lsq X = x \rsq = 0 \]
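One way to see this numerically is to shrink an interval around \(x\) and watch its probability vanish. The sketch below assumes a standard normal distribution and the point \(x = 0.5\) purely for illustration (neither is from the source); `normal_pdf` and `normal_cdf` are hypothetical helper names built on the standard library's `math.erf`.

```python
import math


def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """CDF of the standard normal distribution, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


x = 0.5
half_widths = [10.0 ** -k for k in range(1, 7)]  # 1e-1, 1e-2, ..., 1e-6

# P[x - eps <= X <= x + eps] for shrinking eps.
probs = [normal_cdf(x + eps) - normal_cdf(x - eps) for eps in half_widths]

for eps, p in zip(half_widths, probs):
    # For small eps the probability is close to (interval length) * (density at x).
    print(f"eps={eps:.0e}  P={p:.3e}  2*eps*f(x)={2 * eps * normal_pdf(x):.3e}")
```

The probabilities decrease towards zero in proportion to the interval length, while the density at \(x\) stays fixed and positive.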

Remark 2 (Open Equals Closed Interval)

By Definition 43, every single point has zero probability for a continuous random variable, so including or excluding the endpoints does not change the probability of an interval: the probability of an open interval \((a, b)\) equals the probability of the closed interval \([a, b]\).

More concretely, writing \(\P \lsq [a, b] \rsq = \P \lsq a \leq X \leq b \rsq\) (and similarly for the other interval types), we have

\[ \begin{align} \P \lsq [a, b] \rsq = \P \lsq (a, b) \rsq = \P \lsq (a, b] \rsq = \P \lsq [a, b) \rsq \end{align} \]

This may not hold when the PDF \(\pdf(x)\) has a delta function at \(a\) or \(b\) [Chan, 2021].
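To illustrate the exception, consider a hypothetical mixed random variable (an assumption made for this sketch, not an example from the source): \(X = 0\) with probability 0.5, which corresponds to a delta function at 0 in the PDF, and otherwise \(X\) is uniform on \((0, 1)\). The helper names below are illustrative.

```python
# Hypothetical mixed distribution: a point mass at 0 plus a Uniform(0, 1) component.
POINT_MASS_LOCATION = 0.0
POINT_MASS_WEIGHT = 0.5


def continuous_part(a, b):
    """Probability contributed by the uniform component on an interval with endpoints a <= b."""
    lo, hi = max(a, 0.0), min(b, 1.0)
    return (1.0 - POINT_MASS_WEIGHT) * max(hi - lo, 0.0)


def interval_probability(a, b, include_a, include_b):
    """Probability of the interval from a to b, with each endpoint optionally included."""
    prob = continuous_part(a, b)  # endpoints do not matter for the continuous component
    atom_inside = (
        a < POINT_MASS_LOCATION < b
        or (include_a and a == POINT_MASS_LOCATION)
        or (include_b and b == POINT_MASS_LOCATION)
    )
    if atom_inside:
        prob += POINT_MASS_WEIGHT  # the delta function contributes its full weight
    return prob


closed = interval_probability(0.0, 0.5, include_a=True, include_b=True)       # [0, 0.5]
open_ = interval_probability(0.0, 0.5, include_a=False, include_b=False)      # (0, 0.5)
away = interval_probability(0.25, 0.75, include_a=True, include_b=True)       # [0.25, 0.75]
away_open = interval_probability(0.25, 0.75, include_a=False, include_b=False)  # (0.25, 0.75)

print(closed, open_, away, away_open)
```

The closed and open intervals differ only when an endpoint sits on the point mass; for intervals whose endpoints avoid it, the two probabilities coincide as in Remark 2.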

Further Readings

  • Chan, Stanley H. “Chapter 4.1. Probability Density Function.” In Introduction to Probability for Data Science, 172-180. Ann Arbor, Michigan: Michigan Publishing Services, 2021.