Real World Examples
Contents
Real World Examples#
Utilities#
import sys
from pathlib import Path
parent_dir = str(Path().resolve().parent)
sys.path.append(parent_dir)
import random
import warnings
import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
warnings.filterwarnings("ignore", category=DeprecationWarning)
%matplotlib inline
from utils import seed_all, true_pmf, empirical_pmf
seed_all(42)
Using Seed Number 42
Call Center Simulation#
This example is taken from [Yiu, 2019], please take a look at this author’s articles, they are really good!
Imagine that we are data scientists tasked with improving the ROI (Return on Investment) of our company’s call center, where employees attempt to cold call potential customers and get them to purchase our product.
You look at some historical data and find the following:
The typical call center employee completes on average 50 calls per day.
The probability of a conversion (purchase) for each call is
.The average revenue to your company for each conversion is
.The call center you are analyzing has
employees.Each employee is paid
per day of work.
Bernoulli Trials#
Let
Then we can model the probability of success of a call made by a randomly chosen employee as
a Bernoulli random variable
We need to follow some assumptions in order to make
Each trial has two possible outcomes: 1 or 0 (success of failure);
This means any single employee can only make call that is successful or not.
The probability of success is constant for each trial, as mentioned, the probability of a conversion (purchase) for each call is
. We treat is as constant.This means everytime a call is made by a randomly chosen employee, the success rate is constant at
.
Each trial is independent.
This means that each call made by the randomly chosen employee is independent of the previous calls made.
We define the sample space, event space and probability measure below:
The sample space
where represents failure and represents success.The event space
.The probability measure
is defined as follows:
Our random variable
Consequently, the PMF is fully determined by
Binomial Distribution#
Now the above details a single call made by a randomly chosen employee, which follows a Bernoulli distribution.
You can associate this with a single coin toss made by a randomly chosen person, which also follows a Bernoulli distribution
(i.e.
Now, a single employee can make multiple calls. For example, in the example above, an employee can make
50 calls a day. We can define a sequence of calls
Since each
Since
So
Each call by 1 single employee is
Below is a realization of the PMF of
p = 0.04
n = 50
# our random variable Y
Y = stats.binom(n, p)
y = np.arange(n + 1) # [0, 1, ..., 50]
f = Y.pmf(y) # pmf P(Y=0) P(Y=1)
fig = plt.figure(figsize=(20, 5))
plt.plot(y, f, "bo", ms=10)
plt.vlines(y, 0, f, colors="b", lw=5, alpha=0.5)
plt.xticks(y)
plt.show();

for y_ in range(51):
print(f"PMF for Y={y_} is {Y.pmf(y_)}")
PMF for Y=0 is 0.12988579352203838
PMF for Y=1 is 0.2705954031709139
PMF for Y=2 is 0.27623280740364115
PMF for Y=3 is 0.18415520493576068
PMF for Y=4 is 0.09015931908313288
PMF for Y=5 is 0.03456107231520094
PMF for Y=6 is 0.010800335098500296
PMF for Y=7 is 0.002828659192464365
PMF for Y=8 is 0.0006335017983123325
PMF for Y=9 is 0.00012318090522739773
PMF for Y=10 is 2.1043404643013807e-05
PMF for Y=11 is 3.1883946428808775e-06
PMF for Y=12 is 4.317617745567845e-07
PMF for Y=13 is 5.2586369978069964e-08
PMF for Y=14 is 5.790760979727912e-09
PMF for Y=15 is 5.790760979727948e-10
PMF for Y=16 is 5.278037351314524e-11
PMF for Y=17 is 4.398364459428767e-12
PMF for Y=18 is 3.3598617398414237e-13
PMF for Y=19 is 2.3577977121694196e-14
PMF for Y=20 is 1.5227443557760843e-15
PMF for Y=21 is 9.063954498667164e-17
PMF for Y=22 is 4.978308342070981e-18
PMF for Y=23 is 2.525228869166441e-19
PMF for Y=24 is 1.1837010324217706e-20
PMF for Y=25 is 5.129371140494332e-22
PMF for Y=26 is 2.055036514621123e-23
PMF for Y=27 is 7.611246350448613e-25
PMF for Y=28 is 2.60503967351663e-26
PMF for Y=29 is 8.234320807092859e-28
PMF for Y=30 is 2.4016769020687333e-29
PMF for Y=31 is 6.456120704485886e-31
PMF for Y=32 is 1.5972173617868683e-32
PMF for Y=33 is 3.630039458606517e-34
PMF for Y=34 is 7.562582205430263e-36
PMF for Y=35 is 1.4404918486533809e-37
PMF for Y=36 is 2.500853903912116e-39
PMF for Y=37 is 3.9427876863479427e-41
PMF for Y=38 is 5.620201745890709e-43
PMF for Y=39 is 7.205386853706048e-45
PMF for Y=40 is 8.256172436538177e-47
PMF for Y=41 is 8.390419142823334e-49
PMF for Y=42 is 7.491445663235104e-51
PMF for Y=43 is 5.807322219562096e-53
PMF for Y=44 is 3.8495507137248904e-55
PMF for Y=45 is 2.138639285402722e-57
PMF for Y=46 is 9.685866328816647e-60
PMF for Y=47 is 3.434704371920785e-62
PMF for Y=48 is 8.944542635210379e-65
PMF for Y=49 is 1.5211807202738746e-67
PMF for Y=50 is 1.2676506002282307e-70
mean, var = Y.stats(moments="mv")
print(f"Expected value: {mean}")
print(f"Variance: {var}")
Expected value: 2.0
Variance: 1.92
At this very moment, we can already calculate our the total revenue earned per day and the corresponding profits per day.
This is because we can use the expected value
We see that we are making a loss of
num_employees = 100
salary_per_employee = 200
revenue_per_successful_call = 90
expected_total_conversions = mean * num_employees
expected_total_revenue = revenue_per_successful_call * expected_total_conversions
expected_total_profits = expected_total_revenue - salary_per_employee * num_employees
print(f"Expected Total Conversions Per Day: {expected_total_conversions}")
print(f"Expected Total Revenue Per Day: {expected_total_revenue}")
print(f"Expected Total Profits Per Day: {expected_total_profits}")
Expected Total Conversions Per Day: 200.0
Expected Total Revenue Per Day: 18000.0
Expected Total Profits Per Day: -2000.0
By theory, we know that increasing either
So the solutions to increase revenue/profits per day are:
Increase
, the number of calls performed by 1 single employee;Increase
, the success rate of a call conversion by 1 single employee through techniques like not hard pressing the customers to buy something;
With this realized state of the theoretical (true) PMF, we can employ generative/synthesis method to generate
samples from this distribution because we assumed that
So if we have Y.rvs(size = 10)
, it simply means generate a sequence 10 random variables
We have
In this context, we can be unambiguous and say that each
To be more precise, each experiment for
# generate states from binomial distribution
Y.rvs(size = 10)
array([1, 4, 3, 2, 1, 1, 0, 4, 2, 3])
If we plot 100 such
bins = np.arange(0, 50 + 1.5) - 0.5
fig = plt.figure(figsize=(20, 5))
plt.hist(
Y.rvs(size=100),
bins,
density=True,
color="red",
alpha=0.5,
edgecolor="black",
linewidth=2,
)
plt.plot(y, f, "bo", ms=10)
plt.vlines(y, 0, f, colors="b", lw=5, alpha=0.5)
plt.xticks(y)
plt.show()

When we increase to 10000, the empirical starts to be closer to the true PMF.
bins = np.arange(0, 50 + 1.5) - 0.5
fig = plt.figure(figsize=(20, 5))
plt.hist(
Y.rvs(size=10000),
bins,
density=True,
color="green",
alpha=0.5,
edgecolor="black",
linewidth=2,
)
plt.plot(y, f, "bo", ms=10)
plt.vlines(y, 0, f, colors="b", lw=5, alpha=0.5)
plt.xticks(y)
plt.show()

Alternative way to Calculate Expected Value of Revenue per Day#
We already know in the long run the company does not earn profits by the law of average (expectation).
We can also calculate it by defining the total revenue as a random variable
This means on average, the company earns revenue of 180 per day per employee. Thus if there’s 100 employees,
then on average, the total revenue per day is
And if we want to further calculate how much the company loses in 1 year, that will be
assuming everything else constant (no employees left etc).
We can also simulate 365 days using code below, we define Z = 90 * Y.rvs(size = 100)
to be the random variable Z=90*Y
).
At each loop (day), we get a new draw random variable
yearly_revenue = np.zeros(shape=365)
yearly_salary_paid = np.ones(shape=365) * 20000
for each_day in range(365):
Z = 90 * Y.rvs(size = 100) # 100 employees' revenue [90, 180, 0, ...]
daily_revenue = np.sum(Z)
yearly_revenue[each_day] = daily_revenue
yearly_profits = yearly_revenue - yearly_salary_paid
#bins = np.arange(0, yearly_revenue.max() + 1.5) - 0.5
fig = plt.figure(figsize=(20, 5))
plt.hist(
yearly_profits,
bins=20,
density=True,
color="red",
alpha=0.5,
edgecolor="black",
linewidth=2,
)
# plt.plot(y, f, "bo", ms=10)
# plt.vlines(y, 0, f, colors="b", lw=5, alpha=0.5)
#plt.xticks(y)
plt.show()

If we increase
p = 0.06
n = 58
# our random variable Y
Y = stats.binom(n, p)
yearly_revenue = np.zeros(shape=365)
yearly_salary_paid = np.ones(shape=365) * 20000
for each_day in range(365):
Z = 90 * Y.rvs(size = 100) # 100 employees' revenue [90, 180, 0, ...]
daily_revenue = np.sum(Z)
yearly_revenue[each_day] = daily_revenue
yearly_profits_improved = yearly_revenue - yearly_salary_paid
#bins = np.arange(0, yearly_revenue.max() + 1.5) - 0.5
fig = plt.figure(figsize=(20, 5))
plt.hist(
yearly_profits,
bins=20,
density=True,
color="red",
alpha=0.5,
edgecolor="black",
linewidth=2,
)
plt.hist(
yearly_profits_improved,
bins=20,
density=True,
color="green",
alpha=0.5,
edgecolor="black",
linewidth=2,
)
# plt.plot(y, f, "bo", ms=10)
# plt.vlines(y, 0, f, colors="b", lw=5, alpha=0.5)
#plt.xticks(y)
plt.show()
