Probability#
Basics of Probability#
Probability theory deals with the likelihood of events occurring in a defined sample space. Some key foundational concepts include:
Sample Spaces and Events#
Sample Space (\(\Omega\)): The set of all possible outcomes of an experiment. For example, for a coin flip, \(\Omega = \{\text{heads}, \text{tails}\}\).
Event (\(A\)): A subset of the sample space. For example, the event of getting heads when flipping a coin is \(A = \{\text{heads}\}\).
Set Operations#
Union (\(A \cup B\)): The event that either \(A\) or \(B\) (or both) occurs.
Intersection (\(A \cap B\)): The event that both \(A\) and \(B\) occur.
Complement (\(A^c\)): The event that \(A\) does not occur.
Basic Probability Rules#
Addition Rule: For any two events \(A\) and \(B\),

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Multiplication Rule: For two events \(A\) and \(B\),

$$P(A \cap B) = P(A \mid B) \, P(B)$$
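As a quick sanity check, the addition rule can be verified by exact enumeration on a small sample space. The events below (an even roll and a roll of at least 4 on a fair die) are illustrative choices, not from the text:

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a subset of omega) under the uniform measure."""
    return Fraction(len(event), len(omega))

A = {2, 4, 6}   # even roll
B = {4, 5, 6}   # roll of at least 4

# Addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs, rhs)  # 2/3 2/3
```

Using `Fraction` keeps every probability exact, so the two sides match without any floating-point tolerance.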
Conditional Probability#
Given two events \(A, B \subseteq \Omega\) with \(P(B) > 0\), the conditional probability of \(A\) given \(B\) is:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
Independent Events#
Two events \(A\) and \(B\) are said to be independent if the occurrence of one does not affect the probability of the occurrence of the other. Formally, \(A\) and \(B\) are independent (\(A \perp B\)) if:

$$P(A \cap B) = P(A) \, P(B)$$

Equivalently, when \(P(B) > 0\), \(A\) and \(B\) are independent if:

$$P(A \mid B) = P(A)$$

Conditional Independence: Events \(A\) and \(B\) are independent given \(C\) if:

$$P(A \cap B \mid C) = P(A \mid C) \, P(B \mid C)$$
Key Theorems#
Law of Total Probability#
Let \(B_1, B_2, \ldots, B_n\) be events that partition the sample space. Then, for any event \(A\):

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i) \, P(B_i)$$
Bayes’ Theorem#
Given events \(A\) and \(B\) with \(P(B) > 0\):

$$P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}$$

For events conditioned on \(C\):

$$P(A \mid B, C) = \frac{P(B \mid A, C) \, P(A \mid C)}{P(B \mid C)}$$
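A classic numeric illustration of Bayes' theorem is diagnostic testing. The numbers below (1% prevalence, 99% sensitivity, 5% false-positive rate) are hypothetical, chosen only to show the computation:

```python
# Hypothetical numbers: a disease with 1% prevalence, a test with
# 99% sensitivity and a 5% false-positive rate.
p_disease = 0.01
p_pos_given_disease = 0.99   # P(+ | D)
p_pos_given_healthy = 0.05   # P(+ | not D)

# Law of total probability gives the denominator P(+):
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(D | +) = P(+ | D) P(D) / P(+)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 4))  # 0.1667
```

Even with an accurate test, a positive result here implies only about a 1-in-6 chance of disease, because the prior \(P(D)\) is small.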
Random Variables#
Probability assigns likelihoods to events. We often also assign numerical values to outcomes, defining a function from the sample space to the real numbers. This function is called a random variable.
Discrete and Continuous Random Variables#
Discrete Random Variables: These take on a countable number of distinct values. For example, the number of heads in 10 coin flips is a discrete random variable.
Continuous Random Variables: These take on an uncountable number of possible values, often within a range. For example, the exact height of a person is a continuous random variable.
For example, consider a game where if a coin lands heads, you win $5, and if it lands tails, you lose $10. The random variable \(X\) can be defined as:

$$X(\omega) = \begin{cases} 5 & \text{if } \omega = \text{heads} \\ -10 & \text{if } \omega = \text{tails} \end{cases}$$
Here, \(\omega\) represents the outcome of the coin flip. To analyze such scenarios, we use the concept of expected value.
Expected Value#
The expected value (or mean, expectation, average) is the probability-weighted average of all possible values of a random variable. For a discrete random variable \(X : \Omega \to \mathbb{R}^d\), the expected value is calculated as:

$$E[X] = \sum_{x} x \, P(X = x)$$
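For the coin game described above, the expected value is a two-term sum. A minimal sketch:

```python
# Expected value of the coin game: win $5 on heads, lose $10 on tails,
# with a fair coin, so P(X = 5) = P(X = -10) = 0.5.
pmf = {5: 0.5, -10: 0.5}   # maps each value x to P(X = x)

expected = sum(x * p for x, p in pmf.items())
print(expected)  # -2.5: on average you lose $2.50 per play
```

A negative expected value means the game favors the house; playing many rounds, your average winnings per round converge to \(-2.5\).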
Variance#
Variance measures how far the values of a random variable spread from the mean:

$$\text{Var}(X) = E\left[(X - E[X])^2\right] = E[X^2] - (E[X])^2$$

The standard deviation is the square root of the variance:

$$\sigma_X = \sqrt{\text{Var}(X)}$$
Covariance#
Covariance measures the joint variability of two random variables \(X\) and \(Y\):

$$\text{Cov}(X, Y) = E\left[(X - E[X])(Y - E[Y])\right] = E[XY] - E[X]\,E[Y]$$
Correlation#
Correlation is a normalized measure of the linear relationship between two random variables:

$$\text{Corr}(X, Y) = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

Two variables are said to be uncorrelated if their correlation is zero, i.e., \(\text{Corr}(X, Y) = 0\). This implies that there is no linear relationship between \(X\) and \(Y\), though they may still be dependent in a non-linear fashion. The correlation coefficient ranges from \(-1\) to \(1\) and requires that both variables have non-zero variance. Generally, correlation describes the linear dependence between two random variables, but it does not imply causation.
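The definitions above translate directly into code. A small sketch using only the standard library, on illustrative data where \(Y = 2X\) (so the correlation should be exactly 1):

```python
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]   # ys = 2 * xs: a perfect linear relationship

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Population covariance: E[(X - E[X])(Y - E[Y])]."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

var_x = cov(xs, xs)   # Var(X) = Cov(X, X)
corr = cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))
print(var_x, corr)  # 2.0 1.0
```

Note that `cov(xs, xs)` recovers the variance, reflecting the identity \(\text{Cov}(X, X) = \text{Var}(X)\).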
Joint Random Variables#
When dealing with multiple random variables simultaneously, we consider their joint behavior.
Joint Probability Mass Function (PMF)#
For discrete random variables \(X\) and \(Y\), the joint PMF is defined as:

$$p_{X,Y}(x, y) = P(X = x, Y = y)$$
This gives the probability that \(X\) takes on the value \(x\) and \(Y\) takes on the value \(y\) simultaneously.
Marginal Probability Mass Function (PMF)#
The marginal PMF of \(X\) can be obtained by summing the joint PMF over all possible values of \(Y\):

$$p_X(x) = \sum_{y} p_{X,Y}(x, y)$$

Similarly, for \(Y\):

$$p_Y(y) = \sum_{x} p_{X,Y}(x, y)$$
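Marginalization is just a grouped sum over the joint table. A sketch with a small hypothetical joint PMF (the four probabilities are made up for illustration):

```python
from collections import defaultdict

# A hypothetical joint PMF P(X = x, Y = y) for binary X and Y.
joint = {
    (0, 0): 0.1, (0, 1): 0.2,
    (1, 0): 0.3, (1, 1): 0.4,
}

# Marginalize: p_X(x) = sum over y of p_{X,Y}(x, y), and similarly for Y.
p_x = defaultdict(float)
p_y = defaultdict(float)
for (x, y), p in joint.items():
    p_x[x] += p
    p_y[y] += p

print({x: round(p, 2) for x, p in p_x.items()})  # {0: 0.3, 1: 0.7}
```

Both marginals sum to 1, as any valid PMF must.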
Probability Distributions#
PDFs and CDFs#
In continuous probability spaces, the probability of any single outcome is zero. Instead, we examine ranges of values using the Probability Density Function (PDF) and the Cumulative Distribution Function (CDF).

CDF: \(F(x) = P(X \leq x)\)

PDF: \(f(x) = \frac{d}{dx} F(x)\)

By the Fundamental Theorem of Calculus:

$$F(x) = \int_{-\infty}^{x} f(t) \, dt$$

The probability that a random variable falls within a range is:

$$P(a \leq X \leq b) = \int_{a}^{b} f(x) \, dx = F(b) - F(a)$$
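The relationship \(P(a \leq X \leq b) = F(b) - F(a) = \int_a^b f(x)\,dx\) can be checked numerically. A sketch using the Exponential distribution with an illustrative rate \(\lambda = 2\), whose closed-form CDF is \(F(x) = 1 - e^{-\lambda x}\):

```python
import math

lam = 2.0  # rate parameter of an Exponential(λ) distribution

def pdf(x):
    return lam * math.exp(-lam * x)

def cdf(x):
    return 1.0 - math.exp(-lam * x)

# P(a <= X <= b) two ways: via the CDF, and by integrating the PDF numerically.
a, b = 0.5, 1.5
via_cdf = cdf(b) - cdf(a)

n = 100_000                 # midpoint-rule Riemann sum
h = (b - a) / n
via_pdf = sum(pdf(a + (i + 0.5) * h) for i in range(n)) * h

print(abs(via_cdf - via_pdf) < 1e-6)  # True
```

The two answers agree to within the quadrature error, illustrating the Fundamental Theorem of Calculus connection between \(f\) and \(F\).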
Joint and Marginal Densities#
For continuous random variables \(X\) and \(Y\):
Joint Density: The joint probability density function (PDF) is denoted by \(f_{X,Y}(x, y)\) and describes the relative likelihood of \(X\) and \(Y\) taking values near \((x, y)\) simultaneously.

Marginal Density: The marginal PDF of \(X\) is obtained by integrating the joint PDF over all possible values of \(Y\):

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dy$$

Similarly, for \(Y\):

$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dx$$
Joint Continuous Random Variables#
When two continuous random variables are considered together, they can be described using a joint PDF. For example, if \(X\) and \(Y\) are jointly continuous random variables, the joint PDF \(f_{X,Y}(x, y)\) must satisfy:

$$f_{X,Y}(x, y) \geq 0 \quad \text{and} \quad \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dx \, dy = 1$$
Expected Value (Continuous)#
The expected value of a continuous random variable \(X\) with PDF \(f\) is:

$$E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx$$

For a function \(g(X)\):

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) \, f(x) \, dx$$
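As a concrete check of the \(E[g(X)]\) formula, take \(X \sim \text{Unif}(0, 1)\) and \(g(x) = x^2\) (an illustrative choice): the exact answer is \(\int_0^1 x^2 \, dx = \tfrac{1}{3}\), which a midpoint-rule sum reproduces:

```python
# E[g(X)] for X ~ Unif(0, 1) with g(x) = x^2. Since f(x) = 1 on [0, 1],
# E[g(X)] = ∫_0^1 x^2 dx = 1/3 exactly.
n = 100_000
h = 1.0 / n
e_g = sum(((i + 0.5) * h) ** 2 for i in range(n)) * h
print(round(e_g, 6))  # 0.333333
```

Note this computes \(E[X^2]\), not \(E[X]^2 = \tfrac{1}{4}\): in general \(E[g(X)] \neq g(E[X])\).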
Common Distributions#
| Distribution | Notation | PDF / PMF | Parameters | Support | Interpretation |
|---|---|---|---|---|---|
| Uniform (Discrete) | \(X \sim \text{Unif}(a, b)\) | \(P(X = x) = \frac{1}{b - a + 1}\) | \(a\): lower bound, \(b\): upper bound | \(x \in \{a, \dots, b\}\) | Equal probability for each outcome between \(a\) and \(b\) |
| Uniform (Continuous) | \(X \sim \text{Unif}(a, b)\) | \(f(x) = \frac{1}{b - a}\) for \(a \leq x \leq b\) | \(a\): lower bound, \(b\): upper bound | \(x \in [a, b]\) | Models complete uncertainty over a fixed interval |
| Bernoulli | \(X \sim \text{Bern}(p)\) | \(P(X = 1) = p\), \(P(X = 0) = 1 - p\) | \(p\): probability of success | \(x \in \{0, 1\}\) | One trial with two outcomes (success/failure) |
| Binomial | \(X \sim \text{Bin}(n, p)\) | \(P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}\) | \(n\): trials, \(p\): success prob. | \(k \in \{0, \dots, n\}\) | Number of successes in \(n\) independent trials |
| Normal (Gaussian) | \(X \sim \mathcal{N}(\mu, \sigma^2)\) | \(f(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}\) | \(\mu\): mean, \(\sigma^2\): variance | \(x \in \mathbb{R}\) | Models natural phenomena with bell-curve symmetry |
| Poisson | \(X \sim \text{Pois}(\lambda)\) | \(P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}\) | \(\lambda\): rate | \(k \in \{0, 1, 2, \dots\}\) | Count of events in fixed time/space with constant rate |
| Exponential | \(X \sim \text{Exp}(\lambda)\) | \(f(x) = \lambda e^{-\lambda x}\) for \(x \geq 0\) | \(\lambda\): rate | \(x \in [0, \infty)\) | Time between events in a Poisson process |
| Geometric | \(X \sim \text{Geom}(p)\) | \(P(X = k) = (1 - p)^{k - 1} p\) | \(p\): probability of success | \(k \in \{1, 2, \dots\}\) | Number of trials until first success |
| Beta | \(X \sim \text{Beta}(\alpha, \beta)\) | \(f(x) = \frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{B(\alpha, \beta)}\) | \(\alpha\), \(\beta\): shape parameters | \(x \in [0, 1]\) | Models probabilities or proportions (e.g., Bayesian priors) |
| Gamma | \(X \sim \text{Gamma}(k, \theta)\) | \(f(x) = \frac{x^{k-1} e^{-x/\theta}}{\theta^k \Gamma(k)}\) | \(k\): shape, \(\theta\): scale | \(x \in [0, \infty)\) | General model for wait times; sum of \(k\) exponentials |
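Several of these distributions can be built from simpler ones. As a sketch (parameter values are illustrative), a Binomial(\(n, p\)) draw can be simulated as a sum of \(n\) Bernoulli(\(p\)) trials, matching its interpretation in the table, and the sample mean should approach \(np\):

```python
import random

random.seed(0)  # fixed seed so the simulation is reproducible

def binomial_sample(n, p):
    """One Binomial(n, p) draw as a sum of n independent Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if random.random() < p)

n, p, trials = 10, 0.3, 20_000
samples = [binomial_sample(n, p) for _ in range(trials)]
mean = sum(samples) / trials
print(mean)  # close to n * p = 3.0
```

With 20,000 draws, the standard error of the mean is about \(\sqrt{np(1-p)/20000} \approx 0.01\), so the estimate lands very near 3.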
Conditional Probability and Expectation#
Conditional Density#
For continuous random variables \(X\) and \(Y\) with joint PDF \(f_{X,Y}(x, y)\), the conditional density of \(X\) given \(Y = y\) is defined as:

$$f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}$$

where \(f_Y(y)\) is the marginal density of \(Y\), assumed to be positive.
Conditional Expectation#
The conditional expectation of a discrete random variable \(X\) given an event \(A\) with \(P(A) > 0\) is:

$$E[X \mid A] = \sum_{x} x \, P(X = x \mid A)$$

For continuous random variables, the conditional expectation of \(X\) given \(Y = y\) is:

$$E[X \mid Y = y] = \int_{-\infty}^{\infty} x \, f_{X|Y}(x \mid y) \, dx$$
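A small worked example of the discrete case (the die and event are illustrative choices): for one roll of a fair die and \(A = \{\text{roll is even}\}\), each even face has conditional probability \(\tfrac{1/6}{1/2} = \tfrac{1}{3}\), so \(E[X \mid A] = \tfrac{2+4+6}{3} = 4\).

```python
from fractions import Fraction

# X = one roll of a fair die; A = "the roll is even".
omega = [1, 2, 3, 4, 5, 6]
A = [x for x in omega if x % 2 == 0]

p_a = Fraction(len(A), len(omega))  # P(A) = 1/2
# P(X = x | A) = P(X = x, A) / P(A) = (1/6) / (1/2) = 1/3 for x in A, else 0.
e_given_a = sum(x * (Fraction(1, 6) / p_a) for x in A)
print(e_given_a)  # 4
```

Conditioning on the evidence shifts the unconditional mean of 3.5 up to 4, since odd outcomes are ruled out.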
Independence and Conditional Independence of Random Variables#
Independence of Random Variables:
Two random variables \(X\) and \(Y\) are said to be independent if the value taken by \(X\) does not affect the probability distribution of \(Y\), and vice versa. Formally, \(X\) and \(Y\) are independent (\(X \perp Y\)) if for all \(x\) and \(y\):

$$P(X = x, Y = y) = P(X = x) \, P(Y = y)$$

For continuous random variables, independence is defined using the joint and marginal probability density functions (PDFs):

$$f_{X,Y}(x, y) = f_X(x) \, f_Y(y)$$
This means the joint distribution factorizes into the product of the marginal distributions when the variables are independent.
Conditional Independence of Random Variables:
Two random variables \(X\) and \(Y\) are conditionally independent given a third variable \(Z\) if the conditional distribution of \(X\) given \(Y\) and \(Z\) is the same as the conditional distribution of \(X\) given \(Z\) alone. Formally, \(X\) and \(Y\) are conditionally independent given \(Z\) (\(X \perp Y \mid Z\)) if for all \(x\), \(y\), and \(z\):

$$P(X = x, Y = y \mid Z = z) = P(X = x \mid Z = z) \, P(Y = y \mid Z = z)$$

For continuous random variables, this becomes:

$$f_{X,Y|Z}(x, y \mid z) = f_{X|Z}(x \mid z) \, f_{Y|Z}(y \mid z)$$
Conditional independence is a crucial concept in probabilistic graphical models and machine learning, simplifying the analysis of complex systems by breaking down dependencies.
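The factorization test for independence is mechanical to check on a discrete joint table. A sketch with two hypothetical joint PMFs, one independent by construction and one where \(X = Y\) always (dependent, even though both marginals look uniform):

```python
import itertools

# Independent case: joint built as the product of its marginals.
p_x = {0: 0.4, 1: 0.6}
p_y = {0: 0.25, 1: 0.75}
joint_ind = {(x, y): p_x[x] * p_y[y] for x, y in itertools.product(p_x, p_y)}

# Dependent case: X and Y are always equal, yet both marginals are uniform.
joint_dep = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
q = {0: 0.5, 1: 0.5}  # marginal of X and of Y in the dependent case

def is_independent(joint, px, py, tol=1e-12):
    """Check the factorization P(X=x, Y=y) = P(X=x) P(Y=y) for every pair."""
    return all(abs(joint[(x, y)] - px[x] * py[y]) < tol for x, y in joint)

print(is_independent(joint_ind, p_x, p_y), is_independent(joint_dep, q, q))  # True False
```

The dependent case shows why checking marginals alone is not enough: the joint must factorize at every point.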
Useful Properties#
For random variables \(X, Y, Z\), constants \(a, b, c \in \mathbb{R}\), and function \(g: \mathbb{R}^d \to \mathbb{R}\):
Linearity of Expectation:

$$E[aX + bY + c] = a\,E[X] + b\,E[Y] + c$$

Independence and Expectation: If \(X \perp Y\), then:

$$E[XY] = E[X]\,E[Y]$$

Law of Total Expectation:

$$E[X] = E\left[E[X \mid Y]\right]$$

Variance Properties:

$$\text{Var}(aX + b) = a^2 \, \text{Var}(X), \qquad \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X, Y)$$

Covariance Properties:

$$\text{Cov}(X, X) = \text{Var}(X), \qquad \text{Cov}(aX + b, cY + d) = ac \, \text{Cov}(X, Y)$$

Correlation Properties:

$$-1 \leq \text{Corr}(X, Y) \leq 1, \qquad X \perp Y \implies \text{Corr}(X, Y) = 0$$
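These identities can be verified exactly by enumerating a small finite model. A sketch using two independent fair dice (an illustrative choice), with exact `Fraction` arithmetic so equality is literal, not approximate:

```python
from fractions import Fraction
from itertools import product

# Two independent fair dice, enumerated exactly: 36 equally likely outcomes.
faces = range(1, 7)
outcomes = list(product(faces, faces))
p = Fraction(1, 36)

def E(f):
    """Exact expectation of f(x, y) over the two dice."""
    return sum(f(x, y) * p for x, y in outcomes)

def Var(f):
    mu = E(f)
    return E(lambda x, y: (f(x, y) - mu) ** 2)

a, b = 2, 3
# Linearity of expectation: E[aX + bY] = a E[X] + b E[Y]
lin_lhs = E(lambda x, y: a * x + b * y)
lin_rhs = a * E(lambda x, y: x) + b * E(lambda x, y: y)

# For independent X, Y (Cov = 0): Var(X + Y) = Var(X) + Var(Y)
var_lhs = Var(lambda x, y: x + y)
var_rhs = Var(lambda x, y: x) + Var(lambda x, y: y)
print(lin_lhs == lin_rhs, var_lhs == var_rhs)  # True True
```

Because the dice are independent, the covariance cross-term vanishes and the variances add; with dependent variables the \(2\,\text{Cov}(X, Y)\) term would be needed.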