Probability#
Basics of Probability#
Probability theory deals with the likelihood of events occurring in a defined sample space. Some key foundational concepts include:
Sample Spaces and Events#
Sample Space (\(\Omega\)): The set of all possible outcomes of an experiment. For example, for a coin flip, \(\Omega = \{\text{heads}, \text{tails}\}\).
Event (\(A\)): A subset of the sample space. For example, the event of getting heads when flipping a coin is \(A = \{\text{heads}\}\).
Set Operations#
Union (\(A \cup B\)): The event that either \(A\) or \(B\) (or both) occurs.
Intersection (\(A \cap B\)): The event that both \(A\) and \(B\) occur.
Complement (\(A^c\)): The event that \(A\) does not occur.
Basic Probability Rules#
Addition Rule: For any two events \(A\) and \(B\),

$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$

Multiplication Rule: For two events \(A\) and \(B\),

$$P(A \cap B) = P(A \mid B) \, P(B)$$
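As a quick sanity check, the addition rule can be verified by exact enumeration on a small sample space. The events below (an even roll and a roll of at least 4 on a fair die) are illustrative choices, not from the text:

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a subset of omega) under the uniform measure."""
    return Fraction(len(event), len(omega))

A = {2, 4, 6}   # even roll
B = {4, 5, 6}   # roll of at least 4

# Addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs, rhs)  # 2/3 2/3
```

Using `Fraction` keeps every probability exact, so the two sides match without any floating-point tolerance.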
Conditional Probability#
Given two events \(A, B \subseteq \Omega\) with \(P(B) > 0\), the conditional probability of \(A\) given \(B\) is:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
Independent Events#
Two events \(A\) and \(B\) are said to be independent if the occurrence of one does not affect the probability of the occurrence of the other. Formally, \(A\) and \(B\) are independent (\(A \perp B\)) if:

$$P(A \cap B) = P(A) \, P(B)$$

Equivalently, when \(P(B) > 0\), \(A\) and \(B\) are independent if:

$$P(A \mid B) = P(A)$$

Conditional Independence: Events \(A\) and \(B\) are independent given \(C\) if:

$$P(A \cap B \mid C) = P(A \mid C) \, P(B \mid C)$$
Key Theorems#
Law of Total Probability#
Let \(B_1, B_2, \ldots, B_n\) be events that partition the sample space. Then, for any event \(A\):

$$P(A) = \sum_{i=1}^{n} P(A \mid B_i) \, P(B_i)$$
Bayes’ Theorem#
Given events \(A\) and \(B\) with \(P(B) > 0\):

$$P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}$$

For events conditioned on \(C\):

$$P(A \mid B, C) = \frac{P(B \mid A, C) \, P(A \mid C)}{P(B \mid C)}$$
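A classic numeric illustration of Bayes' theorem is diagnostic testing. The numbers below (1% prevalence, 99% sensitivity, 5% false-positive rate) are hypothetical, chosen only to show the computation:

```python
# Hypothetical numbers: a disease with 1% prevalence, a test with
# 99% sensitivity and a 5% false-positive rate.
p_disease = 0.01
p_pos_given_disease = 0.99   # P(+ | D)
p_pos_given_healthy = 0.05   # P(+ | not D)

# Law of total probability gives the denominator P(+):
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(D | +) = P(+ | D) P(D) / P(+)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 4))  # 0.1667
```

Even with an accurate test, a positive result here implies only about a 1-in-6 chance of disease, because the prior \(P(D)\) is small.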
Random Variables#
Probability assigns likelihoods to events. We often also assign numerical values to outcomes, defining a function from the sample space to the real numbers. This function is called a random variable.
Discrete and Continuous Random Variables#
Discrete Random Variables: These take on a countable number of distinct values. For example, the number of heads in 10 coin flips is a discrete random variable.
Continuous Random Variables: These take on an uncountable number of possible values, often within a range. For example, the exact height of a person is a continuous random variable.
For example, consider a game where if a coin lands heads, you win $5, and if it lands tails, you lose $10. The random variable \(X\) can be defined as:

$$X(\omega) = \begin{cases} 5 & \text{if } \omega = \text{heads} \\ -10 & \text{if } \omega = \text{tails} \end{cases}$$
Here, \(\omega\) represents the outcome of the coin flip. To analyze such scenarios, we use the concept of expected value.
Expected Value#
The expected value (or mean, expectation, average) is the probability-weighted average of all possible values of a random variable. For a discrete random variable \(X : \Omega \to \mathbb{R}^d\), the expected value is calculated as:

$$E[X] = \sum_{x} x \, P(X = x)$$
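For the coin game described above, the expected value is a two-term sum. A minimal sketch:

```python
# Expected value of the coin game: win $5 on heads, lose $10 on tails,
# with a fair coin, so P(X = 5) = P(X = -10) = 0.5.
pmf = {5: 0.5, -10: 0.5}   # maps each value x to P(X = x)

expected = sum(x * p for x, p in pmf.items())
print(expected)  # -2.5: on average you lose $2.50 per play
```

A negative expected value means the game favors the house; playing many rounds, your average winnings per round converge to \(-2.5\).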
Variance#
Variance measures how far the values of a random variable spread from the mean:

$$\text{Var}(X) = E\left[(X - E[X])^2\right] = E[X^2] - (E[X])^2$$

The standard deviation is the square root of the variance:

$$\sigma_X = \sqrt{\text{Var}(X)}$$
Covariance#
Covariance measures the joint variability of two random variables \(X\) and \(Y\):

$$\text{Cov}(X, Y) = E\left[(X - E[X])(Y - E[Y])\right] = E[XY] - E[X]\,E[Y]$$
Correlation#
Correlation is a normalized measure of the linear relationship between two random variables:

$$\text{Corr}(X, Y) = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y}$$

Two variables are said to be uncorrelated if their correlation is zero, i.e., \(\text{Corr}(X, Y) = 0\). This implies that there is no linear relationship between \(X\) and \(Y\), though they may still be dependent in a non-linear fashion. The correlation coefficient ranges from \(-1\) to \(1\) and requires that both variables have non-zero variance. Generally, correlation describes the linear dependence between two random variables, but it does not imply causation.
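The definitions above translate directly into code. A small sketch using only the standard library, on illustrative data where \(Y = 2X\) (so the correlation should be exactly 1):

```python
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]   # ys = 2 * xs: a perfect linear relationship

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Population covariance: E[(X - E[X])(Y - E[Y])]."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

var_x = cov(xs, xs)   # Var(X) = Cov(X, X)
corr = cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))
print(var_x, corr)  # 2.0 1.0
```

Note that `cov(xs, xs)` recovers the variance, reflecting the identity \(\text{Cov}(X, X) = \text{Var}(X)\).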
Joint Random Variables#
When dealing with multiple random variables simultaneously, we consider their joint behavior.
Joint Probability Mass Function (PMF)#
For discrete random variables \(X\) and \(Y\), the joint PMF is defined as:

$$p_{X,Y}(x, y) = P(X = x, Y = y)$$
This gives the probability that \(X\) takes on the value \(x\) and \(Y\) takes on the value \(y\) simultaneously.
Marginal Probability Mass Function (PMF)#
The marginal PMF of \(X\) can be obtained by summing the joint PMF over all possible values of \(Y\):

$$p_X(x) = \sum_{y} p_{X,Y}(x, y)$$

Similarly, for \(Y\):

$$p_Y(y) = \sum_{x} p_{X,Y}(x, y)$$
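Marginalization is just a grouped sum over the joint table. A sketch with a small hypothetical joint PMF (the four probabilities are made up for illustration):

```python
from collections import defaultdict

# A hypothetical joint PMF P(X = x, Y = y) for binary X and Y.
joint = {
    (0, 0): 0.1, (0, 1): 0.2,
    (1, 0): 0.3, (1, 1): 0.4,
}

# Marginalize: p_X(x) = sum over y of p_{X,Y}(x, y), and similarly for Y.
p_x = defaultdict(float)
p_y = defaultdict(float)
for (x, y), p in joint.items():
    p_x[x] += p
    p_y[y] += p

print({x: round(p, 2) for x, p in p_x.items()})  # {0: 0.3, 1: 0.7}
```

Both marginals sum to 1, as any valid PMF must.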
Probability Distributions#
PDFs and CDFs#
In continuous probability spaces, the probability of any single outcome is zero. Instead, we examine ranges of values using the Probability Density Function (PDF) and the Cumulative Distribution Function (CDF).

CDF: \(F(x) = P(X \leq x)\)

PDF: \(f(x) = \frac{d}{dx} F(x)\)

By the Fundamental Theorem of Calculus:

$$F(x) = \int_{-\infty}^{x} f(t) \, dt$$

The probability that a random variable falls within a range is:

$$P(a \leq X \leq b) = \int_{a}^{b} f(x) \, dx = F(b) - F(a)$$
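The relationship \(P(a \leq X \leq b) = F(b) - F(a) = \int_a^b f(x)\,dx\) can be checked numerically. A sketch using the Exponential distribution with an illustrative rate \(\lambda = 2\), whose closed-form CDF is \(F(x) = 1 - e^{-\lambda x}\):

```python
import math

lam = 2.0  # rate parameter of an Exponential(λ) distribution

def pdf(x):
    return lam * math.exp(-lam * x)

def cdf(x):
    return 1.0 - math.exp(-lam * x)

# P(a <= X <= b) two ways: via the CDF, and by integrating the PDF numerically.
a, b = 0.5, 1.5
via_cdf = cdf(b) - cdf(a)

n = 100_000                 # midpoint-rule Riemann sum
h = (b - a) / n
via_pdf = sum(pdf(a + (i + 0.5) * h) for i in range(n)) * h

print(abs(via_cdf - via_pdf) < 1e-6)  # True
```

The two answers agree to within the quadrature error, illustrating the Fundamental Theorem of Calculus connection between \(f\) and \(F\).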
Joint and Marginal Densities#
For continuous random variables \(X\) and \(Y\):
Joint Density: The joint probability density function (PDF) is denoted by \(f_{X,Y}(x, y)\) and describes the relative likelihood of \(X\) and \(Y\) taking values near \((x, y)\) simultaneously.

Marginal Density: The marginal PDF of \(X\) is obtained by integrating the joint PDF over all possible values of \(Y\):

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dy$$

Similarly, for \(Y\):

$$f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dx$$
Joint Continuous Random Variables#
When two continuous random variables are considered together, they can be described using a joint PDF. For example, if \(X\) and \(Y\) are jointly continuous random variables, the joint PDF \(f_{X,Y}(x, y)\) must satisfy:

$$f_{X,Y}(x, y) \geq 0 \quad \text{and} \quad \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y) \, dx \, dy = 1$$
Expected Value (Continuous)#
The expected value of a continuous random variable \(X\) with PDF \(f\) is:

$$E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx$$

For a function \(g(X)\):

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) \, f(x) \, dx$$
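As a concrete check of the \(E[g(X)]\) formula, take \(X \sim \text{Unif}(0, 1)\) and \(g(x) = x^2\) (an illustrative choice): the exact answer is \(\int_0^1 x^2 \, dx = \tfrac{1}{3}\), which a midpoint-rule sum reproduces:

```python
# E[g(X)] for X ~ Unif(0, 1) with g(x) = x^2. Since f(x) = 1 on [0, 1],
# E[g(X)] = ∫_0^1 x^2 dx = 1/3 exactly.
n = 100_000
h = 1.0 / n
e_g = sum(((i + 0.5) * h) ** 2 for i in range(n)) * h
print(round(e_g, 6))  # 0.333333
```

Note this computes \(E[X^2]\), not \(E[X]^2 = \tfrac{1}{4}\): in general \(E[g(X)] \neq g(E[X])\).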
Common Distributions#
| Distribution | Notation | PDF / PMF | Parameters | Support | Interpretation |
|---|---|---|---|---|---|
| Uniform (Discrete) | \(X \sim \text{Unif}(a, b)\) | \(P(X = x) = \frac{1}{b - a + 1}\) | \(a\): lower bound, \(b\): upper bound | \(x \in \{a, \dots, b\}\) | Equal probability for each outcome between \(a\) and \(b\) |
| Uniform (Continuous) | \(X \sim \text{Unif}(a, b)\) | \(f(x) = \frac{1}{b - a}\) for \(a \leq x \leq b\) | \(a\): lower bound, \(b\): upper bound | \(x \in [a, b]\) | Models complete uncertainty over a fixed interval |
| Bernoulli | \(X \sim \text{Bern}(p)\) | \(P(X = 1) = p\), \(P(X = 0) = 1 - p\) | \(p\): probability of success | \(x \in \{0, 1\}\) | One trial with two outcomes (success/failure) |
| Binomial | \(X \sim \text{Bin}(n, p)\) | \(P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}\) | \(n\): trials, \(p\): success prob. | \(k \in \{0, \dots, n\}\) | Number of successes in \(n\) independent trials |
| Normal (Gaussian) | \(X \sim \mathcal{N}(\mu, \sigma^2)\) | \(f(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}\) | \(\mu\): mean, \(\sigma^2\): variance | \(x \in \mathbb{R}\) | Models natural phenomena with bell-curve symmetry |
| Poisson | \(X \sim \text{Pois}(\lambda)\) | \(P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}\) | \(\lambda\): rate | \(k \in \{0, 1, 2, \dots\}\) | Count of events in fixed time/space with constant rate |
| Exponential | \(X \sim \text{Exp}(\lambda)\) | \(f(x) = \lambda e^{-\lambda x}\) for \(x \geq 0\) | \(\lambda\): rate | \(x \in [0, \infty)\) | Time between events in a Poisson process |
| Geometric | \(X \sim \text{Geom}(p)\) | \(P(X = k) = (1 - p)^{k - 1} p\) | \(p\): probability of success | \(k \in \{1, 2, \dots\}\) | Number of trials until first success |
| Beta | \(X \sim \text{Beta}(\alpha, \beta)\) | \(f(x) = \frac{x^{\alpha - 1}(1 - x)^{\beta - 1}}{B(\alpha, \beta)}\) | \(\alpha\), \(\beta\): shape parameters | \(x \in [0, 1]\) | Models probabilities or proportions (e.g., Bayesian priors) |
| Gamma | \(X \sim \text{Gamma}(k, \theta)\) | \(f(x) = \frac{x^{k-1} e^{-x/\theta}}{\theta^k \Gamma(k)}\) | \(k\): shape, \(\theta\): scale | \(x \in [0, \infty)\) | General model for wait times; sum of \(k\) exponentials |
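Several of these distributions can be built from simpler ones. As a sketch (parameter values are illustrative), a Binomial(\(n, p\)) draw can be simulated as a sum of \(n\) Bernoulli(\(p\)) trials, matching its interpretation in the table, and the sample mean should approach \(np\):

```python
import random

random.seed(0)  # fixed seed so the simulation is reproducible

def binomial_sample(n, p):
    """One Binomial(n, p) draw as a sum of n independent Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if random.random() < p)

n, p, trials = 10, 0.3, 20_000
samples = [binomial_sample(n, p) for _ in range(trials)]
mean = sum(samples) / trials
print(mean)  # close to n * p = 3.0
```

With 20,000 draws, the standard error of the mean is about \(\sqrt{np(1-p)/20000} \approx 0.01\), so the estimate lands very near 3.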
Conditional Probability and Expectation#
Conditional Density#
For continuous random variables \(X\) and \(Y\) with joint PDF \(f_{X,Y}(x, y)\), the conditional density of \(X\) given \(Y = y\) is defined as:

$$f_{X|Y}(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}$$

where \(f_Y(y)\) is the marginal density of \(Y\), assumed to be positive.
Conditional Expectation#
The conditional expectation of a discrete random variable \(X\) given an event \(A\) with \(P(A) > 0\) is:

$$E[X \mid A] = \sum_{x} x \, P(X = x \mid A)$$

For continuous random variables, the conditional expectation of \(X\) given \(Y = y\) is:

$$E[X \mid Y = y] = \int_{-\infty}^{\infty} x \, f_{X|Y}(x \mid y) \, dx$$
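A small worked example of the discrete case (the die and event are illustrative choices): for one roll of a fair die and \(A = \{\text{roll is even}\}\), each even face has conditional probability \(\tfrac{1/6}{1/2} = \tfrac{1}{3}\), so \(E[X \mid A] = \tfrac{2+4+6}{3} = 4\).

```python
from fractions import Fraction

# X = one roll of a fair die; A = "the roll is even".
omega = [1, 2, 3, 4, 5, 6]
A = [x for x in omega if x % 2 == 0]

p_a = Fraction(len(A), len(omega))  # P(A) = 1/2
# P(X = x | A) = P(X = x, A) / P(A) = (1/6) / (1/2) = 1/3 for x in A, else 0.
e_given_a = sum(x * (Fraction(1, 6) / p_a) for x in A)
print(e_given_a)  # 4
```

Conditioning on the evidence shifts the unconditional mean of 3.5 up to 4, since odd outcomes are ruled out.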
Independence and Conditional Independence of Random Variables#
Independence of Random Variables:
Two random variables \(X\) and \(Y\) are said to be independent if the value taken by \(X\) does not affect the probability distribution of \(Y\), and vice versa. Formally, \(X\) and \(Y\) are independent (\(X \perp Y\)) if for all \(x\) and \(y\):

$$P(X = x, Y = y) = P(X = x) \, P(Y = y)$$

For continuous random variables, independence is defined using the joint and marginal probability density functions (PDFs):

$$f_{X,Y}(x, y) = f_X(x) \, f_Y(y)$$
This means the joint distribution factorizes into the product of the marginal distributions when the variables are independent.
Conditional Independence of Random Variables:
Two random variables \(X\) and \(Y\) are conditionally independent given a third variable \(Z\) if the conditional distribution of \(X\) given \(Y\) and \(Z\) is the same as the conditional distribution of \(X\) given \(Z\) alone. Formally, \(X\) and \(Y\) are conditionally independent given \(Z\) (\(X \perp Y \mid Z\)) if for all \(x\), \(y\), and \(z\):

$$P(X = x, Y = y \mid Z = z) = P(X = x \mid Z = z) \, P(Y = y \mid Z = z)$$

For continuous random variables, this becomes:

$$f_{X,Y|Z}(x, y \mid z) = f_{X|Z}(x \mid z) \, f_{Y|Z}(y \mid z)$$
Conditional independence is a crucial concept in probabilistic graphical models and machine learning, simplifying the analysis of complex systems by breaking down dependencies.
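The factorization test for independence is mechanical to check on a discrete joint table. A sketch with two hypothetical joint PMFs, one independent by construction and one where \(X = Y\) always (dependent, even though both marginals look uniform):

```python
import itertools

# Independent case: joint built as the product of its marginals.
p_x = {0: 0.4, 1: 0.6}
p_y = {0: 0.25, 1: 0.75}
joint_ind = {(x, y): p_x[x] * p_y[y] for x, y in itertools.product(p_x, p_y)}

# Dependent case: X and Y are always equal, yet both marginals are uniform.
joint_dep = {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}
q = {0: 0.5, 1: 0.5}  # marginal of X and of Y in the dependent case

def is_independent(joint, px, py, tol=1e-12):
    """Check the factorization P(X=x, Y=y) = P(X=x) P(Y=y) for every pair."""
    return all(abs(joint[(x, y)] - px[x] * py[y]) < tol for x, y in joint)

print(is_independent(joint_ind, p_x, p_y), is_independent(joint_dep, q, q))  # True False
```

The dependent case shows why checking marginals alone is not enough: the joint must factorize at every point.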
Useful Properties#
For random variables \(X, Y, Z\), constants \(a, b, c \in \mathbb{R}\), and function \(g: \mathbb{R}^d \to \mathbb{R}\):
Linearity of Expectation:

$$E[aX + bY + c] = a\,E[X] + b\,E[Y] + c$$

Independence and Expectation: If \(X \perp Y\), then:

$$E[XY] = E[X]\,E[Y]$$

Law of Total Expectation:

$$E[X] = E\left[E[X \mid Y]\right]$$

Variance Properties:

$$\text{Var}(aX + b) = a^2 \, \text{Var}(X), \qquad \text{Var}(X + Y) = \text{Var}(X) + \text{Var}(Y) + 2\,\text{Cov}(X, Y)$$

Covariance Properties:

$$\text{Cov}(X, X) = \text{Var}(X), \qquad \text{Cov}(aX + b, cY + d) = ac \, \text{Cov}(X, Y)$$

Correlation Properties:

$$-1 \leq \text{Corr}(X, Y) \leq 1, \qquad X \perp Y \implies \text{Corr}(X, Y) = 0$$
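These identities can be verified exactly by enumerating a small finite model. A sketch using two independent fair dice (an illustrative choice), with exact `Fraction` arithmetic so equality is literal, not approximate:

```python
from fractions import Fraction
from itertools import product

# Two independent fair dice, enumerated exactly: 36 equally likely outcomes.
faces = range(1, 7)
outcomes = list(product(faces, faces))
p = Fraction(1, 36)

def E(f):
    """Exact expectation of f(x, y) over the two dice."""
    return sum(f(x, y) * p for x, y in outcomes)

def Var(f):
    mu = E(f)
    return E(lambda x, y: (f(x, y) - mu) ** 2)

a, b = 2, 3
# Linearity of expectation: E[aX + bY] = a E[X] + b E[Y]
lin_lhs = E(lambda x, y: a * x + b * y)
lin_rhs = a * E(lambda x, y: x) + b * E(lambda x, y: y)

# For independent X, Y (Cov = 0): Var(X + Y) = Var(X) + Var(Y)
var_lhs = Var(lambda x, y: x + y)
var_rhs = Var(lambda x, y: x) + Var(lambda x, y: y)
print(lin_lhs == lin_rhs, var_lhs == var_rhs)  # True True
```

Because the dice are independent, the covariance cross-term vanishes and the variances add; with dependent variables the \(2\,\text{Cov}(X, Y)\) term would be needed.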