Probability | Narsil

Elements

The fundamental element of probability is the event: a subset of the sample space $\Omega$, the collection of all possible outcomes. From events, we construct three higher forms:

Random variable: a function $X : \Omega \to \mathbb{R}$ that assigns a number to each outcome; it turns an abstract outcome into a measurable quantity.
Probability distribution: the complete description, assigning a probability to every set of values the variable can take. Given as a pmf, pdf, or CDF, it is the object from which every other summary is derived.
Expected value: $\mathbb{E}[X] = \sum_x x \, p(x)$ in the discrete case (an integral $\int x \, f(x) \, dx$ in the continuous case), the distribution’s center of gravity, a single number summarizing the whole.
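As a concrete illustration of the expected-value formula, the following minimal Python sketch computes $\mathbb{E}[X]$ for a fair six-sided die. The die, the `pmf` dictionary, and the helper `expected_value` are assumptions chosen for illustration; they do not come from the original text.

```python
from fractions import Fraction

# Hypothetical example: the pmf of a fair six-sided die, stored as {x: p(x)}.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def expected_value(pmf):
    """Return the sum over x of x * p(x) for a discrete pmf."""
    return sum(x * p for x, p in pmf.items())

assert sum(pmf.values()) == 1  # a pmf must sum to one
mean = expected_value(pmf)
print(mean)  # 7/2
```

Exact fractions are used so that the result is not blurred by floating-point rounding.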

Key distribution families

Gaussian $\mathcal{N}(\mu, \sigma^2)$ — the attractor of sums (Central Limit Theorem).
Binomial $\text{Bin}(n, p)$ — counts of successes in $n$ independent trials.
Poisson $\text{Pois}(\lambda)$ — counts of rare events per unit time or space.

Axiomatic Foundation

Kolmogorov’s axioms reduce probability to measure theory. Given a sample space $\Omega$ and a $\sigma$-algebra $\mathcal{F}$ of events, a probability measure $P : \mathcal{F} \to [0,1]$ must satisfy:

Non-negativity: $P(A) \geq 0$ for all $A \in \mathcal{F}$.
Normalization: $P(\Omega) = 1$.
Countable additivity: if $A_1, A_2, \ldots$ are pairwise disjoint, $P\!\left(\bigcup_i A_i\right) = \sum_i P(A_i)$.
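On a finite sample space the axioms can be checked exhaustively. The sketch below assumes a hypothetical experiment of two fair coin flips, takes the power set as the $\sigma$-algebra, and relies on the fact that countable additivity reduces to finite additivity when $\Omega$ is finite. The names `omega`, `weights`, `P`, and `powerset` are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical experiment: two fair coin flips, each outcome equally likely.
omega = frozenset({"HH", "HT", "TH", "TT"})
weights = {outcome: Fraction(1, 4) for outcome in omega}

def P(event):
    """Probability of an event, i.e. of a subset of omega."""
    return sum((weights[outcome] for outcome in event), Fraction(0))

def powerset(s):
    """All subsets of s; on a finite space this is a sigma-algebra."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

events = powerset(omega)
non_negative = all(P(a) >= 0 for a in events)
normalized = P(omega) == 1
# Additivity is checked on every pair of disjoint events.
additive = all(P(a | b) == P(a) + P(b)
               for a in events for b in events if not (a & b))
print(non_negative, normalized, additive)  # True True True
```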

Derived rules

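From these three axioms, together with the definition of conditional probability, the rules of classical probability are proven, not observed:

$P(A|B) = \frac{P(A \cap B)}{P(B)} \quad \text{(definition of conditional probability, } P(B) > 0\text{)}$

$\underbrace{P(A|B)}_{\text{posterior}} = \frac{P(B|A)\,P(A)}{P(B)} \quad \text{(Bayes' theorem)}$

$P(A) = \sum_i P(A|B_i)\,P(B_i) \quad \text{(law of total probability)}$

Conditional probability is a definition. Bayes’ theorem and the law of total probability (for events $B_i$ that partition $\Omega$) are theorems: consequences of the axioms and that definition, not independent postulates.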
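A worked example may make these rules concrete. The sketch below applies the law of total probability and Bayes’ theorem to a hypothetical diagnostic test; the prevalence, sensitivity, and false-positive rate are invented numbers, and the variable names are illustrative.

```python
from fractions import Fraction

# Hypothetical numbers, chosen only for illustration.
prior = Fraction(1, 100)           # P(D): prevalence of the condition
sensitivity = Fraction(95, 100)    # P(+ | D)
false_positive = Fraction(5, 100)  # P(+ | not D)

# Law of total probability over the partition {D, not D}.
evidence = sensitivity * prior + false_positive * (1 - prior)

# Bayes' theorem: P(D | +) = P(+ | D) P(D) / P(+).
posterior = sensitivity * prior / evidence
print(posterior)  # 19/118
```

Under these assumed numbers the posterior is about 0.16: because the condition is rare, most positive results come from the much larger healthy group.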

Measurement and Evidence

Before it was axiomatized, probability was the empirical study of frequencies, that is, ratios of favorable outcomes to total trials. The deductive and experimental faces meet in the law of large numbers: over independent repetitions of an experiment, the relative frequency of an event $A$ converges to the measure $P(A)$ as the number of trials grows.
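The convergence of relative frequency can be observed in a small simulation. This sketch assumes a fair die simulated with Python's `random` module and tracks how often a six appears; the function name `relative_frequency` and the seed are arbitrary choices.

```python
import random

def relative_frequency(trials, seed=0):
    """Fraction of simulated fair-die rolls that show a six."""
    rng = random.Random(seed)
    hits = sum(rng.randint(1, 6) == 6 for _ in range(trials))
    return hits / trials

# The printed frequencies should drift toward 1/6 as the trial count grows.
for n in (100, 10_000, 200_000):
    print(n, relative_frequency(n))
```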

Key measurement concepts:

Sample — a finite draw from a population; sampling variability is the source of uncertainty in estimates.
Estimator — a statistic $\hat{\theta}$ computed from data to estimate a population parameter; its sampling distribution describes how it would vary across repeated experiments.
Confidence interval — a random interval that contains the true parameter with prescribed probability under repeated sampling.
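The phrase "under repeated sampling" can be checked by simulation. The sketch below assumes normally distributed data with a known standard deviation, builds a z-interval for the mean in each simulated experiment, and counts how often the interval contains the true mean. All parameter values and the function name `coverage` are illustrative.

```python
import random
import statistics

def coverage(mu=10.0, sigma=2.0, n=30, reps=2000, level=0.95, seed=0):
    """Share of simulated z-intervals (known sigma) that contain mu."""
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        hits += (xbar - half_width <= mu <= xbar + half_width)
    return hits / reps

rate = coverage()
print(rate)  # should land near the nominal level of 0.95
```

The true mean is fixed throughout; it is the interval that varies from one experiment to the next.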

Causal structure in probabilistic experiments:

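Increasing the sample size reduces estimator variance: for the sample mean of $n$ independent observations with variance $\sigma^2$, the variance is $\sigma^2 / n$, the shrinkage behind the law of large numbers.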
A stronger (more concentrated) prior concentrates the posterior around prior beliefs, reducing the influence of data.
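Both effects can be seen in a conjugate model. The sketch below assumes a Beta prior on a success probability with binomial data, in which case the posterior is Beta($a$ + successes, $b$ + failures) with mean $(a + \text{successes}) / (a + b + \text{trials})$. The data and prior parameters are hypothetical.

```python
from fractions import Fraction

def posterior_mean(a, b, successes, trials):
    """Mean of the Beta(a + successes, b + failures) posterior."""
    return Fraction(a + successes, a + b + trials)

successes, trials = 7, 10  # hypothetical data with sample proportion 0.7

weak = posterior_mean(1, 1, successes, trials)      # flat Beta(1, 1) prior
strong = posterior_mean(50, 50, successes, trials)  # Beta(50, 50), centered on 0.5
print(weak, strong)  # 2/3 57/110

# The same strong prior, but with 100 times as much data at the same proportion.
more_data = posterior_mean(50, 50, 700, 1000)
print(more_data)  # 15/22
```

The concentrated prior holds the posterior mean near 0.5 with ten observations, while a thousand observations pull it most of the way to the sample proportion.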

The Central Limit Theorem is the bridge: suitably standardized sums of independent, identically distributed random variables with finite variance converge in distribution to a Gaussian, regardless of the original shape. This makes the normal distribution the empirical attractor of measurements.
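A simulation gives a feel for the theorem. This sketch sums draws from a strongly skewed Exponential(1) distribution (mean 1, variance 1), standardizes each sum, and measures how much of the result falls within $\pm 1.96$, the range that holds about 95% of a standard normal. The sample sizes and seed are arbitrary choices.

```python
import math
import random

def standardized_sum(n, rng):
    """Sum n Exponential(1) draws, subtract the mean n, divide by sqrt(n)."""
    total = sum(rng.expovariate(1.0) for _ in range(n))
    return (total - n) / math.sqrt(n)

rng = random.Random(0)
draws = [standardized_sum(50, rng) for _ in range(4000)]

# Share of standardized sums inside the central 95% range of a standard normal.
within = sum(abs(z) <= 1.96 for z in draws) / len(draws)
print(within)
```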

Procedures

The algorithmic lens asks: what is the effective procedure for computing with probability?

Maximum Likelihood Estimation

Write the likelihood of independent observations: $L(\theta \mid \text{data}) = \prod_i p(x_i \mid \theta)$.
Take the log: $\ell(\theta) = \sum_i \log p(x_i \mid \theta)$.
Differentiate and set $\nabla_\theta \ell = 0$; solve for $\hat\theta$, and confirm the solution is a maximum rather than a minimum or saddle point.
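The steps above can be sketched for a Bernoulli model, where setting the derivative $k/p - (n-k)/(1-p)$ to zero gives the closed form $\hat{p} = k/n$ for $k$ successes in $n$ trials. The data below are hypothetical, and the grid search is only a numerical sanity check on the closed form, not a recommended optimizer.

```python
import math

data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]  # hypothetical coin flips, 1 = success

def log_likelihood(p, xs):
    """Bernoulli log-likelihood: sum of log p(x_i | p)."""
    return sum(math.log(p) if x else math.log(1 - p) for x in xs)

# Closed-form solution of d(log-likelihood)/dp = 0.
p_hat = sum(data) / len(data)

# Numerical check: the grid point with the highest log-likelihood.
grid = [i / 1000 for i in range(1, 1000)]
p_grid = max(grid, key=lambda p: log_likelihood(p, data))
print(p_hat, p_grid)
```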

Bayesian Update

$P(\theta \mid \text{data}) \propto P(\text{data} \mid \theta) \cdot P(\theta)$

This is a one-step multiplicative update: multiply prior by likelihood, then normalize.
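For a parameter restricted to a finite set of values, the multiply-then-normalize step can be written directly. The sketch below assumes a hypothetical coin whose bias $\theta$ is one of three values with a uniform prior, and updates on a single observed head, for which $P(\text{head} \mid \theta) = \theta$. The function name `bayes_update` is illustrative.

```python
from fractions import Fraction

def bayes_update(prior, likelihood):
    """Multiply prior by likelihood pointwise, then normalize."""
    unnormalized = {theta: prior[theta] * likelihood(theta) for theta in prior}
    evidence = sum(unnormalized.values())  # the normalizing constant P(data)
    return {theta: w / evidence for theta, w in unnormalized.items()}

# Hypothetical discrete parameter space with a uniform prior.
thetas = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
prior = {theta: Fraction(1, 3) for theta in thetas}

# Observe one head.
posterior = bayes_update(prior, lambda theta: theta)
print(posterior[Fraction(3, 4)])  # 1/2
```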

Monte Carlo

When analytic integration is intractable, draw $n$ samples $x_i \sim p$ and approximate:

$\mathbb{E}_p[f(X)] \approx \frac{1}{n}\sum_{i=1}^n f(x_i)$
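A minimal sketch of this estimator, assuming $p$ is a standard normal and $f(x) = x^2$, so the exact answer $\mathbb{E}[X^2] = 1$ is available for comparison. The helper `monte_carlo_mean`, the sample size, and the seed are illustrative choices.

```python
import random

def monte_carlo_mean(f, sampler, n):
    """Average f over n independent draws from sampler()."""
    return sum(f(sampler()) for _ in range(n)) / n

rng = random.Random(0)
estimate = monte_carlo_mean(lambda x: x * x,
                            lambda: rng.gauss(0.0, 1.0),
                            100_000)
print(estimate)  # the exact value is Var(X) = 1
```

The error of such an estimate shrinks in proportion to $1/\sqrt{n}$, so each additional digit of accuracy costs roughly a hundred times more samples.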

Variants such as rejection sampling, importance sampling, Metropolis-Hastings MCMC, and Gibbs sampling extend this to distributions that are hard to sample directly, including many known only up to a normalizing constant.

Probability as a System

Viewed systemically, Bayesian inference is a feedback system for updating belief:

Stock: the current belief state, the prior $P(\theta)$.
Flow: evidence (observed data), which drives a Bayesian update.
New stock: the posterior $P(\theta \mid \text{data})$, which becomes the prior for the next observation.

This loop is reinforcing: each observation refines beliefs, and refined beliefs shape what future observations mean. The distribution is the system’s state; Bayes’ theorem is the transition rule.

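The law of large numbers supplies the equilibrium: as data accumulate, and under regularity conditions such as a prior that gives positive mass to every neighborhood of the true parameter, the posterior concentrates around that parameter (posterior consistency). The system converges to a fixed point.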
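The feedback loop and its convergence can be sketched with a Beta belief about a Bernoulli parameter, a conjugate pairing in which each observation simply increments one of two counts. The true parameter 0.3, the simulated data, and the function names are assumptions made for illustration.

```python
import random
from fractions import Fraction

def update(belief, x):
    """One Bayesian step: Beta(a, b) belief plus a Bernoulli observation x."""
    a, b = belief
    return (a + x, b + 1 - x)

def beta_mean_var(belief):
    """Mean and variance of a Beta(a, b) distribution."""
    a, b = belief
    return Fraction(a, a + b), Fraction(a * b, (a + b) ** 2 * (a + b + 1))

rng = random.Random(0)
true_theta = 0.3  # hypothetical parameter generating the data
data = [int(rng.random() < true_theta) for _ in range(1000)]

belief = (1, 1)  # the initial stock: a flat Beta(1, 1) prior
history = {}
for i, x in enumerate(data, start=1):
    belief = update(belief, x)  # the posterior becomes the next prior
    if i in (10, 100, 1000):
        history[i] = beta_mean_var(belief)

# Updating one observation at a time matches a single batch update.
batch = (1 + sum(data), 1 + len(data) - sum(data))
for i, (mean, var) in history.items():
    print(i, float(mean), float(var))
```

The printed variance shrinks as observations accumulate, which is the concentration described above.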

Connections

Probability connects to statistics as its foundation — statistical inference is applied probability over samples and populations. It connects to calculus through measure theory and the theory of integration. Linear algebra enters through multivariate distributions, covariance matrices, and the geometry of high-dimensional probability.