Updated 2026-05-22

Probability

Chance, randomness, and stochastic reasoning


Elements

The fundamental element of probability is the event: a subset of the sample space $\Omega$, the collection of all possible outcomes. From events, we construct three higher forms:

  • Random variable: a function $X : \Omega \to \mathbb{R}$ that assigns a number to each outcome; it transforms raw possibility into a measurable quantity.
  • Probability distribution: the complete form, encoding the probabilities of all states simultaneously. This is the ideal object: the pmf, pdf, or CDF from which everything else is derived.
  • Expected value: $\mathbb{E}[X] = \sum_x x \, p(x)$ for a discrete variable, the distribution’s center of gravity, a single number summarizing the whole.
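As a small illustration of these elements, the sketch below stores a distribution as a pmf over a finite sample space and computes the expected value as a probability-weighted sum. The fair six-sided die and the names `pmf` and `expected_value` are assumptions chosen for the example.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die, stored as {outcome: probability}.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def expected_value(pmf):
    """Return E[X] = sum over x of x * p(x) for a finite pmf."""
    return sum(x * p for x, p in pmf.items())

mean = expected_value(pmf)
print(mean)  # 7/2
```

Exact fractions are used so the result is not blurred by floating-point rounding.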

Key distribution families

  • Gaussian $\mathcal{N}(\mu, \sigma^2)$: the attractor of sums (Central Limit Theorem).
  • Binomial $\text{Bin}(n, p)$: counts of successes in $n$ independent trials.
  • Poisson $\text{Pois}(\lambda)$: counts of rare events per unit time or space.
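One way to see how these families relate is to compare a binomial with many trials and a small success probability against a Poisson with rate $\lambda = np$, a standard approximation. The sketch below does this numerically; the values of `n` and `p` and the helper names are assumptions made for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """P(K = k) for K ~ Bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(K = k) for K ~ Pois(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Hypothetical setting: many trials, each with a rare success.
n, p = 1000, 0.003
lam = n * p

for k in range(6):
    # The two columns should be close when n is large and p is small.
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
```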

Axiomatic Foundation

Kolmogorov’s axioms reduce probability to measure theory. Given a sample space $\Omega$ and a $\sigma$-algebra $\mathcal{F}$ of events, a probability measure $P : \mathcal{F} \to [0,1]$ must satisfy:

  1. Non-negativity: $P(A) \geq 0$ for all $A \in \mathcal{F}$.
  2. Normalization: $P(\Omega) = 1$.
  3. Countable additivity: if $A_1, A_2, \ldots$ are pairwise disjoint, $P\!\left(\bigcup_i A_i\right) = \sum_i P(A_i)$.
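On a finite sample space the axioms can be checked exhaustively, since the power set serves as the $\sigma$-algebra and countable additivity reduces to additivity over disjoint pairs. The following sketch assumes a hypothetical experiment of two fair coin tosses; the names `omega`, `weight`, and `P` are illustrative.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical sample space: two fair coin tosses.
omega = frozenset({"HH", "HT", "TH", "TT"})
weight = {outcome: Fraction(1, 4) for outcome in omega}

def P(event):
    """Probability of an event (a subset of omega) as the sum of its outcome weights."""
    return sum((weight[o] for o in event), Fraction(0))

# The sigma-algebra here is the power set of omega (16 events).
events = [
    frozenset(c)
    for r in range(len(omega) + 1)
    for c in combinations(sorted(omega), r)
]

non_negative = all(P(A) >= 0 for A in events)
normalized = P(omega) == 1
# Additivity, checked on every pair of disjoint events.
additive = all(
    P(A | B) == P(A) + P(B) for A in events for B in events if not (A & B)
)
print(non_negative, normalized, additive)
```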

Derived rules

From these three axioms, together with the definition of conditional probability, the rules of classical probability are proven, not observed. For $P(B) > 0$, conditional probability is defined as

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

$$\underbrace{P(A \mid B)}_{\text{posterior}} = \frac{P(B \mid A)\,P(A)}{P(B)} \quad \text{(Bayes' theorem)}$$

$$P(A) = \sum_i P(A \mid B_i)\,P(B_i) \quad \text{(law of total probability, for a partition } \{B_i\} \text{ of } \Omega\text{)}$$

Conditional probability is a definition; Bayes’ theorem and the law of total probability are theorems, consequences of that definition and the axioms, not independent postulates.
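A worked example shows how Bayes’ theorem and the law of total probability combine. The sketch below assumes a hypothetical diagnostic test; the prevalence, sensitivity, and false-positive rate are invented numbers, not data from any study.

```python
from fractions import Fraction

# Hypothetical inputs (assumptions for illustration only).
prior = Fraction(1, 100)                # P(disease)
sensitivity = Fraction(95, 100)         # P(positive | disease)
false_positive_rate = Fraction(5, 100)  # P(positive | no disease)

# Law of total probability: P(positive), summed over the two cases.
evidence = sensitivity * prior + false_positive_rate * (1 - prior)

# Bayes' theorem: P(disease | positive).
posterior = sensitivity * prior / evidence
print(posterior, float(posterior))  # posterior equals 19/118, about 0.161
```

Even with a fairly accurate test, the posterior stays low because the prior is small, which is exactly the weighting the formula encodes.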

Measurement and Evidence

Before it was axiomatized, probability was the empirical study of frequencies: ratios of favorable outcomes to total trials. The deductive and experimental faces meet here: by the law of large numbers, the relative frequency of an event $A$ in independent repeated trials converges to the measure $P(A)$ as the number of trials grows.

Key measurement concepts:

  • Sample: a finite draw from a population; sampling variability is the source of uncertainty in estimates.
  • Estimator: a statistic $\hat{\theta}$ computed from data to estimate a population parameter; its sampling distribution describes how it would vary across repeated experiments.
  • Confidence interval: a random interval that contains the true parameter with prescribed probability under repeated sampling.
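The repeated-sampling reading of a confidence interval can be checked by simulation. The sketch below draws many samples from a hypothetical Gaussian population, builds an approximate 95% interval for the mean from each, and counts how often the interval covers the true mean. The population parameters, sample size, and the normal-approximation interval are all assumptions of the example.

```python
import random
import statistics

random.seed(0)

# Hypothetical population and experiment sizes.
TRUE_MEAN, TRUE_SD = 10.0, 2.0
N, REPEATS = 50, 2000
Z = statistics.NormalDist().inv_cdf(0.975)  # two-sided 95% normal quantile

def confidence_interval(sample):
    """Approximate 95% interval for the mean: estimate +/- Z * standard error."""
    estimate = statistics.fmean(sample)
    std_error = statistics.stdev(sample) / len(sample) ** 0.5
    return estimate - Z * std_error, estimate + Z * std_error

covered = 0
for _ in range(REPEATS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    low, high = confidence_interval(sample)
    covered += low <= TRUE_MEAN <= high

coverage = covered / REPEATS
print(coverage)  # expected to be close to 0.95
```

The interval varies from sample to sample while the parameter stays fixed, so the 95% describes the procedure rather than any single interval.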

Causal structure in probabilistic experiments:

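  • Increasing sample size reduces estimator variance: for the mean of $n$ independent observations with variance $\sigma^2$, the variance is $\sigma^2 / n$, and the law of large numbers guarantees convergence to the true mean.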
  • A stronger (more concentrated) prior concentrates the posterior around prior beliefs, reducing the influence of data.

The Central Limit Theorem is the bridge: suitably standardized sums of independent, identically distributed random variables with finite variance converge in distribution to a Gaussian, regardless of the original shape. This makes the normal distribution the empirical attractor of measurements.
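The convergence can be observed numerically. The sketch below sums draws from a strongly skewed exponential distribution, standardizes each sum, and compares the fraction falling within one standard deviation against the Gaussian value. The choice of distribution, the number of terms, and the seed are assumptions of the example.

```python
import random
import statistics

random.seed(1)

N_TERMS, N_SUMS = 30, 5000  # hypothetical simulation sizes
MEAN, SD = 1.0, 1.0         # mean and standard deviation of Exponential(1)

def standardized_sum():
    """Sum N_TERMS exponential draws, then center and scale the sum."""
    total = sum(random.expovariate(1.0) for _ in range(N_TERMS))
    return (total - N_TERMS * MEAN) / (N_TERMS ** 0.5 * SD)

z = [standardized_sum() for _ in range(N_SUMS)]

within_one = sum(abs(v) <= 1 for v in z) / N_SUMS
gaussian = statistics.NormalDist().cdf(1) - statistics.NormalDist().cdf(-1)
print(round(within_one, 3), round(gaussian, 3))
```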

Procedures

The algorithmic lens asks: what is the effective procedure for computing with probability?

Maximum Likelihood Estimation

  1. Write the likelihood $L(\theta \mid \text{data}) = \prod_i p(x_i \mid \theta)$, assuming independent observations $x_i$.
  2. Take the log: $\ell(\theta) = \sum_i \log p(x_i \mid \theta)$.
  3. Differentiate and set $\nabla_\theta \ell = 0$; solve for $\hat{\theta}$, and check that the solution is a maximum.
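The three steps can be sketched for a concrete model. The example below assumes an exponential model with rate parameter `rate`, for which setting the derivative of the log-likelihood to zero gives the closed form $\hat{\theta} = n / \sum_i x_i$, and cross-checks that result against a brute-force grid search. The data values and function names are hypothetical.

```python
import math

# Hypothetical observations, assumed to come from an exponential distribution.
data = [0.5, 1.2, 0.3, 2.0, 1.0]

def log_likelihood(rate, xs):
    """Sum of log p(x | rate), where log p(x | rate) = log(rate) - rate * x."""
    return sum(math.log(rate) - rate * x for x in xs)

# Closed form from solving d/d(rate) log_likelihood = n / rate - sum(xs) = 0.
rate_hat = len(data) / sum(data)

# Numerical cross-check: maximize the log-likelihood over a grid of rates.
grid = [i / 1000 for i in range(1, 5001)]
rate_grid = max(grid, key=lambda r: log_likelihood(r, data))

print(rate_hat, rate_grid)
```

The grid search is only a sanity check; in practice a closed form or a numerical optimizer would be used.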

Bayesian Update

$$P(\theta \mid \text{data}) \propto P(\text{data} \mid \theta) \cdot P(\theta)$$

This is a one-step multiplicative update: multiply prior by likelihood, then normalize.
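A minimal sketch of this update over a discrete set of hypotheses follows. It assumes three candidate values for a coin’s heads probability, a uniform prior, and a single observed head; the function name `bayes_update` and all values are illustrative.

```python
from fractions import Fraction

def bayes_update(prior, likelihood):
    """Multiply prior by likelihood for each hypothesis, then normalize."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())
    return {h: w / evidence for h, w in unnormalized.items()}

# Hypothetical hypotheses: the coin's probability of heads.
hypotheses = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]
prior = {theta: Fraction(1, 3) for theta in hypotheses}

# Likelihood of observing one head under each hypothesis is theta itself.
likelihood_heads = {theta: theta for theta in hypotheses}

posterior = bayes_update(prior, likelihood_heads)
for theta, prob in posterior.items():
    print(theta, prob)
```

With a uniform prior the posterior is proportional to the likelihood, so the three hypotheses end up with probabilities 1/6, 1/3, and 1/2.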

Monte Carlo

When analytic integration is intractable, draw $n$ samples $x_i \sim p$ and approximate:

$$\mathbb{E}_p[f(X)] \approx \frac{1}{n}\sum_{i=1}^n f(x_i)$$
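The estimator above translates directly into code. The sketch below assumes $p$ is a standard Gaussian and $f(x) = x^2$, so the exact answer is the variance, 1; the helper name `monte_carlo_expectation`, the sample size, and the seed are hypothetical choices.

```python
import random

def monte_carlo_expectation(f, sampler, n):
    """Approximate E[f(X)] by averaging f over n independent draws from sampler."""
    return sum(f(sampler()) for _ in range(n)) / n

random.seed(42)

# Assumed example: X ~ N(0, 1) and f(x) = x^2, so E[f(X)] = Var(X) = 1.
estimate = monte_carlo_expectation(
    lambda x: x * x,
    lambda: random.gauss(0.0, 1.0),
    100_000,
)
print(estimate)  # expected to be close to 1
```

The error of such an estimate shrinks on the order of $1/\sqrt{n}$, so each extra digit of accuracy costs roughly a hundredfold more samples.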

Variants such as rejection sampling, importance sampling, Metropolis-Hastings MCMC, and Gibbs sampling extend this to distributions that cannot be sampled directly, including many known only up to a normalizing constant.
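As one illustration of sampling without a normalizing constant, here is a minimal random-walk Metropolis-Hastings sketch. It assumes the target is known only as $\exp(-x^2/2)$, an unnormalized standard Gaussian, so the result can be checked against mean 0 and variance 1. The step size, burn-in length, and function names are assumptions, not tuned recommendations.

```python
import math
import random
import statistics

def log_unnormalized_target(x):
    """Log of exp(-x^2 / 2); the normalizing constant is deliberately omitted."""
    return -0.5 * x * x

def metropolis_hastings(log_target, n_samples, step=2.0, x0=0.0, burn_in=1000):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    x = x0
    samples = []
    for i in range(burn_in + n_samples):
        proposal = x + random.gauss(0.0, step)
        # Acceptance probability min(1, target(proposal) / target(x)).
        accept_prob = math.exp(min(0.0, log_target(proposal) - log_target(x)))
        if random.random() < accept_prob:
            x = proposal
        if i >= burn_in:
            samples.append(x)
    return samples

random.seed(3)
samples = metropolis_hastings(log_unnormalized_target, n_samples=20_000)
print(round(statistics.fmean(samples), 2), round(statistics.pvariance(samples), 2))
```

Only the ratio of target densities enters the acceptance step, which is why the unknown constant cancels.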

Probability as a System

Viewed systemically, Bayesian inference is a feedback system for updating belief:

  • Stock: the current belief state, the prior $P(\theta)$.
  • Flow: evidence (observed data), which drives a Bayesian update.
  • New stock: the posterior $P(\theta \mid \text{data})$, which becomes the prior for the next observation.

This loop is reinforcing: each observation refines beliefs, and refined beliefs shape what future observations mean. The distribution is the system’s state; Bayes’ theorem is the transition rule.

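The law of large numbers underlies the equilibrium theorem: as the flow of data grows, and provided the model is well specified and the prior gives positive weight to a neighborhood of the true parameter, the posterior concentrates around that parameter. The system converges to a fixed point.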
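The loop and its convergence can be simulated with a conjugate model. The sketch below assumes coin flips with a hypothetical true heads probability of 0.7 and a Beta(1, 1) prior, for which each update simply adds the observation to the Beta counts. The posterior after each flip becomes the prior for the next, and its standard deviation shrinks as data accumulates. All names and values are illustrative.

```python
import random

random.seed(7)
TRUE_P = 0.7  # hypothetical true parameter, unknown to the updater

def update(belief, observation):
    """One Bayesian step for a Beta(a, b) belief and a 0/1 observation."""
    a, b = belief
    return (a + observation, b + 1 - observation)

def mean_and_sd(belief):
    """Mean and standard deviation of a Beta(a, b) distribution."""
    a, b = belief
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, variance ** 0.5

flips = [1 if random.random() < TRUE_P else 0 for _ in range(2000)]

belief = (1, 1)  # stock: uniform prior
history = []
for n, flip in enumerate(flips, start=1):
    belief = update(belief, flip)  # flow: posterior becomes the next prior
    if n in (10, 100, 2000):
        history.append((n, *mean_and_sd(belief)))

for n, mean, sd in history:
    print(n, round(mean, 3), round(sd, 4))
```

Updating one flip at a time gives the same final belief as a single batch update with the total counts, which is what makes the stock-and-flow reading consistent.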

Connections

Probability connects to statistics as its foundation — statistical inference is applied probability over samples and populations. It connects to calculus through measure theory and the theory of integration. Linear algebra enters through multivariate distributions, covariance matrices, and the geometry of high-dimensional probability.
