Probability Distributions: An Intuitive Guide
This article introduces the basic concepts behind the probability distributions most commonly encountered in data science.
All the distributions discussed are parametric, meaning their entire shape is determined by a small number of parameters; for example, the Normal distribution is fully defined by its mean and variance.
These distributions are built on the Bernoulli trial: an event with only two outcomes (success or failure).
Given a series of independent Bernoulli trials, the Binomial distribution gives the probability of obtaining k successes, while the Geometric distribution gives the probability that the first success occurs on a given trial.
For large samples, the Normal distribution can be used to approximate the Binomial distribution, which is useful for hypothesis testing.
Original opening (English · first 3 paragraphs only)
In this series, I'll break down the intuition behind the most common probability distributions in Data Science: their structural similarities, use cases and examples.

All the distributions we'll cover are parametric. This just means the entire "shape" of the distribution can be described by a few specific numbers: the parameters. For example, a Normal distribution is defined entirely by just two: its mean and its variance.

The underlying event for both of the distributions covered here is a Bernoulli trial. A Bernoulli trial is an event with one of two outcomes: one is usually labelled a success, which has probability p and is the event we are interested in, while the other is a failure. Note that the outcome space here is discrete. For example, heads in a coin toss, or a user signing up.

A Bernoulli trial on its own isn't that interesting. It becomes interesting when we run a series of independent Bernoulli trials, each with probability of success p, because then we can ask two questions:

- What is the probability of getting k successes in n trials? This is modeled by the Binomial distribution.
- What is the probability of the first success occurring on trial k? This is modeled by the Geometric distribution.

Let's say n = 5 and k = 2, and the probability of success in each independent Bernoulli trial is p. For us to get k = 2 successes, we need precisely 2 successes and 3 failures in 5 trials:

\(p^2 (1-p)^3.\)

However, note that the 2 successes can come from any 2 of the 5 trials. For example, in the case of Success, Failure, Failure, Failure, Success, we have

\(p(1-p)(1-p)(1-p)p = p^2 (1-p)^3.\)

All of these combinations are captured by the n-choose-k operator, which returns the number of ways we can pick k (= 2) successes out of n (= 5) trials. So the final probability is

\(\binom{5}{2} p^2 (1-p)^3,\)

and the generalized form is

\(\binom{n}{k} p^k (1-p)^{n-k}.\)

We will come back to the Binomial distribution later in the series.
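The Binomial formula above is easy to check numerically. Below is a minimal Python sketch of it; the function name `binomial_pmf` is my own, and `math.comb` is the standard-library n-choose-k operator.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent Bernoulli trials,
    each with success probability p: C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# The worked example from the text: n = 5, k = 2, with a fair coin (p = 0.5).
# C(5, 2) = 10, so the probability is 10 * 0.5^2 * 0.5^3 = 0.3125.
print(binomial_pmf(2, 5, 0.5))  # → 0.3125
```

As a sanity check, summing the PMF over k = 0..n should give 1, since exactly one of those outcomes must occur.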
But as a spoiler: for large values of n, the \(\binom{n}{k}\) term is tedious to calculate, and we will get around that by using the Normal distribution. This approximation is used for running statistical tests on binomial metrics in hypothesis testing.

Again, assume that the probability of success in each independent Bernoulli trial is p.

If k = 1, the first trial itself must be a success, so

\(P(k=1) = p.\)

For k = 2, the first trial must be a failure and the second one a success, giving

\(P(k=2) = (1-p)p.\)

For k = 3,

\(P(k=3) = (1-p)(1-p)p,\)

and so on. This generalizes for any k to

\(P(k) = (1-p)^{k-1} p.\)

To summarize: to compute the probability of k successes in n independent Bernoulli trials, we use

\(P(k) = \binom{n}{k} p^k (1-p)^{n-k},\)

and to get the probability that the first success occurs on trial k, we use

\(P(k) = (1-p)^{k-1} p.\)

In the next part of the series, we'll move from discrete counts to time-based continuous models, introducing Poisson processes and the Exponential distribution.
※ For copyright reasons, only the first 3 paragraphs are quoted. Please read the original article for the full text.