---
title: "PMF vs. PDF"
sidebar_label: PMF & PDF
description: "A deep dive into Probability Mass Functions (PMF) for discrete data and Probability Density Functions (PDF) for continuous data."
tags: [probability, pmf, pdf, statistics, mathematics-for-ml, distributions]
---

To work with data in Machine Learning, we need a mathematical way to describe how likely different values are to occur. Depending on whether our data is **discrete** (countable) or **continuous** (measurable), we use either a **PMF** or a **PDF**.

## 1. Probability Mass Function (PMF)

The **PMF** is used for discrete random variables. It gives the probability that a discrete random variable is exactly equal to some value.

### Key Mathematical Properties:
1. **Direct Probability:** $P(X = x) = f(x)$. The "height" of the bar is the actual probability.
2. **Summation:** All individual probabilities must sum to 1.
   $$
   \sum_{i} P(X = x_i) = 1
   $$
3. **Range:** $0 \le P(X = x) \le 1$.

<img className="rounded p-4" src="/tutorial/img/tutorials/ml/probability-mass-function.jpg" alt="Probability Mass Function plot for a Binomial Distribution" />

**Example:** If you roll a fair die, the PMF is $1/6$ for each value in $\{1, 2, 3, 4, 5, 6\}$. There is no "1.5" or "2.7"; the probability exists only at specific points.
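As a quick sanity check, the die example can be written out in a few lines of Python (a minimal sketch using only the standard library):

```python
# Sketch: the PMF of a fair six-sided die as a plain dictionary.
from fractions import Fraction

die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# Property 1: the height at each point IS the probability.
print(die_pmf[3])                 # probability of rolling exactly a 3 → 1/6

# Property 2: all individual probabilities sum to 1.
print(sum(die_pmf.values()) == 1)  # → True

# Probability exists only at the listed points; P(X = 1.5) is 0.
print(die_pmf.get(1.5, 0))         # → 0
```

Using `Fraction` instead of floats keeps the sum exactly 1, which makes the summation property easy to verify.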

## 2. Probability Density Function (PDF)

The **PDF** is used for continuous random variables. Unlike the PMF, the "height" of a PDF curve does **not** represent probability; it represents **density**.

### The "Zero Probability" Paradox
In a continuous world (like height or time), the probability of a variable being *exactly* a specific number (e.g., exactly $175.00000...$ cm) is **0**, because a single point has zero width and therefore zero area under the curve.

Instead, we find the probability over an **interval** by calculating the **area under the curve**.
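A small sketch of the paradox, assuming $X \sim \text{Uniform}(0, 1)$ so that probabilities are simply interval lengths:

```python
# Sketch: for X ~ Uniform(0, 1), f(x) = 1 on [0, 1], so
# P(a <= X <= b) is just the overlap of [a, b] with [0, 1].
def uniform_prob(a, b):
    """Area under the Uniform(0, 1) density between a and b."""
    lo, hi = max(a, 0.0), min(b, 1.0)
    return max(hi - lo, 0.0)

# Shrink an interval around x = 0.5: the probability shrinks with it.
for eps in (0.1, 0.01, 0.001):
    print(eps, uniform_prob(0.5 - eps / 2, 0.5 + eps / 2))

# In the limit, the interval has zero width: P(X = 0.5) is exactly 0.
print(uniform_prob(0.5, 0.5))  # → 0.0
```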

### Key Mathematical Properties:
1. **Area is Probability:** The probability that $X$ falls between $a$ and $b$ is the integral of the PDF:
   $$
   P(a \le X \le b) = \int_{a}^{b} f(x) dx
   $$
2. **Total Area:** The total area under the entire curve must equal 1.
   $$
   \int_{-\infty}^{\infty} f(x) dx = 1
   $$
3. **Density vs. Probability:** $f(x)$ can be greater than 1, as long as the total area remains 1.
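The third property surprises many people, so here is a minimal illustration, assuming $X \sim \text{Uniform}(0, 0.5)$, whose density is $2$ on its support:

```python
# Sketch: a density can exceed 1. For X ~ Uniform(0, 0.5),
# f(x) = 2 on [0, 0.5] and 0 elsewhere -- yet the total area is 1.
def f(x):
    return 2.0 if 0.0 <= x <= 0.5 else 0.0

# Midpoint-rule approximation of the integral of f over [-1, 1].
n = 100_000
width = 2.0 / n
area = sum(f(-1.0 + (i + 0.5) * width) * width for i in range(n))

print(f(0.25))         # density at a point -- greater than 1
print(round(area, 6))  # total area under the curve → 1.0
```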

## 3. Comparison at a Glance

```mermaid
graph LR
    Data[Data Type] --> Disc[Discrete]
    Data --> Cont[Continuous]

    Disc --> PMF["PMF: $$P(X=x)$$"]
    Cont --> PDF["PDF: $$f(x)$$"]

    PMF --> P_Sum["$$\sum P(x) = 1$$"]
    PDF --> P_Int["$$\int f(x)dx = 1$$"]

    PMF --> P_Val["Height = Probability"]
    PDF --> P_Area["Area = Probability"]
```

| Feature | PMF (Discrete) | PDF (Continuous) |
| --- | --- | --- |
| **Variable Type** | Countable (Integers) | Measurable (Real Numbers) |
| **Probability at a point** | $P(X=x) = \text{Height}$ | $P(X=x) = 0$ |
| **Probability over range** | Sum of heights | Area under the curve (Integral) |
| **Visualization** | Bar chart / Stem plot | Smooth curve |

---

## 4. The Bridge: Cumulative Distribution Function (CDF)

The **CDF** is the "running total" of probability. It tells you the probability that a variable is **less than or equal to** $x$.

* **For a PMF:** the CDF is a step function (it jumps at every discrete value).
* **For a PDF:** the CDF is a smooth, often S-shaped curve.

$$
F(x) = P(X \le x)
$$
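The PDF-to-CDF relationship can be sketched for the standard normal distribution (a minimal example; the CDF here uses the closed form via `math.erf` rather than numerical integration):

```python
import math

# Sketch: PDF and CDF of the standard normal, linked by
# integration (PDF → CDF) and differentiation (CDF → PDF).
def pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# F(x) is the running total of density up to x.
print(cdf(0.0))  # → 0.5 (half the mass lies below the mean)

# Differentiating the CDF numerically recovers the PDF.
h = 1e-6
numeric_pdf = (cdf(1.0 + h) - cdf(1.0 - h)) / (2 * h)
print(abs(numeric_pdf - pdf(1.0)) < 1e-6)  # → True
```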

```mermaid
graph LR
    PDF["PDF (Density) <br/> $$f(x)$$"] -- " Integrate: <br/> $$\int_{-\infty}^{x} f(t) dt$$ " --> CDF["CDF (Cumulative) <br/> $$F(x)$$"]
    CDF -- " Differentiate: <br/> $$\frac{d}{dx} F(x)$$ " --> PDF

    style PDF fill:#fdf,stroke:#333,color:#333
    style CDF fill:#def,stroke:#333,color:#333
```

## 5. Why this matters in Machine Learning

1. **Likelihood Functions:** When training models (like Logistic Regression), we maximize the **Likelihood**. For discrete labels, this uses the PMF; for continuous targets, it uses the PDF.
2. **Anomaly Detection:** We often flag a data point as an outlier if its PDF value (density) is below a certain threshold.
3. **Generative Models:** VAEs and GANs attempt to learn the underlying **PDF** of a dataset so they can sample new points from high-density regions (creating realistic images or text).
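The anomaly-detection idea in point 2 can be sketched in a few lines (the sample, the fitted Gaussian, and the density cutoff below are all made up for illustration; real systems tune the threshold carefully):

```python
import math

# Sketch: flag points whose Gaussian density falls below a cutoff.
def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))

data = [4.8, 5.1, 5.0, 4.9, 5.2, 9.7]  # toy sample with one outlier
mu = sum(data) / len(data)
sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))

threshold = 0.05  # density cutoff, chosen by hand for this toy example
outliers = [x for x in data if gaussian_pdf(x, mu, sigma) < threshold]
print(outliers)  # → [9.7]
```

Note that we threshold on *density*, not probability: as Section 2 showed, the probability of any exact value is 0, so density is the natural score for "how typical is this point?".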

---

Now that you understand how we describe probability at a point or over an area, it's time to meet the most important distribution in all of data science.