Verifying the fraction
Background
In my last post we worked through a probability problem under heavy duress. Three points chosen uniformly at random from a unit disk - what's the probability their circumcircle fits inside?
After pages of Jacobians, angular integrals, and trig identities, I arrived at $P = \frac{2}{5}$.
You wonder if you actually got the right answer. It was a nice fraction, but did you overcount? Undercount? Did you mess up the integral?
Time to check. The beautiful thing about probability problems is that you can often verify analytical results empirically. Generate a million random instances, count successes, divide. Let's find out.
The plan
- Sample three points uniformly from the unit disk
- Compute their circumcircle (there's a closed-form expression)
- Check if it fits inside the unit disk
- Repeat a few million times
Simple enough. But there are a couple of subtleties worth walking through.
Sampling uniformly from a disk
Our first instinct might be: sample $x, y \sim \text{Uniform}(-1, 1)$ and reject any point with $x^2 + y^2 > 1$. That works, but it throws away about 21% of the draws.
The cleaner approach is polar coordinates. But here's where intuition can betray us. If we sample $r \sim \text{Uniform}(0, 1)$ and $\theta \sim \text{Uniform}(0, 2\pi)$, we do not get uniform density in the disk. We tend to oversample near the center.
Why? Think about concentric rings. A thin annulus at radius $r$ with thickness $dr$ has area $2\pi r \, dr$, which grows linearly with $r$. Sampling $r$ uniformly gives every annulus the same share of points regardless of its area, so the small rings near the center end up overrepresented.
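To see this concretely, here's a quick standalone sketch of my own (separate from the simulation later in the post): sample the radius uniformly and count how many points land within radius 0.5. That inner disk holds only 25% of the area, but the naive sampler puts about half of all points there.

```cpp
#include <iostream>
#include <random>

int main() {
  std::mt19937 rng(42);
  std::uniform_real_distribution<> dist(0.0, 1.0);
  const int N = 1'000'000;
  int inside = 0;
  for (int i = 0; i < N; ++i) {
    double r = dist(rng); // naive: radius uniform in [0, 1)
    // The angle doesn't matter for a radial check, so we skip it.
    if (r < 0.5) inside++;
  }
  // Prints ~0.50 even though the disk of radius 0.5 has only 25% of the area.
  std::cout << static_cast<double>(inside) / N << "\n";
}
```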
We want to sample points uniformly by area inside the unit disk. "Uniform by area" means: if we take any two regions with the same area, they should have the same probability of containing our sampled point.
Step 1: What's the probability of landing within radius $r$?
For uniform sampling, probability is proportional to area. The area inside radius $r$ is $\pi r^2$, and the total area of the unit disk is $\pi$.
So the probability of our point having radius at most $r$ is $\frac{\pi r^2}{\pi} = r^2$.
This is the CDF (cumulative distribution function) of the radius: $F(r) = r^2$ for $0 \le r \le 1$.
Step 2: The inverse transform method
Here's a general trick for sampling from any distribution. If we have:

- $F$, our desired CDF, which must be strictly monotonic
- $U \sim \text{Uniform}(0, 1)$

Then $X = F^{-1}(U)$ has CDF $F$.

To verify: we need $P(X \le x) = F(x)$. And indeed,

$$P(X \le x) = P(F^{-1}(U) \le x) = P(U \le F(x)) = F(x),$$

where the middle step uses the monotonicity of $F$ and the last step uses $P(U \le u) = u$ for a uniform variable.
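To see the trick on a different distribution (my own example, not part of the original derivation): the exponential distribution with rate $\lambda$ has CDF $F(x) = 1 - e^{-\lambda x}$, so $F^{-1}(u) = -\ln(1 - u)/\lambda$, and samples built this way should average out to $1/\lambda$.

```cpp
#include <cmath>
#include <iostream>
#include <random>

int main() {
  std::mt19937 rng(42);
  std::uniform_real_distribution<> dist(0.0, 1.0);
  const double lambda = 2.0;
  const int N = 1'000'000;
  double sum = 0.0;
  for (int i = 0; i < N; ++i) {
    // Inverse transform: F(x) = 1 - exp(-lambda x)  =>  x = -ln(1 - u) / lambda
    sum += -std::log(1.0 - dist(rng)) / lambda;
  }
  // Should print something close to 1 / lambda = 0.5.
  std::cout << "Sample mean: " << sum / N << "\n";
}
```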
Step 3: Apply to our case
Our CDF is $F(r) = r^2$, so the inverse is $F^{-1}(u) = \sqrt{u}$.
So if $U \sim \text{Uniform}(0, 1)$, then $r = \sqrt{U}$ has exactly the radial distribution we want.
Intuition check: When $U = 0.25$, we get $r = \sqrt{0.25} = 0.5$, meaning only 25% of points land within half the radius.
So 75% of points land in the outer half of the radius range - which makes sense because the outer annulus has more area!
So the recipe is: draw $U_1, U_2 \sim \text{Uniform}(0, 1)$ independently, set $r = \sqrt{U_1}$ and $\theta = 2\pi U_2$, and return the point $(r\cos\theta, r\sin\theta)$.
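Here's the recipe as a minimal sketch, together with the intuition check from a moment ago (the full simulation below wraps the same sampling in a sample_unit_disk helper):

```cpp
#include <cmath>
#include <iostream>
#include <random>

int main() {
  std::mt19937 rng(42);
  std::uniform_real_distribution<> dist(0.0, 1.0);
  const int N = 1'000'000;
  int inside = 0;
  for (int i = 0; i < N; ++i) {
    double r = std::sqrt(dist(rng));        // r = sqrt(U1)
    double theta = 2.0 * M_PI * dist(rng);  // theta = 2*pi*U2
    double x = r * std::cos(theta);
    double y = r * std::sin(theta);
    if (x * x + y * y < 0.25) inside++;     // within radius 0.5?
  }
  // Should print ~0.25: exactly the area fraction of the half-radius disk.
  std::cout << static_cast<double>(inside) / N << "\n";
}
```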
Computing the circumcircle
Given three points $p_1 = (x_1, y_1)$, $p_2 = (x_2, y_2)$, $p_3 = (x_3, y_3)$, the circumcenter $(c_x, c_y)$ is equidistant from all three:

$$(c_x - x_i)^2 + (c_y - y_i)^2 = r^2, \quad i = 1, 2, 3$$

Expanding the squares, every equation contains the same quadratic terms $c_x^2 + c_y^2$, so subtracting one equation from another leaves something linear.

If we expand the $i = 1$ equation and subtract the $i = 3$ equation:

$$-2x_1 c_x + x_1^2 - 2y_1 c_y + y_1^2 = -2x_3 c_x + x_3^2 - 2y_3 c_y + y_3^2$$

Rearranging:

$$2(x_1 - x_3)c_x + 2(y_1 - y_3)c_y = (x_1^2 + y_1^2) - (x_3^2 + y_3^2)$$

Similarly for $p_2$ and $p_3$:

$$2(x_2 - x_3)c_x + 2(y_2 - y_3)c_y = (x_2^2 + y_2^2) - (x_3^2 + y_3^2)$$

This is a $2 \times 2$ linear system in the unknowns $c_x$ and $c_y$.

The system becomes:

$$\begin{pmatrix} 2(x_1 - x_3) & 2(y_1 - y_3) \\ 2(x_2 - x_3) & 2(y_2 - y_3) \end{pmatrix} \begin{pmatrix} c_x \\ c_y \end{pmatrix} = \begin{pmatrix} (x_1^2 + y_1^2) - (x_3^2 + y_3^2) \\ (x_2^2 + y_2^2) - (x_3^2 + y_3^2) \end{pmatrix}$$

By Cramer's rule, we can write the solution down directly. If that name doesn't ring a bell, here's a refresher.
An aside on Cramer
Cramer's rule solves linear systems using determinants. For a $2 \times 2$ system

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} e \\ f \end{pmatrix}$$

The solution is:

$$x = \frac{\begin{vmatrix} e & b \\ f & d \end{vmatrix}}{\begin{vmatrix} a & b \\ c & d \end{vmatrix}} = \frac{ed - bf}{ad - bc}, \qquad y = \frac{\begin{vmatrix} a & e \\ c & f \end{vmatrix}}{\begin{vmatrix} a & b \\ c & d \end{vmatrix}} = \frac{af - ec}{ad - bc}$$
The pattern: to solve for a variable, replace its column in the coefficient matrix with the right-hand side vector, then divide that determinant by the original determinant.
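In code, the $2 \times 2$ case is a one-liner per variable. A minimal sketch (solve_2x2 is a hypothetical helper for illustration; the actual circumcircle code below inlines the same arithmetic):

```cpp
#include <cmath>
#include <iostream>
#include <optional>
#include <utility>

// Solve [a b; c d] [x; y] = [e; f] by Cramer's rule.
// Returns nothing when the determinant is (numerically) zero.
std::optional<std::pair<double, double>> solve_2x2(double a, double b,
                                                   double c, double d,
                                                   double e, double f) {
  double det = a * d - b * c;
  if (std::abs(det) < 1e-12) return std::nullopt;
  return std::make_pair((e * d - b * f) / det,   // x: (e, f) replaces column 1
                        (a * f - e * c) / det);  // y: (e, f) replaces column 2
}

int main() {
  // 2x + y = 5 and x + 3y = 10 have the solution x = 1, y = 3.
  if (auto s = solve_2x2(2, 1, 1, 3, 5, 10)) {
    std::cout << "x = " << s->first << ", y = " << s->second << "\n";
  }
}
```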
Applying to the circumcircle:
Our system (after the algebra) is:

$$2a_x c_x + 2a_y c_y = s_1$$
$$2b_x c_x + 2b_y c_y = s_2$$

where:

$a_x = x_1 - x_3$, $a_y = y_1 - y_3$, $b_x = x_2 - x_3$, $b_y = y_2 - y_3$

Let $s_1 = (x_1^2 + y_1^2) - (x_3^2 + y_3^2)$ and $s_2 = (x_2^2 + y_2^2) - (x_3^2 + y_3^2)$.

By Cramer's rule:

$$c_x = \frac{s_1 b_y - s_2 a_y}{2(a_x b_y - a_y b_x)}, \qquad c_y = \frac{a_x s_2 - b_x s_1}{2(a_x b_y - a_y b_x)}$$
In the code, I fold the factor of 2 into the denominator: $D = 2(a_x b_y - a_y b_x)$, so $c_x = (s_1 b_y - s_2 a_y)/D$ and $c_y = (a_x s_2 - b_x s_1)/D$.
When the three points are collinear (or nearly so), $D$ is zero (or nearly so) and there is no finite circumcircle. A 1e-12 tolerance catches these numerical edge cases rather than dividing by something tiny and getting nonsense.
Once we have the circumcenter, the circumradius is the distance from the center to any of the three points.
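A quick worked example with numbers of my own choosing: take $p_1 = (1, 0)$, $p_2 = (0, 1)$, $p_3 = (-1, 0)$. Then $a_x = 2$, $a_y = 0$, $b_x = 1$, $b_y = 1$, and $s_1 = s_2 = 0$ since all three points have $x_i^2 + y_i^2 = 1$. So $D = 2(2 \cdot 1 - 0 \cdot 1) = 4$, giving $c_x = c_y = 0$ and $r = 1$: the circumcircle is the unit circle itself, a boundary case that the containment check below accepts with equality.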
The containment check
The circumcircle fits inside the unit disk if and only if every point on the circumcircle is at distance at most 1 from the origin. The farthest point of the circumcircle from the origin lies along the ray from the origin through its center, at distance $\sqrt{c_x^2 + c_y^2} + r$.
So our condition is simply:

$$\sqrt{c_x^2 + c_y^2} + r \le 1$$
The code
Here's the full implementation. As far as I know, M_PI is a POSIX extension rather than standard C++, so make sure to define it yourself (or use _USE_MATH_DEFINES with MSVC) if you'd like to run this on Windows.
```cpp
#include <cmath>
#include <iomanip>
#include <iostream>
#include <random>
#include <string>

struct Point {
  double x, y;
};

struct Circle {
  double cx, cy, r;
  bool valid;
};

// Sample a point uniformly (by area) from the unit disk
Point sample_unit_disk(std::mt19937 &rng,
                       std::uniform_real_distribution<> &dist) {
  double u1 = dist(rng);
  double u2 = dist(rng);
  // r = sqrt(U) gives uniform area distribution
  double r = std::sqrt(u1);
  double theta = 2.0 * M_PI * u2;
  return {r * std::cos(theta), r * std::sin(theta)};
}

// Compute the circumcircle of three points via Cramer's rule
Circle circumcircle(const Point &p1, const Point &p2, const Point &p3) {
  double ax = p1.x - p3.x;
  double ay = p1.y - p3.y;
  double bx = p2.x - p3.x;
  double by = p2.y - p3.y;
  // Determinant with the factor of 2 folded in
  double D = 2.0 * (ax * by - ay * bx);
  // (Nearly) collinear points have no finite circumcircle
  if (std::abs(D) < 1e-12) {
    return {0, 0, 0, false};
  }
  double s1 = (p1.x * p1.x + p1.y * p1.y) - (p3.x * p3.x + p3.y * p3.y);
  double s2 = (p2.x * p2.x + p2.y * p2.y) - (p3.x * p3.x + p3.y * p3.y);
  double cx = (s1 * by - s2 * ay) / D;
  double cy = (ax * s2 - bx * s1) / D;
  // Circumradius: distance from the center to any of the three points
  double r = std::sqrt((cx - p1.x) * (cx - p1.x) + (cy - p1.y) * (cy - p1.y));
  return {cx, cy, r, true};
}

// The circle fits inside the unit disk iff |center| + r <= 1
bool circle_inside_unit_disk(const Circle &c) {
  double dist_to_origin = std::sqrt(c.cx * c.cx + c.cy * c.cy);
  return (dist_to_origin + c.r) <= 1.0;
}

int main(int argc, char *argv[]) {
  long long N = 1'000'000;
  if (argc > 1) {
    N = std::stoll(argv[1]);
  }

  std::random_device rd;
  std::mt19937 rng(rd());
  std::uniform_real_distribution<> dist(0.0, 1.0);

  long long successes = 0;
  long long collinear_skipped = 0;

  for (long long i = 0; i < N; ++i) {
    Point p1 = sample_unit_disk(rng, dist);
    Point p2 = sample_unit_disk(rng, dist);
    Point p3 = sample_unit_disk(rng, dist);
    Circle circ = circumcircle(p1, p2, p3);
    if (!circ.valid) {
      collinear_skipped++;
      continue;
    }
    if (circle_inside_unit_disk(circ)) {
      successes++;
    }
  }

  long long valid_trials = N - collinear_skipped;
  double probability = static_cast<double>(successes) / valid_trials;

  std::cout << std::fixed << std::setprecision(6);
  std::cout << "Trials: " << N << "\n";
  std::cout << "Successes: " << successes << "\n";
  std::cout << "Estimated: " << probability << "\n";
  std::cout << "Expected: " << 2.0 / 5.0 << "\n";
  return 0;
}
```
Compile with:
```
g++ -O3 -std=c++17 -o circumcircle circumcircle.cpp
```
Results
Running with 10 million trials:
```
Trials: 10000000
Successes: 3999797
Estimated: 0.399980
Expected: 0.400000
```
We get a relative error of about 0.005%. The analytical answer holds up. Different runs will give slightly different answers, of course, but they should hover around the 40% mark.
Convergence and confidence
Monte Carlo estimation is really just the law of large numbers in action. We're approximating an expectation (the true probability $p$) with a sample mean, which converges to $p$ as the number of trials grows.
The central limit theorem view
Let $X_i$ be the indicator of trial $i$: 1 if the circumcircle fits inside the disk, 0 otherwise. Each $X_i$ is Bernoulli with mean $p$ and variance $p(1 - p)$.

Our estimator is the sample mean:

$$\hat{p} = \frac{1}{N} \sum_{i=1}^{N} X_i$$

By the central limit theorem, for large $N$:

$$\hat{p} \approx \mathcal{N}\!\left(p, \frac{p(1 - p)}{N}\right)$$

The standard error (standard deviation of our estimator) is:

$$\text{SE} = \sqrt{\frac{p(1 - p)}{N}}$$

For $p = 0.4$ and $N = 10^7$:

$$\text{SE} = \sqrt{\frac{0.4 \times 0.6}{10^7}} \approx 0.000155$$
Interpreting our result
Our estimate was $\hat{p} = 0.399980$, which differs from $0.4$ by about $0.000020$.

In units of standard errors (often called "z-score" or "sigma"):

$$z = \frac{0.000020}{0.000155} \approx 0.13$$
A 0.13-sigma deviation is completely unremarkable. For reference:
- ~68% of runs will fall within 1 sigma ($\pm 0.000155$)
- ~95% will fall within 2 sigma ($\pm 0.00031$)
- ~99.7% will fall within 3 sigma ($\pm 0.000465$)
Our result landing 0.13 sigma from the truth is exactly what we'd expect from a working simulation.
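The arithmetic above is easy to mechanize. A small sketch of my own (assuming, as in the rest of this post, that the true $p$ is the conjectured $2/5$):

```cpp
#include <cmath>
#include <iostream>

int main() {
  const double p = 0.4;                  // conjectured true probability
  const long long N = 10'000'000;        // number of trials
  const long long successes = 3'999'797; // from the run above

  double p_hat = static_cast<double>(successes) / N;
  double se = std::sqrt(p * (1.0 - p) / N); // standard error
  double z = (p_hat - p) / se;              // deviation in sigmas

  std::cout << "p_hat = " << p_hat << "\n"
            << "SE    = " << se << "\n"
            << "z     = " << z << "\n"
            << "95% CI: [" << p_hat - 1.96 * se << ", "
            << p_hat + 1.96 * se << "]\n";
}
```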
The convergence rate
The standard error scales as $1/\sqrt{N}$: every extra decimal digit of accuracy costs a factor of 100 in samples.
| Samples | Standard Error | Accurate digits |
|---|---|---|
| $10^4$ | ~0.0049 | ~2 |
| $10^6$ | ~0.00049 | ~3 |
| $10^8$ | ~0.000049 | ~4 |
| $10^{10}$ | ~0.0000049 | ~5 |
To verify a sixth decimal place (a standard error around $2.5 \times 10^{-7}$), we'd need

$$N \approx \frac{0.4 \times 0.6}{(2.5 \times 10^{-7})^2} \approx 4 \times 10^{12}$$

Four trillion samples! At that point, trusting the algebra is the better strategy.
Conclusion
The hoodlum's problem has a verified answer: $P = \frac{2}{5}$.
Monte Carlo methods won't prove your closed-form solution is correct, but they're excellent at catching mistakes. If I'd botched the Jacobian or miscounted a factor somewhere, this simulation would have returned something noticeably different from 0.4.
It didn't. I can sleep easy.