
Verifying the fraction

Background

In my last post we worked through a probability problem under heavy duress. Three points chosen uniformly at random from a unit disk - what's the probability their circumcircle fits inside?

After pages of Jacobians, angular integrals, and trig identities, I arrived at P=2/5. But I ended the post with some uncertainty:

You wonder if you actually got the right answer. It was a nice fraction, but did you overcount? Undercount? Did you mess up the integral?

Time to check. The beautiful thing about probability problems is that you can often verify analytical results empirically. Generate a million random instances, count successes, divide. Let's find out.

The plan

  1. Sample three points uniformly from the unit disk
  2. Compute their circumcircle (we can find a closed form expression)
  3. Check if it fits inside the unit disk
  4. Repeat a few million times

Simple enough. But there are a couple of subtleties worth walking through.

Sampling uniformly from a disk

Our first instinct might be: sample $x$ and $y$ uniformly from $[-1, 1]$, reject if $x^2 + y^2 > 1$. This does work - we get uniform density inside the disk. But it's wasteful (you throw away $1 - \pi/4 \approx 21\%$ of samples).
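
For contrast, here's roughly what that rejection sampler would look like (a standalone sketch, not the approach used in the final program below):

#include <random>

struct Point {
    double x, y;
};

// Rejection sampling: draw (x, y) uniformly from the square [-1, 1] x [-1, 1]
// and retry until the point lands inside the unit disk. On average about 21%
// of draws (the 1 - pi/4 fraction of the square outside the disk) are wasted.
Point sample_unit_disk_rejection(std::mt19937 &rng) {
    std::uniform_real_distribution<> dist(-1.0, 1.0);
    while (true) {
        double x = dist(rng);
        double y = dist(rng);
        if (x * x + y * y <= 1.0) {
            return {x, y};
        }
    }
}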

The cleaner approach is polar coordinates. But here's where intuition can betray us. If we sample:

$$r \sim \mathrm{Uniform}(0, 1), \qquad \theta \sim \mathrm{Uniform}(0, 2\pi)$$

we do not get uniform density in the disk. We tend to oversample near the center.

Why? Think about concentric rings. A thin annulus at radius r has area 2πr dr. The area grows linearly with r. If we spread points evenly across radius values, the outer regions (which have more area) get underpopulated.

We want to sample points uniformly by area inside the unit disk. "Uniform by area" means: if we take any two regions with the same area, they should have the same probability of containing our sampled point.

Step 1: What's the probability of landing within radius r?

For uniform sampling, probability is proportional to area. The area inside radius $r$ is $\pi r^2$. The total area of the unit disk is $\pi \cdot 1^2 = \pi$.

So the probability of our point having radius r is:

$$P(R \le r) = \frac{\text{area inside } r}{\text{total area}} = \frac{\pi r^2}{\pi} = r^2$$

This is the CDF (cumulative distribution function) of the radius R.

Step 2: The inverse transform method

Here's a general trick for sampling from any distribution. If we have:

  • a target CDF $F$ (monotonic, so it can be inverted), and
  • a uniform random variable $U \sim \mathrm{Uniform}(0, 1)$,

then $X = F^{-1}(U)$ has the distribution we want.

To verify: we need $P(X \le x) = F(x)$. Since $F$ is monotonic, $F^{-1}(U) \le x$ if and only if $U \le F(x)$. And since $F$ is a CDF, $F(x) \in [0, 1]$, so we can use the defining property of the uniform distribution:

$$P(X \le x) = P(F^{-1}(U) \le x) = P(U \le F(x)) = F(x)$$
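
Just to see the trick in action on a different distribution (my own illustration, not part of the original problem): the exponential has CDF $F(x) = 1 - e^{-\lambda x}$ and inverse $F^{-1}(u) = -\ln(1 - u)/\lambda$, so a minimal sketch looks like:

#include <cmath>
#include <random>

// Inverse transform sampling for the exponential distribution:
//   CDF:      F(x) = 1 - exp(-lambda * x)
//   Inverse:  F^{-1}(u) = -ln(1 - u) / lambda
double sample_exponential(std::mt19937 &rng, double lambda) {
    std::uniform_real_distribution<> uniform(0.0, 1.0);
    double u = uniform(rng);
    return -std::log(1.0 - u) / lambda;
}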

Step 3: Apply to our case

Our CDF is $F(r) = r^2$. To invert it:

$$u = r^2 \quad\Longrightarrow\quad r = \sqrt{u}$$

So if $U \sim \mathrm{Uniform}(0, 1)$, then $R = \sqrt{U}$ has CDF $P(R \le r) = r^2$, which is exactly the distribution we need for uniform area sampling.

Intuition check: When $U$ is uniform on $[0, 1]$, $\sqrt{U}$ is biased toward larger values. For example:

$$P\left(\sqrt{U} > \tfrac{1}{2}\right) = P\left(U > \tfrac{1}{4}\right) = \tfrac{3}{4}$$

So 75% of points land in the outer half of the radius range - which makes sense because the outer annulus has more area!
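
Here's a quick throwaway check of that claim (a standalone sketch, separate from the main simulation below), estimating the fraction of radii above 1/2 under both $r = \sqrt{U}$ and the naive $r = U$:

#include <cmath>
#include <iostream>
#include <random>

// Fraction of sampled radii landing in the outer half (r > 0.5).
// Correct sampling (r = sqrt(U)) should give ~0.75; the naive r = U
// gives ~0.50 even though the outer annulus holds 75% of the area.
int main() {
    std::mt19937 rng(12345);
    std::uniform_real_distribution<> dist(0.0, 1.0);

    const long long N = 1'000'000;
    long long outer_sqrt = 0, outer_naive = 0;
    for (long long i = 0; i < N; ++i) {
        double u = dist(rng);
        if (std::sqrt(u) > 0.5) ++outer_sqrt;  // r = sqrt(U)
        if (u > 0.5) ++outer_naive;            // r = U (biased toward center)
    }

    std::cout << "r = sqrt(U): " << static_cast<double>(outer_sqrt) / N << "\n";  // ~0.75
    std::cout << "r = U:       " << static_cast<double>(outer_naive) / N << "\n"; // ~0.50
    return 0;
}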


So the recipe is:

$$r = \sqrt{U_1}, \qquad \theta = 2\pi U_2$$
$$x = r\cos\theta, \qquad y = r\sin\theta$$

Computing the circumcircle

Given three points $P_1, P_2, P_3$, the circumcenter $(X, Y)$ is equidistant from all three: $$(X - x_i)^2 + (Y - y_i)^2 = R^2 \quad \text{for } i = 1, 2, 3$$

Expanding $P_1$'s equation: $$X^2 - 2Xx_1 + x_1^2 + Y^2 - 2Yy_1 + y_1^2 = R^2$$

If we expand $P_3$'s equation and subtract, the $X^2$, $Y^2$, and $R^2$ terms cancel: $$-2X(x_1 - x_3) - 2Y(y_1 - y_3) + (x_1^2 - x_3^2) + (y_1^2 - y_3^2) = 0$$

Rearranging: $$X(x_1 - x_3) + Y(y_1 - y_3) = \tfrac{1}{2}\left[(x_1^2 - x_3^2) + (y_1^2 - y_3^2)\right]$$

Similarly for $P_2$ minus $P_3$: $$X(x_2 - x_3) + Y(y_2 - y_3) = \tfrac{1}{2}\left[(x_2^2 - x_3^2) + (y_2^2 - y_3^2)\right]$$

This is a 2×2 linear system in $X$ and $Y$. Let:

$$a_x = x_1 - x_3, \quad a_y = y_1 - y_3$$
$$b_x = x_2 - x_3, \quad b_y = y_2 - y_3$$
$$s_1 = (x_1^2 + y_1^2) - (x_3^2 + y_3^2)$$
$$s_2 = (x_2^2 + y_2^2) - (x_3^2 + y_3^2)$$

The system becomes: $$\begin{bmatrix} a_x & a_y \\ b_x & b_y \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix} = \frac{1}{2} \begin{bmatrix} s_1 \\ s_2 \end{bmatrix}$$

We can solve this with Cramer's rule.

An aside on Cramer

Cramer's rule solves linear systems using determinants. For a 2×2 system:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} e \\ f \end{bmatrix}$$

The solution is:

$$x = \frac{\begin{vmatrix} e & b \\ f & d \end{vmatrix}}{\begin{vmatrix} a & b \\ c & d \end{vmatrix}} = \frac{ed - bf}{ad - bc}$$

$$y = \frac{\begin{vmatrix} a & e \\ c & f \end{vmatrix}}{\begin{vmatrix} a & b \\ c & d \end{vmatrix}} = \frac{af - ec}{ad - bc}$$

The pattern: to solve for a variable, replace its column in the coefficient matrix with the right-hand side vector, then divide that determinant by the original determinant.
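
In code, a standalone 2×2 solver along these lines might look like this (a sketch; the simulation below just inlines the same arithmetic):

#include <cmath>
#include <optional>
#include <utility>

// Solve [a b; c d] [x; y] = [e; f] by Cramer's rule.
// Returns std::nullopt when the determinant is (numerically) zero,
// i.e. the system is singular or nearly so.
std::optional<std::pair<double, double>> solve_2x2(double a, double b,
                                                   double c, double d,
                                                   double e, double f) {
    double det = a * d - b * c;
    if (std::abs(det) < 1e-12) {
        return std::nullopt;
    }
    double x = (e * d - b * f) / det;
    double y = (a * f - e * c) / det;
    return std::make_pair(x, y);
}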

Applying to the circumcircle:

Our system (after the algebra) is:

$$\begin{bmatrix} a_x & a_y \\ b_x & b_y \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix} = \frac{1}{2} \begin{bmatrix} s_1 \\ s_2 \end{bmatrix}$$

where:

  • $a_x = x_1 - x_3$, $a_y = y_1 - y_3$
  • $b_x = x_2 - x_3$, $b_y = y_2 - y_3$
  • $s_1 = (x_1^2 + y_1^2) - (x_3^2 + y_3^2)$
  • $s_2 = (x_2^2 + y_2^2) - (x_3^2 + y_3^2)$

Let $D = a_x b_y - a_y b_x$ (the determinant of the coefficient matrix).

By Cramer's rule:

$$X = \frac{1}{2}\,\frac{\begin{vmatrix} s_1 & a_y \\ s_2 & b_y \end{vmatrix}}{D} = \frac{s_1 b_y - s_2 a_y}{2D}$$

$$Y = \frac{1}{2}\,\frac{\begin{vmatrix} a_x & s_1 \\ b_x & s_2 \end{vmatrix}}{D} = \frac{a_x s_2 - b_x s_1}{2D}$$

In the code, I fold the factor of 2 into $D$ by defining $D' = 2D = 2(a_x b_y - a_y b_x)$ (the variable D below is this $D'$), which gives:

$$X = \frac{s_1 b_y - s_2 a_y}{D'}, \qquad Y = \frac{a_x s_2 - b_x s_1}{D'}$$

When D=0, the points are collinear and there's no finite circumcircle. In our simulation, three random points being exactly collinear is a measure-zero event - probability zero in the continuous setting. But we still handle it in code because floating-point arithmetic can produce near-collinear configurations. The 1e-12 tolerance catches these numerical edge cases rather than dividing by something tiny and getting nonsense.

Once we have the circumcenter, the circumradius is the distance from the center to any of the three points.

The containment check

The circumcircle fits inside the unit disk if and only if every point on the circumcircle is at distance at most 1 from the origin. The farthest point on the circumcircle from the origin lies along the ray from the origin through the circumcenter, at distance: $$\sqrt{X^2 + Y^2} + R$$

So our condition is simply: $$\sqrt{X^2 + Y^2} + R \le 1$$

The code

Here's the full implementation. AFAIK M_PI is a POSIX extension, so make sure to define it yourself if you'd like to run this on Windows.

#include <cmath>
#include <iomanip>
#include <iostream>
#include <random>
#include <string>

struct Point {
    double x, y;
};

struct Circle {
    double cx, cy, r;
    bool valid;
};

// Sample a point uniformly from the unit disk
Point sample_unit_disk(std::mt19937 &rng,
                       std::uniform_real_distribution<>
                           &dist) {
    double u1 = dist(rng);
    double u2 = dist(rng);

    // r = sqrt(U) gives uniform area distribution
    double r = std::sqrt(u1);
    double theta = 2.0 * M_PI * u2;

    return {r * std::cos(theta), r * std::sin(theta)};
}

// Compute the circumcircle of three points
Circle circumcircle(const Point &p1, const Point &p2, const Point &p3) {
    double ax = p1.x - p3.x;
    double ay = p1.y - p3.y;
    double bx = p2.x - p3.x;
    double by = p2.y - p3.y;

    double D = 2.0 * (ax * by - ay * bx);

    // Collinear points
    if (std::abs(D) < 1e-12) {
        return {0, 0, 0, false};
    }

    double s1 = (p1.x * p1.x + p1.y * p1.y) - (p3.x * p3.x + p3.y * p3.y);
    double s2 = (p2.x * p2.x + p2.y * p2.y) - (p3.x * p3.x + p3.y * p3.y);

    double cx = (s1 * by - s2 * ay) / D;
    double cy = (ax * s2 - bx * s1) / D;
    double r = std::sqrt((cx - p1.x) * (cx - p1.x) + (cy - p1.y) * (cy - p1.y));

    return {cx, cy, r, true};
}

bool circle_inside_unit_disk(const Circle &c) {
    double dist_to_origin = std::sqrt(c.cx * c.cx + c.cy * c.cy);
    return (dist_to_origin + c.r) <= 1.0;
}

int main(int argc, const char *argv[]) {
    long long N = 1'000'000;
    if (argc > 1) {
        N = std::stoll(argv[1]);
    }

    std::random_device rd;
    std::mt19937 rng(rd());
    std::uniform_real_distribution<> dist(0.0, 1.0);

    long long successes = 0;
    long long collinear_skipped = 0;

    for (long long i = 0; i < N; ++i) {
        Point p1 = sample_unit_disk(rng, dist);
        Point p2 = sample_unit_disk(rng, dist);
        Point p3 = sample_unit_disk(rng, dist);

        Circle circ = circumcircle(p1, p2, p3);

        if (!circ.valid) {
            collinear_skipped++;
            continue;
        }

        if (circle_inside_unit_disk(circ)) {
            successes++;
        }
    }

    long long valid_trials = N - collinear_skipped;
    double probability = static_cast<double>(successes) / valid_trials;

    std::cout << std::fixed << std::setprecision(6);
    std::cout << "Trials:     " << N << "\n";
    std::cout << "Successes:  " << successes << "\n";
    std::cout << "Estimated:  " << probability << "\n";
    std::cout << "Expected:   " << 2.0 / 5.0 << "\n";

    return 0;
}

Compile with:

g++ -O3 -std=c++17 -o circumcircle circumcircle.cpp

Results

Running with 10 million trials:

Trials:     10000000
Successes:  3999797
Estimated:  0.399980
Expected:   0.400000

We get a relative error of about 0.005%. The analytical answer holds up. Different runs will obviously give slightly different estimates, but they should all hover around the 40% mark.

Convergence and confidence

Monte Carlo estimation is really just the law of large numbers in action. We're approximating an expectation (the true probability p) by an average of independent samples (our indicator variables for "circumcircle fits").

The central limit theorem view

Let $X_i = 1$ if trial $i$ succeeds, 0 otherwise. Each $X_i$ is a Bernoulli random variable with $E[X_i] = p$ and $\mathrm{Var}(X_i) = p(1 - p)$.

Our estimator is the sample mean: $$\hat{p} = \frac{1}{N}\sum_{i=1}^{N} X_i$$

By the central limit theorem, for large $N$: $$\hat{p} \;\approx\; \mathcal{N}\!\left(p,\ \frac{p(1 - p)}{N}\right)$$

The standard error (standard deviation of our estimator) is: $$\mathrm{SE} = \sqrt{\frac{p(1 - p)}{N}}$$

For $p = 0.4$ and $N = 10^7$: $$\mathrm{SE} = \sqrt{\frac{0.4 \times 0.6}{10^7}} = \sqrt{\frac{0.24}{10^7}} \approx 0.000155$$

Interpreting our result

Our estimate was $\hat{p} = 0.399980$. The deviation from the true value is: $$|\hat{p} - p| = |0.399980 - 0.400000| = 0.000020$$

In units of standard errors (often called "z-score" or "sigma"): $$z = \frac{|\hat{p} - p|}{\mathrm{SE}} = \frac{0.000020}{0.000155} \approx 0.13$$
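
If you want these numbers straight from code, here's a small standalone helper (my addition, not part of the program above) using the same normal approximation:

#include <cmath>

// Normal-approximation standard error of a Bernoulli proportion estimate,
// evaluated at the conjectured value p0, over n independent trials.
double standard_error(double p0, long long n) {
    return std::sqrt(p0 * (1.0 - p0) / static_cast<double>(n));
}

// How many standard errors the estimate p_hat sits from p0.
double z_score(double p_hat, double p0, long long n) {
    return std::abs(p_hat - p0) / standard_error(p0, n);
}

For the run above, z_score(0.399980, 0.4, 10'000'000) comes out to roughly 0.13, matching the hand calculation.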

A 0.13-sigma deviation is completely unremarkable. For reference:

  • about 68% of runs should land within 1 sigma of the true value
  • about 95% within 2 sigma
  • about 99.7% within 3 sigma

Our result landing 0.13 sigma from the truth is exactly what we'd expect from a working simulation.

The $O(1/\sqrt{N})$ convergence rate

The standard error scales as $1/\sqrt{N}$. This has a practical consequence: to halve your error, you need four times as many samples.

| Samples $N$ | Standard error | Accurate digits |
| --- | --- | --- |
| $10^4$ | ~0.0049 | ~2 |
| $10^6$ | ~0.00049 | ~3 |
| $10^8$ | ~0.000049 | ~4 |
| $10^{10}$ | ~0.0000049 | ~5 |

To verify that $p = 0.4$ to 6 decimal places with 95% confidence, you'd need the 2-sigma interval to be smaller than $0.0000005$. That means $\mathrm{SE} < 0.00000025$, requiring: $$N > \frac{0.24}{(0.00000025)^2} \approx 4 \times 10^{12}$$

Four trillion samples! At that point, trusting the algebra is the better strategy.
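
The same back-of-the-envelope arithmetic, as a small hypothetical helper (not part of the simulation):

#include <cmath>
#include <iostream>

// Samples needed so that the 2-sigma (~95%) band around the estimate is
// narrower than +/- tolerance: 2 * sqrt(p(1-p)/N) < tolerance.
long long samples_needed(double p, double tolerance) {
    double target_se = tolerance / 2.0;
    return static_cast<long long>(std::ceil(p * (1.0 - p) / (target_se * target_se)));
}

int main() {
    // Six correct decimal places means the error must stay below 5e-7.
    std::cout << samples_needed(0.4, 5e-7) << "\n";  // ~3.84e12
    return 0;
}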

Conclusion

The hoodlum's problem has a verified answer: P=2/5.

Monte Carlo methods won't prove your closed-form solution is correct, but they're excellent at catching mistakes. If I'd botched the Jacobian or miscounted a factor somewhere, this simulation would have returned something noticeably different from 0.4.

It didn't. I can sleep easy.