How I Sped Up Prime Number Computation
One of my personal projects involved computing prime numbers as fast as possible. What started as a simple exercise turned into a deep dive on algorithm design and distributed computing.
Starting Point: Trial Division
The most obvious approach: for every number n, check if any integer from 2 to √n divides it evenly.
def is_prime(n):
if n < 2:
return False
for i in range(2, int(n**0.5) + 1):
if n % i == 0:
return False
return True
Simple, but slow for large ranges. Every number gets its own full check.
The Sieve of Eratosthenes
Instead of checking one number at a time, the Sieve marks composites in bulk:
def sieve(limit):
is_prime = [True] * (limit + 1)
is_prime[0] = is_prime[1] = False
for i in range(2, int(limit**0.5) + 1):
if is_prime[i]:
for j in range(i*i, limit + 1, i):
is_prime[j] = False
return [i for i, v in enumerate(is_prime) if v]
This is dramatically faster for finding all primes up to a large limit, but requires memory proportional to the limit.
Taking It Further: A Cluster
For very large ranges, I set up a small network of computers to split the work. Each machine handled a segment of the number range, and results were collected centrally.
The honest finding: for ranges where everything fits in memory, a well-optimized local sieve beats the cluster. Network overhead and coordination costs hurt small jobs. But the cluster showed its value for ranges too large for any single machine’s RAM.
What I Took Away
- Algorithm choice matters more than raw hardware, up to a point
- Distributed computing has real overhead that has to be justified by the problem size
- Benchmarking with real numbers (not assumptions) is essential
I’m planning to explore segmented sieves and possibly GPU-based approaches next.