GPU-accelerated verification of Goldbach's conjecture using CUDA and GMP.
Goldbach's conjecture states that every even integer greater than 2 can be expressed as the sum of two prime numbers. First proposed in 1742, it remains one of the oldest unsolved problems in mathematics. It has been verified computationally up to 4×10¹⁸ by distributed computing [1], but never formally proved.
This project provides an open source, reproducible framework for exhaustive range verification and arbitrary precision single-number checking on consumer hardware. No counterexamples have been found in any computation.
The following figure shows total wall‑clock time to verify all even integers up to `N` using the main implementations:

- CPU baseline (`cpu_goldbach`)
- GPU (`goldbach`)
- **Single GPU**: all even integers up to 10¹² verified on a single NVIDIA RTX 5090 in 37 seconds.
- **Cloud / HPC**: verification up to 10¹² in 18.8 seconds with 2× NVIDIA RTX 5090.
This is achieved by our multi-GPU, segmented double-sieve verifier with dynamic load balancing, GPU-tiled sieving, and optimized Phase 2 fallbacks. See Architecture and Results below.
- Verify Goldbach's conjecture to the largest range feasible on consumer hardware
- Demonstrate GPU-accelerated number theory with rigorous correctness guarantees
- Benchmark and compare CPU vs GPU architectures for sieve-based computation
- Provide fully reproducible scientific results on a fixed hardware platform
- Serve as a reference implementation for GPU Goldbach verification
The project contains multiple tools of increasing capability, each validating against the previous:
| Tool | Method | Range | Hard limit |
|---|---|---|---|
| `goldbach` | multi-GPU, segmented double-sieve | exhaustive | ~1.8×10¹⁹ / time |
| `cpu_goldbach` | CPU sieve + sequential scan | exhaustive | system RAM / time |
| `single_check` | GPU Miller-Rabin | single number | uint64_t (~1.8×10¹⁹) |
| `big_check` | GMP + OpenMP | single number | time only |
goldbach_gpu stores primality as one byte per number. At 10⁹ this requires 953 MB of VRAM, already approaching the limit of an 8 GB GPU; it is not usable beyond 10⁹ on such hardware.
goldbach_gpu2 encodes one bit per odd number, a 16× memory reduction. This makes 10¹¹ feasible (5,960 MB) but 10¹² impossible (would need 59 GB). The VRAM ceiling is architectural -- no amount of runtime can overcome it.
goldbach_gpu3 removes the ceiling entirely via a segmented design. For each segment [A, B] of even numbers, it iterates over candidate primes p and marks all n in [A, B] where n-p is prime as verified. Crucially, q = n-p is checked globally (small bitset, segment bitset, or Miller-Rabin) -- not forced into the current segment, which would miss valid partitions. Each segment uses only ~60 MB VRAM regardless of total range. An exhaustive CPU fallback guarantees rigor for any number the GPU phase does not resolve; in all runs to date, this fallback has never been triggered.
goldbach (flagship) extends goldbach_gpu3 with multi-GPU support via lock-free work queues and thread-safe globals. It adds GPU-tiled sieving for segments, optimized Phase 2 (sieve to 10^8 + primality test: Baillie-PSW or Miller-Rabin, selectable via --primetest=bpsw|mr), overflow-safe operations (P_SMALL ≤ 4e9), and configurable CLI options (e.g., --seg-size, --gpus). Correctness is mathematically rigorous for n up to 2^64-1.
big_check handles numbers of arbitrary digit count using GMP exact arithmetic and probabilistic Miller-Rabin (25 rounds, false positive probability < 10⁻¹⁵). OpenMP parallelism with batch-synchronised prime search prevents thread runaway for large inputs.
Full benchmark log: RESULTS.md
| Tool | Limit | Even n checked | Total time | Failures |
|---|---|---|---|---|
| `cpu_goldbach` (CPU) | 10⁹ | 499,999,999 | 19,183.7 ms | 0 |
| `goldbach_gpu3` | 10⁹ | 499,999,999 | 1,867.7 ms | 0 |
| `goldbach` | 10⁹ | 499,999,999 | 141.0 ms | 0 |
| `goldbach` | 10¹² | 499,999,999,999 | 37,440 ms | 0 |
GPU speedup over CPU baseline: 136× total at 10⁹.
| Tool | n | Partition | Time |
|---|---|---|---|
| `single_check` | 10¹⁸ | 14,831 + 999,999,999,999,985,169 | 1.5 s |
| `single_check` | 10¹⁹ | 226,283 + 9,999,999,999,999,773,717 | 1.5 s |
| `big_check` | 10¹⁰⁰⁰ | p = 26,981 | 363 ms (20 threads) |
| `big_check` | 10¹⁰⁰⁰⁰ | p = 47,717 | 182 s (20 threads) |
```bash
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DBUILD_LEGACY=OFF
cmake --build . -j$(nproc)
```

Dependencies: CUDA toolkit, GMP, OpenMP.

```bash
sudo apt install libgmp-dev   # GMP
# CUDA toolkit: https://developer.nvidia.com/cuda-downloads
```

All tools support modular CLI argument parsing and feature dynamic hardware safety guards to prevent out-of-memory (OOM) crashes. Pass `-h` or `--help` to any tool for detailed usage instructions.
All compiled executables are located in the build/bin/ directory.
```bash
./build/bin/cpu_goldbach 1000000000
```

Useful as a correctness baseline. Expected time: ~23 s at 10⁹, ~5 min at 10¹⁰.
```bash
./build/bin/goldbach 1000000000000 --seg-size=200000000 --p-small=1000000 --batch-size=2000000
```

No VRAM ceiling, unlike the previous architectures. Hardware VRAM is checked against the segment size at launch to ensure safe execution. Expected time: 37 s at 10¹² (1× RTX 5090).
To reproduce the 10¹² result with specific segmentation and prime bounds:

```bash
./build/bin/goldbach 1000000000000 --seg-size=200000000 --p-small=1000000 --batch-size=2000000
```

For multi-GPU / cloud environments (use all available GPUs with `--gpus=-1`):

```bash
./build/bin/goldbach 1000000000000 --seg-size=200000000 --p-small=1000000 --batch-size=2000000 --gpus=2
```

```bash
./build/bin/single_check 999999999999999998
```

Uses deterministic Miller-Rabin with 12 witnesses. Returns a valid partition, but not necessarily the minimal one (due to concurrent GPU thread execution).
```bash
./build/bin/big_check 12321232123212321232123212321232
./build/bin/big_check "$(python3 -c "print('1' + '0'*1000)")"    # 10^1000
./build/bin/big_check "$(python3 -c "print('1' + '0'*10000)")"   # 10^10000
```

Candidate primes p are generated deterministically by a standard Sieve of Eratosthenes up to 10^7 (664,579 primes). For each p, q = n - p is computed exactly using GMP arbitrary precision arithmetic, then tested for primality using GMP's Miller-Rabin with 25 rounds (false positive probability < 4^-25 ≈ 10^-15 per test). Since p is sieve-generated, only the test on q is probabilistic.
Primes are processed in batches of 1,000 with a synchronisation barrier between batches. This prevents thread runaway: without batching, fast threads on small-p tests could race far ahead of slow threads doing expensive GMP tests, causing a large non-minimal p to be returned first. With batching, the winning p is the smallest in the first batch containing a valid partition.
Practical limit: GMP primality tests scale roughly as O(d^3) in digit count d. At 10^10000 (d = 10,001) each test takes milliseconds; beyond ~10^50000 each test takes seconds and exhaustive search becomes infeasible in practice.
The framework includes a comprehensive automated validation suite that tests core functionality, CLI robustness, and negative hardware guards (e.g., safe failure on massive VRAM requests).
```bash
cd tests
./validation.sh
```

The codebase has been tested using the included test suite on CUDA 12.8.1, 13.0, and 13.1. Other versions may work but are not officially tested.
All benchmark configurations (LIMIT, SEG_SIZE, P_SMALL, thread counts) are documented in RESULTS.md for each run.
Single number results were produced on the following fixed platform:
| Component | Specification |
|---|---|
| CPU | Intel i7‑12700H, 20 logical threads |
| GPU | NVIDIA RTX 3070 mobile, 8 GB VRAM, 448 GB/s bandwidth |
| RAM | 32 GB |
| OS | WSL2, Ubuntu 24.04 |
| CUDA Toolkit | 13.1.115 |
| CUDA Build Info | cuda_13.1.r13.1/compiler.37061995_0 (Dec 16 2025) |
| GCC | 13.3.0 (Ubuntu 13.3.0-6ubuntu2~24.04.1) |
| OpenMP | 4.5 |
| GMP | 6.3.0+dfsg-2ubuntu6.1 |
All range verification results were produced on the following platform:
| Component | Specification |
|---|---|
| GPU | NVIDIA GeForce RTX 5090, 32 GB VRAM, Driver 580.95.05, CUDA 13.0 |
| CPU | Dual‑socket AMD EPYC (Engineering Sample 100‑000000897‑03), 128 logical cores; Effective: 13.6 vCPUs (cgroup quota: 1360000 / 100000) |
| Memory | 44 GB |
| Environment | Ubuntu (containerized); GPU access: Full, non‑virtualized RTX 5090 |
| OS | Ubuntu 24.04 |
| CUDA Toolkit | 13.0 |
| GCC | 13.3.0 (Ubuntu 13.3.0-6ubuntu2~24.04.1) |
| OpenMP | 4.5 |
| GMP | 6.3.0+dfsg-2ubuntu6.1 |
All timings are wall‑clock time. All configurations are recorded exactly as run so results are fully reproducible.
- `src/`
- `goldbach.cu` : Multi-GPU range verifier (Flagship).
- `goldbach_gpu5a.cu` : Experimental build for testing new features.
- `big_check.cpp` : Arbitrary precision checker (GMP + OpenMP).
- `single_check.cu` : 64-bit deterministic Miller-Rabin checker.
- `cpu_goldbach.cpp` : Sequential CPU baseline oracle.
- `prime_bitset.cpp` : Parallel bitset construction.
- `segmented_sieve.cpp`: CPU-based sieving logic.
- `include/`
- `prime_bitset.hpp` : Core data structures and bit-indexing logic.
- `legacy/`
- `goldbach_gpu3.cu` : Historical GPU implementation.
- `tests/`
- `validation_gpu.sh` : Automated correctness and robustness test suite for the gpu range verifier `goldbach`.
- `validation.sh` : Automated correctness and robustness test suite for the remaining tools.
If you use this software in academic work, please cite the archived release:
Llorente-Saguer, I. (2026). GoldbachGPU (v2.0.0) [Software]. Zenodo.
https://doi.org/10.5281/zenodo.18879797
For the latest version, see the concept DOI: https://doi.org/10.5281/zenodo.18786328
If you reference the scientific description of the range verification (goldbach), please cite:
Llorente-Saguer, I. (2026). A Lock-Free, Fully GPU-Resident Architecture for the Verification of Goldbach's Conjecture.
https://arxiv.org/abs/2603.07850
If you reference the scientific description of the single big number verification method (big_check) or the previous CPU-GPU hybrid implementation (goldbach_gpu3), please cite:
Llorente-Saguer, I. (2026). GoldbachGPU: High‑performance Goldbach verification on GPUs.
https://arxiv.org/abs/2603.02621
[1] T. Oliveira e Silva, S. Herzog, S. Pardi, "Empirical verification of the even Goldbach conjecture and computation of prime gaps up to 4×10¹⁸", Mathematics of Computation, 83(288):2033–2060, 2014.
MIT
