A research project investigating octonions as a computational substrate for machine learning. The primary result is a self-organizing octonionic trie that classifies data without gradient descent, using only algebraic operations.
There are exactly four number systems that support addition, subtraction, multiplication, division, and norm preservation. These are the normed division algebras, each constructed by doubling the previous one:
| Algebra | Dimension | Property lost |
|---|---|---|
| Real numbers (R) | 1 | None |
| Complex numbers (C) | 2 | Ordering |
| Quaternions (H) | 4 | Commutativity |
| Octonions (O) | 8 | Associativity |
Hurwitz's theorem (1898) proves that no further normed division algebra exists. Beyond octonions, the Cayley-Dickson construction produces algebras with zero divisors, which destroy invertibility and render the algebra unsuitable for reversible computation.
Octonions appear throughout theoretical physics (string theory, M-theory, exceptional Lie groups). This project investigates their potential as a computational substrate for machine learning.
The project produced two companion theses:
docs/thesis/oct-neural-nets.pdf-- Gradient-trained octonionic networks: algebra, calculus, fair baselines, and optimization landscape analysis.docs/thesis/oct-trie.pdf-- Self-organizing octonionic tries: a zero-gradient classification architecture using associator-based novelty detection and Fano plane subalgebra routing.
-
Density: Octonions encode more geometric structure per parameter than smaller algebras. Their automorphism group (G_2, 14-dimensional) is larger than the algebra itself (8-dimensional), a property unique among division algebras.
-
Reversibility: Every non-zero octonion has a unique inverse. The trie exploits this for consistency verification via algebraic inversion, exact to machine precision at depth 200+.
-
Algebraic economy: A single 8-dimensional non-associative algebra provides five functionally independent components -- routing, novelty detection, content update, consistency check, and health monitoring -- through two properties of its multiplication structure (alternativity and Fano plane geometry).
The primary research contribution. A self-organizing hierarchical memory where octonionic algebra replaces gradient-based learning entirely:
| Component | Conventional systems | Octonionic trie |
|---|---|---|
| Routing | Learned attention / gating | Subalgebra decomposition (Fano plane) |
| Novelty detection | Engineered thresholds | Associator norm |
| Content update | Learned write / Hebbian | Octonionic composition |
| Consistency check | None | Algebraic inversion |
| Health monitoring | Separate metrics | Associator norms + composition error |
| Experiment | Result | What it shows |
|---|---|---|
| Subalgebra routing | 90%+ within-class consistency | Routing is discriminative |
| Associator as novelty | 5x spike ratio, zero false negatives | Novelty detection works |
| Composition depth | Exact inversion to machine precision at depth 200 | Local inversion is reliable |
| 7-category stability-plasticity | 97.7% accuracy, 0.0% forgetting | Trie learns without forgetting |
| MNIST (CNN encoder) | 95.2% (vs. 98.2% kNN on same features) | Competitive without gradients |
| MNIST (PCA-only) | 76.5% (vs. 88.8% kNN) | Works without any neural network |
- Egan's theorem: Mean associator norm for uniform unit octonion triples is 147456/(42875pi) ~ 1.095, establishing the baseline against which all threshold choices are measured. Validated by Monte Carlo with 95% CI.
- G_2 invariance: Associator norms are invariant under the automorphism group of the octonions, constraining the functional form of adaptive threshold policies.
- Element proximity bound: Elements clustered within epsilon of a common point have associator norms O(epsilon^2), from alternativity alone -- no subalgebra alignment needed. This provides quadratic within-class suppression.
- Subalgebra proximity bound: Elements spread across a quaternionic subalgebra have associator norms O(epsilon). The Fano plane geometry provides between-class separation bounded away from zero.
Two mechanisms -- quadratic floor suppression from alternativity, geometric ceiling elevation from the Fano plane -- emerge from the same algebraic object.
- CNN encoder dependency: The 95.2% result uses a separately-trained CNN encoder. The PCA-only result (76.5%) shows the trie works without it, but at lower accuracy.
- Quaternionic trie ablation not done: The strongest ablation -- a quaternionic trie lacking the associator signal -- has not been run.
- Scalability open: 26,042 nodes for 60K MNIST samples. Whether consolidation bounds growth in practice is an open question.
The trie is built on a validated algebraic foundation (Phases 1-5, complete):
| Phase | What it validates | Status |
|---|---|---|
| 1. Octonionic Algebra | Moufang identities, norm preservation, alternativity, associator properties | Complete |
| 2. GHR Calculus | Octonionic gradients match finite-difference to <1e-5 | Complete |
| 3. Baselines | Fair R/C/H/O comparison networks with matched parameters (<1% deviation) | Complete |
| 4. Numerical Stability | Precision vs depth, condition numbers, StabilizingNorm mitigation | Complete |
| 5. Optimization Landscape | Go/no-go gate for gradient-trained octonionic networks | Complete (gate passed) |
Every gradient-trained experiment compares 4+ algebras with matched total parameter counts:
| Algebra | Width multiplier | Role |
|---|---|---|
| Real (R) | 8x | Standard baseline |
| Complex (C) | 4x | Published quaternionic/complex ML comparison |
| Quaternion (H) | 2x | Closest associative relative |
| Octonion (O) | 1x | Primary subject |
| PHM-8 | Matched | Kronecker-factored structure (Zhang et al. 2021) |
| R8-Dense | Matched | Unconstrained 8x8 real mixing control |
A PyTorch-native library implementing the full stack from algebra to experiment infrastructure:
| Module | Description |
|---|---|
| Core algebra | Multiplication via Fano plane structure constants, conjugation, norm, inverse, associator. Cayley-Dickson cross-check. R/C/H/O tower types. Batch operations, exp/log. |
trie.py |
Self-organizing octonionic trie with 7 pluggable threshold policies (Global, EMA, MeanStd, Depth, AlgebraicPurity, MetaTrie, Hybrid). Subalgebra routing, associator-based novelty detection, rumination consistency check, consolidation. |
calculus/ |
GHR Wirtinger derivatives for octonionic backpropagation. Analytic 8x8 Jacobians. Parenthesization-aware chain rule. torch.autograd.Function implementations. |
baselines/ |
Fair comparison networks: parameter-matched R/C/H/O. Algebra-specific batch norm with Cholesky whitening. MLP, Conv2D, Recurrent topologies. Training with gradient statistics, AMP, TensorBoard. |
landscape/ |
Hessian eigenspectrum, Bill & Cox curvature, gradient variance, go/no-go gate. |
tasks/ |
Synthetic task generators with known optima. |
839 tests covering algebra, calculus, baselines, stability, trie, threshold policies, proximity bounds, and thesis theorem validation (pytest + Hypothesis property-based testing).
All computation runs in a Docker container with ROCm PyTorch (AMD GPU). The host machine is used only for git and file editing.
# Build the container (first time)
docker compose build
# Install dependencies
docker compose run --rm dev uv sync
# Run tests
make test
# List available experiment scripts
make list-scripts
# Run a specific experiment (example)
make run-run_trie_mnist
make run-run_landscape ARGS="--smoke"Requirements: Docker, an AMD GPU with ROCm support (developed on RX 7900 XTX, 24GB VRAM). NVIDIA GPUs would require swapping the base container image.
src/octonion/ # Main package
_octonion.py # Octonion class, associator
_multiplication.py # Fano plane multiplication engine
_fano.py # Fano plane structure and automorphisms
_tower.py # Real, Complex, Quaternion tower types
_operations.py # exp, log, commutator, cross product
_linear.py # OctonionLinear nn.Module
_random.py # Random octonion generation (uniform on S^7)
trie.py # Octonionic trie + threshold policies
calculus/ # GHR derivatives and autograd
baselines/ # Comparison networks and training
landscape/ # Optimization analysis toolkit
tasks/ # Synthetic task generators
tests/ # 839 tests (pytest + Hypothesis)
scripts/ # Experiment runners and analysis
docs/thesis/ # Two thesis documents + build system
.planning/ # Research planning and phase tracking
- Baez, J. C. (2002). "The Octonions." Bulletin of the AMS, 39(2), 145-205.
- Bill, J. & Cox, B. (2024). "Exploring Quaternion Neural Network Loss Surfaces." Advances in Applied Clifford Algebras, 34.
- Egan, G. (2024). "Mean associator norm on S^7." Personal communication.
- Gaudet, C. & Maida, A. (2018). "Deep Quaternion Networks." IJCNN.
- Parcollet, T., et al. (2019). "Quaternion Recurrent Neural Networks." ICLR.
- Trabelsi, C., et al. (2018). "Deep Complex Networks." ICLR.
- Wu, J., et al. (2020). "Deep Octonion Networks." Neurocomputing, 397, 179-191.
- Zhang, A., et al. (2021). "Beyond Fully-Connected Layers with Quaternions: Parameterization of Hypercomplex Multiplications with 1/n Parameters." ICLR.
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
This means you can use, modify, and distribute this code, but any derivative work -- including network services that use it -- must also be released under the AGPL-3.0 with full source code. See LICENSE for the full text.