PrimitiveContext/characters

Character Energy Analysis

Every letter you write costs energy. Your fingers flex along one axis (~225°), your wrist pivots along another (~315°), and every curve, pen-lift, and stroke crossing adds to the bill. This project measures that cost.

We model each character as a physical stroke path using Hershey vector fonts, then compute a biomechanical production effort index — not Joules, but a dimensionless quantity that is monotone with plausible effort, internally consistent, and calibratable to observed handwriting kinematics.

What Gets Measured

For each of 65 characters (A-Z, a-z, punctuation) across Latin, Greek, and Cyrillic scripts:

| Metric | What it captures |
| --- | --- |
| Writing energy | Two-axis motor cost (finger + wrist), direction-dependent |
| Ink distance | Total path length, pen-down only |
| Curvature | Integrated squared curvature (bending cost) |
| Pen lifts | Number of stroke segments + pen-up travel distance |
| Distinctiveness | Nearest-neighbor distance in shape space (confusability) |
| Perimetric complexity | Ink² / enclosed area: how ornate the form is |
| Convex hull ratio | How much of the bounding area the character fills |
| Topology | Enclosed regions (Betti-1), endpoints, crossings |
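As a rough illustration of the simpler geometric metrics listed above, the sketch below computes ink distance, pen-up travel, and a nearest-neighbor distinctiveness score. The stroke format (a list of pen-down polylines), the function names, and the toy feature vectors are assumptions made for this example; they are not the interfaces of primitives.py or measure.py, and the project's actual shape space is not specified here.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]  # assumed format: one pen-down polyline


def ink_distance(strokes: List[Stroke]) -> float:
    """Total pen-down path length, summed over every stroke."""
    return sum(
        math.dist(p, q)
        for stroke in strokes
        for p, q in zip(stroke, stroke[1:])
    )


def pen_up_travel(strokes: List[Stroke]) -> float:
    """Distance covered with the pen lifted between consecutive strokes."""
    return sum(math.dist(a[-1], b[0]) for a, b in zip(strokes, strokes[1:]))


def distinctiveness(features: Dict[str, Tuple[float, ...]]) -> Dict[str, float]:
    """Euclidean distance from each character to its nearest neighbor.

    `features` is a hypothetical mapping of character -> feature vector.
    """
    return {
        c: min(math.dist(v, w) for d, w in features.items() if d != c)
        for c, v in features.items()
    }


# Toy capital T: a horizontal bar, then a vertical stem (made-up coordinates).
letter_t = [[(0.0, 4.0), (4.0, 4.0)], [(2.0, 4.0), (2.0, 0.0)]]
print(ink_distance(letter_t), len(letter_t), pen_up_travel(letter_t))

# Toy one-dimensional "shape space" with made-up positions.
toy_features = {"O": (0.0, 0.0), "Q": (1.0, 0.0), "I": (5.0, 0.0)}
print(distinctiveness(toy_features))
```

In this toy space O and Q are each other's nearest neighbor, so both score low on distinctiveness, while I sits far from everything else.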

The directional cost model uses empirical biomechanics: finger flexion/extension is cheapest at ~225° (Thomassen & Teulings, 1983), wrist abduction at ~315° (Teulings & Maarse, 1984), with fingers costing ~1.4x wrist per unit distance (Van Galen & de Jong, 1995).

Key Findings

Writing systems optimize for a trade-off between cheapness and distinctiveness.

Figure (Pareto Frontier). Left: the Pareto frontier, the set of characters that are optimally cheap to write for their level of distinctiveness. Right: uppercase (cyan) vs lowercase (magenta) show different trade-off strategies.
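The frontier itself can be extracted with a single sorted pass. The sketch below is a minimal version that assumes each character is summarized by an (energy, distinctiveness) pair, with lower energy and higher distinctiveness both preferred. The function name and the numbers are made up for illustration and do not reproduce pareto_frontier.py or its results.

```python
from typing import Dict, List, Tuple


def pareto_frontier(chars: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return characters not dominated on (low energy, high distinctiveness).

    `chars` maps a character to a hypothetical (energy, distinctiveness) pair.
    """
    frontier = []
    best_distinctiveness = float("-inf")
    # Visit cheapest characters first; ties go to the more distinctive one.
    ordered = sorted(chars.items(), key=lambda kv: (kv[1][0], -kv[1][1]))
    for name, (_energy, dist) in ordered:
        # A character survives only if it is more distinctive than
        # every character that costs the same or less.
        if dist > best_distinctiveness:
            frontier.append(name)
            best_distinctiveness = dist
    return frontier


# Made-up values, for illustration only.
toy = {
    "I": (1.0, 0.5),
    "O": (2.0, 1.5),
    "L": (2.5, 1.0),
    "W": (4.0, 3.0),
    "M": (4.5, 2.0),
}
print(pareto_frontier(toy))
```

In the toy data, L and M are dominated: O is both cheaper and more distinctive than L, and W is both cheaper and more distinctive than M.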

Figure (Cross-Script Comparison). Top-left: energy distributions across scripts. Top-right: cognate pairs (A, B, E, etc.) show near-identical energy in Latin vs Greek simplex fonts. Bottom-left: energy scales linearly with ink path length. Bottom-right: Cyrillic complex (serif) has ~3x the perimetric complexity of simplex scripts.

Figure (Character Distributions). Eight metrics across all 65 characters, sorted by rank. Each metric has a distinct distributional shape: energy and ink follow Zipf-like decay, while convex hull ratio is nearly uniform.

Figure (Transition Energy). Top-left: characters with cheap exits tend to have expensive entries (and vice versa). Bottom-left: transition angles cluster around finger-axis and wrist-axis directions. Bottom-right: the most expensive bigrams by frequency × transition cost.
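The frequency × transition cost ranking can be sketched as below. This version assumes each glyph records an entry point, an exit point, and an advance width, and it uses plain Euclidean pen-up distance as a stand-in for the transition cost. The glyph fields, function names, and all numbers are hypothetical, and bigram_transition_analysis.py may define the cost differently (for example, with the directional model).

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def transition_distance(exit_a: Point, entry_b: Point, advance: float) -> float:
    """Pen-up distance from the end of one glyph to the start of the next.

    The second glyph is shifted right by the first glyph's advance width.
    """
    return math.dist(exit_a, (entry_b[0] + advance, entry_b[1]))


def rank_bigrams(
    glyphs: Dict[str, dict], bigram_freq: Dict[Tuple[str, str], float]
) -> List[Tuple[str, float]]:
    """Sort bigrams by frequency-weighted transition distance, highest first."""
    scored = {}
    for (a, b), freq in bigram_freq.items():
        cost = transition_distance(
            glyphs[a]["exit"], glyphs[b]["entry"], glyphs[a]["advance"]
        )
        scored[a + b] = freq * cost
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)


# Made-up glyph geometry and frequencies, for illustration only.
toy_glyphs = {
    "a": {"entry": (0.0, 0.0), "exit": (4.0, 0.0), "advance": 4.0},
    "b": {"entry": (0.0, 3.0), "exit": (2.0, 0.0), "advance": 4.0},
}
toy_freq = {("a", "b"): 0.5, ("b", "a"): 0.25}
print(rank_bigrams(toy_glyphs, toy_freq))
```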

Figure (Topology). Enclosed regions (Betti-1): most characters have 0 (open forms like C, L) or 1 (closed forms like O, D). Characters with more endpoints cost more energy. More crossings reduce confusability margin.
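For a glyph treated as a graph of line segments, the number of enclosed regions follows from Euler's formula: Betti-1 = E − V + C, where E is the number of edges, V the number of vertices, and C the number of connected components. The sketch below applies that formula and counts degree-1 vertices as endpoints. It assumes strokes meet only at shared polyline vertices with identical coordinates; detecting crossings in the middle of a segment is left out, and the function name is hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def topology(strokes: List[List[Point]]) -> Tuple[int, int]:
    """Return (enclosed_regions, endpoints) for a glyph given as polylines."""
    # Undirected edges, deduplicated so retraced segments count once.
    edges = set()
    for stroke in strokes:
        for p, q in zip(stroke, stroke[1:]):
            if p != q:
                edges.add(frozenset((p, q)))
    vertices = {p for e in edges for p in e}

    # Union-find to count connected components.
    parent: Dict[Point, Point] = {v: v for v in vertices}

    def find(v: Point) -> Point:
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    degree = {v: 0 for v in vertices}
    for e in edges:
        p, q = tuple(e)
        degree[p] += 1
        degree[q] += 1
        parent[find(p)] = find(q)

    components = len({find(v) for v in vertices})
    enclosed = len(edges) - len(vertices) + components  # Betti-1
    endpoints = sum(1 for d in degree.values() if d == 1)
    return enclosed, endpoints


# Toy glyphs with made-up coordinates: a closed square "O" and an open "L".
square_o = [[(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]]
letter_l = [[(0.0, 1.0), (0.0, 0.0), (1.0, 0.0)]]
print(topology(square_o), topology(letter_l))
```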

Frequency ≠ energy. Common letters are not systematically cheaper to write (r = +0.054). Writing systems don't appear to optimize individual character cost by usage frequency — they optimize the alphabet as a whole for the cheapness/distinctiveness trade-off.

Figure (Zipf vs Energy). No meaningful correlation between letter frequency (Norvig 2013 English corpus) and writing energy. The alphabet is not a frequency-optimized code.
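A correlation check of this kind can be sketched in a few lines. The version below computes a Pearson coefficient in plain Python, which is an assumption: the source does not say which correlation measure produced r = +0.054. The frequency and energy values are placeholders and do not reproduce the project's data. With the listed requirements installed, numpy.corrcoef or scipy.stats.pearsonr would give the same coefficient.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Placeholder per-letter values (hypothetical, not measured data).
letter_frequency = [0.12, 0.09, 0.08, 0.02, 0.01]
writing_energy = [3.1, 1.2, 4.0, 2.2, 3.5]
print(pearson_r(letter_frequency, writing_energy))
```

A value near zero means that knowing how common a letter is says almost nothing about how costly it is to write, which is the pattern reported above.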

Usage

cd src/
python3 primitives.py                  # Core measurement functions
python3 measure.py                     # Extract strokes from Hershey fonts
python3 analyze.py                     # Energy distributions + Pareto frontier
python3 bigram_transition_analysis.py  # Between-character transition costs
python3 cross_script_analysis.py       # Latin vs Greek vs Cyrillic
python3 explore_correlations.py        # Metric correlation mining
python3 pareto_frontier.py             # Optimality analysis

Requirements

Python 3.8+, numpy, matplotlib, scipy

About

Structural analysis of writing systems through character-level frequency, transition, and energy metrics
