pinokio edited this page Feb 12, 2026 · 5 revisions

Alexandria Audiobook Generator

Transform any book or novel into a fully-voiced audiobook using AI-powered script annotation and text-to-speech.

Overview

Alexandria uses a multi-stage pipeline:

  1. Upload a book (.txt or .md)
  2. LLM annotates the text into a structured script with speakers, dialogue, and TTS directions
  3. Configure voices for each character (5 voice type options + built-in LoRA presets)
  4. Generate audio using Qwen3-TTS (batched for speed)
  5. Edit and refine individual lines in the web editor
  6. Export as a single MP3 audiobook or per-speaker Audacity tracks
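The structured script produced in step 2 can be pictured as a list of annotated lines, one per speaker turn. The sketch below is illustrative only: the field names (`speaker`, `text`, `direction`) and the sample lines are assumptions for the example, not Alexandria's actual script schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ScriptLine:
    """One annotated line of the audiobook script (hypothetical schema)."""
    speaker: str                     # character name, or "Narrator" for prose
    text: str                        # the words to be spoken
    direction: Optional[str] = None  # optional TTS instruction, e.g. "whispering"

# A short passage annotated the way the LLM stage might split it:
script = [
    ScriptLine("Narrator", "The door creaked open."),
    ScriptLine("Alice", "Who's there?", direction="nervous, hushed"),
    ScriptLine("Narrator", "No one answered."),
]

# The distinct speakers are what step 3 assigns voices to.
speakers = sorted({line.speaker for line in script})
print(speakers)
```

Once the script is in a shape like this, voice configuration (step 3) reduces to mapping each entry of `speakers` to a voice type.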

Voice System

Alexandria supports 5 voice types, each suited to different needs:

| Type | Model | Best For | Batched? |
|------|-------|----------|----------|
| Custom | CustomVoice | Main characters (9 built-in voices with instruct control) | Yes |
| Clone | Base | Characters needing a specific voice (from reference audio) | Yes |
| LoRA | Base + adapter | Recurring characters needing a unique, persistent voice | Yes |
| Voice Design | VoiceDesign | Minor/throwaway characters (voice from text description) | No (sequential) |
| Saved Design | Base (clone) | Reusable designed voices assigned as clone references | Yes |

See Voice Types for detailed guidance on choosing and configuring each type.
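One practical consequence of the table above is that only Voice Design renders sequentially, so it pays to know which characters will slow down generation. The snippet below sketches a per-character voice configuration; the dictionary layout, preset names, and file names are all made up for illustration and are not Alexandria's real config format.

```python
# Which of the five voice types support batched generation (from the table above):
VOICE_TYPES = {
    "custom": True,
    "clone": True,
    "lora": True,
    "voice_design": False,   # sequential only
    "saved_design": True,
}

# Hypothetical per-character assignments (names and fields are invented):
voices = {
    "Narrator":  {"type": "custom", "preset": "calm_male"},
    "Alice":     {"type": "clone", "reference": "alice_ref.wav"},
    "Innkeeper": {"type": "voice_design",
                  "description": "gruff elderly man, slight rasp"},
}

def batchable(character: str) -> bool:
    """True if this character's lines can be rendered in a batch."""
    return VOICE_TYPES[voices[character]["type"]]

# Characters that will render one line at a time:
sequential = [c for c in voices if not batchable(c)]
print(sequential)
```

A reasonable workflow, given these trade-offs, is to reserve Voice Design for one-off minor characters and promote any recurring designed voice to a Saved Design so it regains batching.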

Requirements

No external TTS server required — Alexandria includes a built-in Qwen3-TTS engine. Model weights download automatically on first use (~3.5 GB).

Installation

  1. Install Pinokio
  2. Click Download and paste: https://github.com/Finrandojin/alexandria-audiobook
  3. Click Install to set up dependencies
  4. Click Start to launch the web interface

API Test Suite

Alexandria includes an automated test script that verifies all major API endpoints:

cd app
python test_api.py              # Quick tests (~37) — no TTS/LLM needed
python test_api.py --full       # Full tests (~49) — requires running TTS + LLM
python test_api.py --url URL    # Custom server URL (default: http://127.0.0.1:4200)

Quick mode exercises config round-trips, upload, scripts CRUD, voice config, chunks, status polling, voice-design listing, LoRA model/dataset listing, dataset-builder CRUD, and error handling, all without loading TTS models. Full mode adds script generation, audio generation, batch rendering, voice design preview, and LoRA testing. Run it to verify a fresh installation or to confirm nothing broke after code changes.
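The command-line interface described above can be sketched with `argparse`. This is a hypothetical reimplementation of just the flag parsing; the real `test_api.py` may structure its CLI differently, but the flags and the default URL match the usage shown.

```python
import argparse

def parse_args(argv=None):
    """Parse the test suite's flags as documented above (sketch only)."""
    parser = argparse.ArgumentParser(description="Alexandria API test suite")
    parser.add_argument("--full", action="store_true",
                        help="run full tests (requires running TTS + LLM)")
    parser.add_argument("--url", default="http://127.0.0.1:4200",
                        help="server URL to test against")
    return parser.parse_args(argv)

# Quick mode against the default local server:
args = parse_args([])
print(args.full, args.url)
```

With no flags the script runs quick mode against `http://127.0.0.1:4200`; `--full` opts into the tests that need live TTS and LLM backends.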
