Home
Transform any book or novel into a fully-voiced audiobook using AI-powered script annotation and text-to-speech.
- Voice Types — Custom, Clone, LoRA, Voice Design — when to use each
- Voice Reference — Vocal direction lexicon and VoiceDesign prompt engineering findings
- Training Guide — Create custom voice identities with LoRA fine-tuning
- Dataset Builder — Build training datasets interactively with per-sample preview
- Batch Generation — How batching works, performance tuning, benchmarks
- Script Generation — LLM annotation pipeline, prompts, review pass
- Editor & Export — Chunk editing, render modes, Audacity export
- API Reference — REST endpoints with curl, Python, and JavaScript examples
- Troubleshooting — Common issues and solutions
Alexandria uses a multi-stage pipeline:
- Upload a book (.txt or .md)
- LLM annotates the text into a structured script with speakers, dialogue, and TTS directions
- Configure voices for each character (5 voice type options + built-in LoRA presets)
- Generate audio using Qwen3-TTS (batched for speed)
- Edit and refine individual lines in the web editor
- Export as a single MP3 audiobook or per-speaker Audacity tracks
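The annotated script produced in step 2 can be pictured as a list of speaker-tagged lines. The field names and classes below are illustrative only, not Alexandria's actual schema — see Script Generation for the real format:

```python
from dataclasses import dataclass

@dataclass
class ScriptLine:
    speaker: str    # character name, e.g. "Narrator"
    text: str       # the dialogue or narration to voice
    direction: str  # TTS delivery hint, e.g. "whispering, tense"

# Hypothetical annotation of a short passage
script = [
    ScriptLine("Narrator", "The door creaked open.", "calm, measured"),
    ScriptLine("Mira", "Who's there?", "whispering, tense"),
]

def speakers(lines):
    """Unique speakers in order of first appearance — the set of
    characters that will each need a voice configured in step 3."""
    seen = []
    for line in lines:
        if line.speaker not in seen:
            seen.append(line.speaker)
    return seen
```

Collecting the speakers first is what makes step 3 possible: every name the LLM tags must map to a configured voice before generation starts.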
Alexandria supports 5 voice types, each suited to different needs:
| Type | Model | Best For | Batched? |
|---|---|---|---|
| Custom | CustomVoice | Main characters (9 built-in voices with instruct control) | Yes |
| Clone | Base | Characters needing a specific voice (from reference audio) | Yes |
| LoRA | Base + adapter | Recurring characters needing a unique, persistent voice | Yes |
| Voice Design | VoiceDesign | Minor/throwaway characters (voice from text description) | No (sequential) |
| Saved Design | Base (clone) | Reusable designed voices assigned as clone references | Yes |
See Voice Types for detailed guidance on choosing and configuring each type.
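The batching column matters when planning a render: every voice type except Voice Design can be generated in parallel batches. A sketch of how that split might be applied — the type identifiers and function are illustrative, not Alexandria's actual API:

```python
# Per the table above: only Voice Design renders sequentially.
# These identifiers are hypothetical, not Alexandria's real type names.
BATCHED_TYPES = {"custom", "clone", "lora", "saved_design"}

def split_by_batching(voice_config):
    """Partition a {character: voice_type} map into voices that can
    render in one batch vs. those that must render one line at a time."""
    batched, sequential = {}, {}
    for character, vtype in voice_config.items():
        target = batched if vtype in BATCHED_TYPES else sequential
        target[character] = vtype
    return batched, sequential
```

One practical consequence: assigning Voice Design to a main character with hundreds of lines forces those lines through the slow sequential path, which is why the table reserves it for minor or throwaway characters (or suggests saving a design and reusing it as a clone reference, which batches).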
- Pinokio
- An LLM server: LM Studio, Ollama, OpenAI API, or any OpenAI-compatible API
- GPU with 16+ GB VRAM recommended (NVIDIA or AMD)
No external TTS server required — Alexandria includes a built-in Qwen3-TTS engine. Model weights download automatically on first use (~3.5 GB).
- Install Pinokio
- Click Download and paste:
  https://github.com/Finrandojin/alexandria-audiobook
- Click Install to set up dependencies
- Click Start to launch the web interface
Alexandria includes an automated test script that verifies all major API endpoints:
cd app
python test_api.py # Quick tests (~37) — no TTS/LLM needed
python test_api.py --full # Full tests (~49) — requires running TTS + LLM
python test_api.py --url URL # Custom server URL (default: http://127.0.0.1:4200)

Quick mode tests config round-trips, upload, script CRUD, voice config, chunks, status polling, voice design listing, LoRA model/dataset listing, dataset builder CRUD, and error handling — all without loading TTS models. Full mode adds script generation, audio generation, batch rendering, voice design preview, and LoRA testing. Use it to verify a fresh install or to confirm nothing broke after code changes.
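As a companion to test_api.py, a minimal reachability probe can be sketched as below; the helper is an assumption for illustration, not part of Alexandria's test suite, though the default URL matches test_api.py's:

```python
import urllib.error
import urllib.request

DEFAULT_URL = "http://127.0.0.1:4200"  # same default as test_api.py

def server_reachable(base_url=DEFAULT_URL, timeout=5.0):
    """Return True if anything answers HTTP at base_url, else False."""
    try:
        urllib.request.urlopen(base_url, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # server responded, even if with an error status
    except (urllib.error.URLError, OSError):
        return False  # connection refused, timeout, DNS failure, ...
```

Running a check like this before `python test_api.py --full` avoids a long wall of connection errors when the web interface simply isn't started yet.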