A command-line interface for training, evaluating, and deploying UNET-based ink detection models on carbonized papyrus scrolls from the Vesuvius Challenge.
In 79 AD, the eruption of Mount Vesuvius carbonized a villa full of ancient papyrus scrolls, making them impossible to unroll by traditional means. The Vesuvius Challenge is an open international effort to recover the text of these scrolls using CT scanning and machine learning; specifically, semantic segmentation models that detect ink on virtually unwrapped scroll surfaces.
Leading ink detection models (UNETs, UNETRs) require large commercial GPUs and significant compute, creating a barrier for community contributors. VLAE addresses this directly: it packages trained 3D convolutional models into a reproducible CLI tool that any contributor can run, regardless of hardware background.
This project was the subject of an AP Research study investigating the effects of compression and pruning on UNET efficiency for low-resource ink detection. Key finding: filter compression reduced FLOPs by ~73.6% with a ~3 times improvement in efficiency score (F-beta / FLOPs) compared to baseline.
VLAE wraps the full model pipeline of data ingestion, model training, evaluation, and segment management into a single CLI application. Commands are triggered through main.py and routed to modular command files. The application auto-detects GPU availability (CUDA, Apple MPS, or CPU) and auto-installs dependencies on first run.
The model baked into VLAE is a 3D convolutional encoder + MLP decoder architecture that operates on 48×64×64 voxel subvolumes extracted from CT scan TIFF stacks of Vesuvius scroll fragments. Three model variants are supported: baseline, compressed (filter reduction), and pruned (L1 unstructured, 30th percentile).
Requirements: Python 3.7–3.9 (later versions are not supported)
# Clone the repository
git clone https://github.com/rishitv26/VulcanLeagueAIEngine.git
cd VulcanLeagueAIEngine
# Run the application — dependencies install automatically on first launch
python main.py
On first run, VLAE detects your hardware (CUDA / MPS / CPU) and installs required packages automatically via install.py and requirements.txt.
Dependencies (auto-installed): PyTorch, NumPy, Pandas, Matplotlib, and Kaggle API client.
VLAE uses the Vesuvius Challenge Ink Detection dataset, available on Kaggle. It includes three labeled training fragments, each containing:
- Surface volume TIFF stacks (CT scans of carbonized scroll segments)
- Binary mask images indicating valid regions
- Ground-truth ink labels from infrared scanning
When you run the train command, the dataset is downloaded automatically. Alternatively, you can download the data manually from the Kaggle dataset page and place it in the corresponding directory.
Run python main.py twice within the desired environment.
On the first run, the program installs all dependencies and exits. On the second run, you will be greeted with a command-line interface as follows:
Loading...
...
Welcome to the VLAE (Vulcan League AI Engine) <version number>
type 'help' to see the list of commands.
type 'manual' for a basic tutorial on what to do.
[AI] Condition: 'baseline' | Filters: [16, 32, 64] | Device: <device>
[AI] FLOPs per forward pass: <FLOPs for your system>
>>>
The following commands will be accessible from there:
| Command | Description |
|---|---|
| clear/cls | Clear the console output |
| exit | Stop the VLAE routine |
| manual | Basic instructions for running VLAE |
| change-setting <setting> <new value> | Change a setting variable manually |
| get-setting <setting> | Get the value of a given setting |
| get-all-settings | Get all changeable settings |
| add-segment <dir> | Copy a segment into the test subfolder for ink detection |
| rm-segment <name> | Delete a segment from the test subfolder |
| train | Train the model using training_data (must be comma-separated values of 1, 2, or 3; no repeats) |
| eval | Run the model on data from the test subfolder |
All three variants share the same base architecture: a 3D convolutional encoder feeding into an MLP decoder.
Encoder: Three sequential Conv3D blocks (kernel size 3, stride 2, padding 1), each followed by Batch Normalization and ReLU activation. An adaptive average pooling layer collapses spatial dimensions to a single feature vector.
Decoder: Two fully connected hidden layers (128 units each, ReLU), followed by a single-neuron output layer producing a scalar ink probability in [0.0, 1.0].
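As a quick sanity check on these dimensions: a Conv3D with kernel 3, stride 2, and padding 1 roughly halves each spatial axis, so the 48×64×64 input shrinks predictably through the three encoder blocks. A minimal pure-Python sketch (the helper function is illustrative, not from the codebase):

```python
def conv3d_out_shape(shape, kernel=3, stride=2, padding=1):
    """Output spatial shape of a Conv3D layer: floor((d + 2p - k) / s) + 1."""
    return tuple((d + 2 * padding - kernel) // stride + 1 for d in shape)

shape = (48, 64, 64)            # input subvolume (depth, height, width)
for block in range(3):          # three stride-2 encoder blocks
    shape = conv3d_out_shape(shape)
    print(f"after block {block + 1}: {shape}")
# 48x64x64 -> 24x32x32 -> 12x16x16 -> 6x8x8;
# adaptive average pooling then collapses the final map to one feature vector.
```
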
Training objective: BCEWithLogitsLoss. Optimizer: SGD + OneCycleLR scheduler. Metric: F-beta score (β=0.5).
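With β = 0.5, the F-beta metric weights precision more heavily than recall, rewarding models that avoid false ink detections. A minimal sketch of the score computed from precision and recall (the function name is illustrative, not from the codebase):

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# With beta = 0.5, a precision-heavy detector outscores a recall-heavy one
# even when the two values are simply swapped:
print(f_beta(0.8, 0.5))   # precision-heavy
print(f_beta(0.5, 0.8))   # recall-heavy, scores lower
```
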
| Variant | Filter sizes | Notes |
|---|---|---|
| Baseline | [16, 32, 64] | Control model |
| Compressed | [8, 16, 32] | ~73.6% FLOPs reduction; ~3 times efficiency score improvement |
| Pruned | [16, 32, 64] + L1 pruning | 30th percentile weight removal + 10K fine-tune steps |
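As a back-of-envelope check on the compression figures, one can count only the convolution multiply-accumulates of the encoder (assuming kernel 3, stride 2, padding 1, and a single-channel input; the published numbers cover the full pipeline, so this is a rough estimate, not the measured result):

```python
def conv3d_macs(c_in, c_out, out_shape, kernel=3):
    """Multiply-accumulates for one Conv3D layer."""
    d, h, w = out_shape
    return c_in * c_out * kernel ** 3 * d * h * w

# Output shapes after each stride-2 block on a 48x64x64 input
shapes = [(24, 32, 32), (12, 16, 16), (6, 8, 8)]

def encoder_macs(filters):
    macs, c_in = 0, 1               # single-channel CT input
    for c_out, shape in zip(filters, shapes):
        macs += conv3d_macs(c_in, c_out, shape)
        c_in = c_out
    return macs

baseline   = encoder_macs([16, 32, 64])   # baseline filter sizes
compressed = encoder_macs([8, 16, 32])    # compressed filter sizes
print(f"reduction: {1 - compressed / baseline:.1%}")  # roughly 71%
```

Halving every filter count quarters the cost of all but the first layer (MACs scale with C_in × C_out), which is why this crude count already lands near the reported ~71% evaluation-FLOPs reduction.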
Each training and evaluation run produces:
- training_log.csv — Per-step log: condition label, fragment ID, batch loss, step index, batch F-beta score, elapsed time
- run_summary.csv — Per-run summary: total steps, training time, FLOPs per iteration, peak RAM, batch size, learning rate, timestamp
- output_fragment_X.png — Binary ink prediction image for the evaluated fragment
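A quick way to summarize a finished run is to average the per-step F-beta column of training_log.csv. A minimal sketch with the standard library (the column name "fbeta" is an assumption; check the actual file header):

```python
import csv

def mean_fbeta(path):
    """Average the per-step F-beta column of a training log CSV.

    The "fbeta" header name is hypothetical; match it to the real file.
    """
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    scores = [float(row["fbeta"]) for row in rows]
    return sum(scores) / len(scores)
```
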
This codebase was used in an AP Research study examining the effect of segmentation architecture optimizations (compression vs. pruning) on ink detection efficiency for the Vesuvius Challenge.
Key results:
- Compressed model achieved ~73.6% reduction in training FLOPs and ~71% reduction in evaluation FLOPs vs. baseline
- Compressed model's efficiency score (F-beta / FLOPs) was ~3 times higher than baseline on both test fragments
- Pruning at 30th percentile increased total training FLOPs by ~17% with no measurable efficiency gain, suggesting conservative pruning serves weight regularization rather than computational reduction
Conclusion: Filter compression is the preferred method for making ink detection accessible to low-resource community contributors.
Note that although the compressed model had a lower F-beta score, its calculated efficiency score (F-beta / FLOPs) was roughly 3 times better than both the baseline and pruned models.
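To see how a lower F-beta can still yield a roughly 3× efficiency score, consider numbers in the same ballpark as the study (the F-beta values below are illustrative placeholders, not the measured results):

```python
# Efficiency score = F-beta / FLOPs; FLOPs normalized so baseline = 1.0
baseline_fbeta,   baseline_flops   = 0.60, 1.00        # hypothetical F-beta
compressed_fbeta, compressed_flops = 0.50, 1.00 - 0.71 # ~71% fewer FLOPs

baseline_eff   = baseline_fbeta / baseline_flops
compressed_eff = compressed_fbeta / compressed_flops
print(compressed_eff / baseline_eff)  # close to 3x despite the lower F-beta
```

The FLOPs term shrinks far faster than the F-beta term, so the ratio favors the compressed model even though its raw accuracy is worse.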