Clair3

Symphonizing pileup and full-alignment for deep-learning-based long-read variant calling

Contact: Ruibang Luo, Zhenxian Zheng, Xian Yu

Email: rbluo@cs.hku.hk · zxzheng@cs.hku.hk · yuxian@connect.hku.hk

Introduction

Clair3 is a germline small-variant caller for long-read sequencing. It combines two complementary models to balance speed and accuracy:

Pileup calling — fast, handles the majority of variant candidates from summarized alignment statistics.
Full-alignment calling — computationally intensive, resolves uncertain candidates from haplotype-resolved full alignments.

Clair3 is the third generation of this caller, succeeding Clair (second generation) and Clairvoyante (first generation).

Looking for a different variant caller?

Use case	Tool
Germline on long-read RNA-seq	Clair3-RNA
Somatic, paired tumor/normal	ClairS
Somatic, tumor-only	ClairS-TO

Agent Skill

Clair-skills is a plug-in for agentic AI coding assistants (Claude Code, Cursor, Codex, …) that covers the entire Clair suite. It helps the agent pick the right tool and model, generate ready-to-run commands, and analyze results.

Latest Updates

v2.0.0 — Feb 9, 2026 (Major release)

A preprint describing the performance of Clair3 v2 is available on bioRxiv.

PyTorch migration. The deep-learning backend moved from TensorFlow to PyTorch. v1 TensorFlow models are not compatible with v2 (including the TF models ONT provides via Rerio). Use the Converted Rerio Clair3 Models (PyTorch), or convert your own with the Model Migration Guide. Pre-trained PyTorch models: download here.
Signal-aware variant calling for ONT. Pass --enable_dwell_time on BAMs with Dorado mv tags (requires --emit-moves). See Dwelling Time Feature.
New Python runner. run_clair3.sh has been reimplemented in Python as run_clair3.py; both remain usable.
Checkpoint format. TF .index/.data → PyTorch .pt.

v1.2.0 — Aug 1, 2025

Native GPU support on Linux and Apple Silicon. Clair3 on GPU runs ~5× faster than CPU. See the GPU Quick Start.

v1.1.2 — Jul 10, 2025

Boundary check for an insertion immediately followed by soft-clipping (#394, @dpryan79).
Parallel-job exit-code checking; pipeline now exits immediately on any job failure (#392, @SamStudio8).

v1.1.1 — May 19, 2025

Fixed the malformed VCF header on AWS (#380).
Added an R10.4.1 model fine-tuned on 12 bacterial genomes (notes, @wshropshire).

Earlier versions

v1.1.0 — Apr 8, 2025. Removed parallel version checking (#377).

v1.0.11 — Mar 19, 2025. Added --enable_variant_calling_at_sequence_head_and_tail to call variants in the first/last 16 bp of a sequence (use with caution — less reliable alignments and less context; #257). Added --output_all_contigs_in_gvcf_header (#371). Added postprocessing AddPairEndAlleleDepth (PEAD tag, Bin Guan, NEI). Fixed AF format in GVCF output (#365). Added a split-into-haplotypes calling workflow. set -o pipefail in run_clair3.sh (#368). Clarified parameter docs (#369).

v1.0.10 — Jul 28, 2024. Fixed an out-of-range bug in non-human GVCF output (#317). Faster amplicon calling via --chunk_num=-1 (#306). LongPhase bumped to 1.7.3 (#321).

v1.0.9 — May 15, 2024. Fixed VCF header (#305); updated DP FORMAT description.

v1.0.8 — Apr 29, 2024. Fixed occasional quality-score differences between VCF and GVCF output. LongPhase bumped to 1.7.

v1.0.7 — Apr 7, 2024. Memory guards for full-alignment C implementation (#286). Raised max mpileup coverage to 2^20 (#292). LongPhase bumped to 1.6.

v1.0.6 — Mar 15, 2024. Stack-overflow fix at very high coverage (#282). Reference caching for CRAM (#278). Fixed RefCall outputs when FA model calls no variant (#271). Fixed min-coverage filtering (#262). --snp_min_af / --indel_min_af default to 0.0 when --vcf_fn is set (#261).

v1.0.5 — Dec 20, 2023. Fixed multi-allelic AF at very high coverage (#241). --base_err and --gq_bin_size to reduce excess ./. in GVCF (#220).

v1.0.4 — Jul 11, 2023. Command line and reference source now in VCF header. Fixed AF for 1/2 genotypes. Added AD tag.

v1.0.3 — Jun 20, 2023. Colon : allowed in reference sequence names (#203).

v1.0.2 — May 22, 2023. Added PacBio HiFi Revio model. Fixed halt on too few variant candidates (#198).

v1.0.1 — Apr 24, 2023. WhatsHap bumped to 1.7 (~15% faster haplotagging, #193). Fixed PL when ALT is N (#191).

v1.0.0 — Mar 6, 2023. Clair3 version in VCF header (#141). NumPy int fix (#165). IUPAC → N by default, keep with --keep_iupac_bases (#153). Added --use_{longphase,whatshap}_for_intermediate_phasing / --use_{longphase,whatshap}_for_final_output_phasing / --use_whatshap_for_final_output_haplotagging (#164). Fixed Docker shell under host user mode (#175).

v0.1-r12 — Aug 19, 2022. CRAM input (#117). Python 3.9, TensorFlow 2.8, Samtools 1.15.1, WhatsHap 1.4. DP now shows raw coverage for pileup calls (#128). Illumina representation-unification fix (#110). LongPhase 1.3.

v0.1-r11 minor 2 — Apr 16, 2022. Fixed missing non-variant GVCF positions at chunk boundaries. Reduced GVCF memory footprint (#88).

v0.1-r11 — Apr 4, 2022. ~2.5× faster on ONT Q20 data with pileup and full-alignment feature generation in C. LongPhase as a phasing option (--longphase_for_phasing). --min_coverage, --min_mq, --min_contig_size. CSI index support (#90). See Notes on r11.

v0.1-r10 — Jan 13, 2022. Added the Guppy5 model r941_prom_sup_g5014 (benchmarks); applicable to sup, hac, fast reads. The older r941_prom_sup_g506 was obsoleted. Added --var_pct_phasing.

v0.1-r9 — Dec 1, 2021. --enable_long_indel for indel calls >50 bp (benchmarks, #64).

v0.1-r8 — Nov 11, 2021. --enable_phasing to emit WhatsHap-phased VCF (#63). Fixed unexpected program termination on success.

v0.1-r7 — Oct 18, 2021. ONT var_pct_full raised 0.3 → 0.7 (+~0.2% indel F1). Fall-through to next-likely variant on low coverage (#53). Streamlined training. mini_epochs in Train.py (#60). GVCF intermediates now lz4-compressed (5× smaller). --remove_intermediate_dir (#48). ONT models renamed per Medaka convention. Training-data leakage fixed (#57).

ONT-provided models — Sep 23, 2021. ONT also provides chemistry-/basecaller-specific Clair3 models via Rerio.

v0.1-r6 — Sep 4, 2021. Reduced SortVcf memory (#45). Lower ulimit -n requirement (#47). Clair3-Illumina in bioconda (#42).

v0.1-r5 — Jul 19, 2021. Training-data generator fix to avoid Tensorflow segfaults. Simplified Dockerfile. Fixed ALT output for reference calls. Fixed multi-allelic AF ([ACGT]Del). AD tag in GVCF. --call_snp_only (#40). Pileup/FA validity checks (#32, #38).

v0.1-r4 — Jun 28, 2021. Bioconda install. ONT Guppy2 model (benchmarks — must be used on Guppy2-or-earlier data). Colab notebooks. Fix on too few variant candidates (#28).

v0.1-r3 — Jun 9, 2021. ulimit -u check with auto-retry on failed jobs (#20, #23, #24). ONT Guppy5 model (benchmarks).

v0.1-r2 — May 23, 2021. BED out-of-range fix (#12). Both .bam.bai and .bai accepted (#10). Boundary and package version checks.

v0.1-r1 — May 18, 2021. Relative paths in Conda (#5). taskset CPU-core visibility fix and Singularity image (#6).

v0.1 — May 17, 2021. Initial release.

Installation

Pick the right install method for your hardware:

CPU → Docker (Option 1), Singularity (Option 2), or Bioconda (Option 3).

NVIDIA GPU (Linux) → Docker GPU (Option 1) or Singularity GPU (Option 2); fall back to Step-by-step (Option 4) if unsupported.

Apple Silicon (M1/M2/M3/M4) → Step-by-step (Option 4).

See the GPU Quick Start for tuned settings.

Option 1. Docker

Pre-built image: hkubal/clair3.

Use absolute paths for INPUT_DIR and OUTPUT_DIR.

CPU

INPUT_DIR="[YOUR_INPUT_FOLDER]"        # e.g. /home/user1/input  (absolute path)
OUTPUT_DIR="[YOUR_OUTPUT_FOLDER]"      # e.g. /home/user1/output (absolute path)
THREADS="[MAXIMUM_THREADS]"            # e.g. 8
MODEL_NAME="[YOUR_MODEL_NAME]"         # e.g. r1041_e82_400bps_sup_v500

# --platform accepts one of {ont,hifi,ilmn}.
docker run -it \
  -v ${INPUT_DIR}:${INPUT_DIR} \
  -v ${OUTPUT_DIR}:${OUTPUT_DIR} \
  hkubal/clair3:v2.0.0 \
  /opt/bin/run_clair3.sh \
    --bam_fn=${INPUT_DIR}/input.bam \
    --ref_fn=${INPUT_DIR}/ref.fa \
    --threads=${THREADS} \
    --platform=ont \
    --model_path=/opt/models/${MODEL_NAME} \
    --output=${OUTPUT_DIR}

python3 /opt/bin/run_clair3.py can replace /opt/bin/run_clair3.sh in the command above.

GPU (NVIDIA CUDA on Linux)

Image: hkubal/clair3:v2.0.0_gpu (built on CUDA 12.1).

Requirements

NVIDIA driver ≥ 530.30.02.
NVIDIA Container Toolkit installed on the host.

# --platform accepts one of {ont,hifi,ilmn}.
docker run -it --gpus all \
  -v ${INPUT_DIR}:${INPUT_DIR} \
  -v ${OUTPUT_DIR}:${OUTPUT_DIR} \
  hkubal/clair3:v2.0.0_gpu \
  /opt/bin/run_clair3.sh \
    --bam_fn=${INPUT_DIR}/input.bam \
    --ref_fn=${INPUT_DIR}/ref.fa \
    --threads=${THREADS} \
    --platform=ont \
    --model_path=/opt/models/${MODEL_NAME} \
    --output=${OUTPUT_DIR} \
    --use_gpu

Notes

Select specific GPUs with --gpus '"device=0,1"' (Docker) and --device=cuda:0,1 (Clair3).
If the image does not work on your setup (unsupported driver/CUDA, no NVIDIA Container Toolkit, Apple Silicon, etc.), fall back to Step-by-step (Option 4).

Option 2. Singularity

Use absolute paths for INPUT_DIR and OUTPUT_DIR.

CPU

conda config --add channels defaults
conda create -n singularity-env -c conda-forge singularity -y
conda activate singularity-env

singularity pull docker://hkubal/clair3:v2.0.0

# --platform accepts one of {ont,hifi,ilmn}.
singularity exec \
  -B ${INPUT_DIR},${OUTPUT_DIR} \
  clair3_v2.0.0.sif \
  /opt/bin/run_clair3.sh \
    --bam_fn=${INPUT_DIR}/input.bam \
    --ref_fn=${INPUT_DIR}/ref.fa \
    --threads=${THREADS} \
    --platform=ont \
    --model_path=/opt/models/${MODEL_NAME} \
    --output=${OUTPUT_DIR}

GPU (NVIDIA CUDA on Linux)

Requirements

NVIDIA driver ≥ 530.30.02.
Singularity (or Apptainer) with --nv support.

singularity pull docker://hkubal/clair3:v2.0.0_gpu

# --platform accepts one of {ont,hifi,ilmn}.
singularity exec --nv --cleanenv --env TMPDIR=/tmp \
  -B ${INPUT_DIR},${OUTPUT_DIR} \
  clair3_v2.0.0_gpu.sif \
  /opt/bin/run_clair3.sh \
    --bam_fn=${INPUT_DIR}/input.bam \
    --ref_fn=${INPUT_DIR}/ref.fa \
    --threads=${THREADS} \
    --platform=ont \
    --model_path=/opt/models/${MODEL_NAME} \
    --output=${OUTPUT_DIR} \
    --use_gpu

Notes

--nv injects the host NVIDIA driver and libraries into the container (equivalent of Docker's --gpus all); no NVIDIA Container Toolkit needed.
--cleanenv --env TMPDIR=/tmp avoids parallel failing when the host TMPDIR points to a path not visible inside the container.
If the image does not work on your setup, fall back to Step-by-step (Option 4).

Option 3. Bioconda

Clair3 is available on Bioconda. The recipe bundles PyPy, samtools, parallel, whatshap, LongPhase, and the pre-trained models under ${CONDA_PREFIX}/bin/models/. See bioconda-recipes#64260 for the v2 (PyTorch) recipe.

mamba create -n clair3 -c conda-forge -c bioconda -y clair3
mamba activate clair3

MODEL_NAME="[YOUR_MODEL_NAME]"         # e.g. r1041_e82_400bps_sup_v500

# --platform accepts one of {ont,hifi,ilmn}.
run_clair3.sh \
  --bam_fn=input.bam \
  --ref_fn=ref.fa \
  --threads=${THREADS} \
  --platform=ont \
  --model_path=${CONDA_PREFIX}/bin/models/${MODEL_NAME} \
  --output=${OUTPUT_DIR}

Note. The Bioconda package ships a CPU-only PyTorch build. For NVIDIA GPU or Apple Silicon, use Step-by-step (Option 4).

Option 4. Step-by-step (Conda)

Install Mamba or Conda from miniforge (Mamba is much faster).

Step 1 — Create and activate the environment

mamba create -n clair3_v2 -c conda-forge -c bioconda -y \
  python=3.11 samtools whatshap parallel \
  zstd xz zlib bzip2 automake make gcc gxx curl pigz
mamba activate clair3_v2
pip install uv

Step 2 — Install PyTorch

Pick the right build for your system from the PyTorch website.

# Example: NVIDIA CUDA 13.0
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu130

# Or: CPU only
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu

Step 3 — Clone Clair3

cd ${HOME}
git clone https://github.com/HKU-BAL/Clair3.git
cd Clair3
export CLAIR3_PATH=$(pwd)

Step 4 — Install Python deps and build C sources

uv pip install numpy h5py hdf5plugin numexpr tqdm cffi torchmetrics
make PREFIX=${CONDA_PREFIX}

make compiles samtools/htslib, LongPhase, and the Clair3 C shared library (libclair3.so) used for fast pileup and full-alignment tensor generation.

Step 5 — Install PyPy3.11 (speeds up preprocessing)

wget https://downloads.python.org/pypy/pypy3.11-v7.3.20-linux64.tar.bz2
tar -xjf pypy3.11-v7.3.20-linux64.tar.bz2 && rm pypy3.11-v7.3.20-linux64.tar.bz2

ln -s $(pwd)/pypy3.11-v7.3.20-linux64/bin/pypy3 ${CONDA_PREFIX}/bin/pypy3
ln -s $(pwd)/pypy3.11-v7.3.20-linux64/bin/pypy3 ${CONDA_PREFIX}/bin/pypy

pypy3 -m ensurepip
pypy3 -m pip install mpmath==1.2.1

Step 6 — (Optional) Download pre-trained models

cd ${CLAIR3_PATH}
mkdir -p models
wget -r -np -nH --cut-dirs=2 -R "index.html*" -P ./models \
  https://www.bio8.cs.hku.hk/clair3/clair3_models_pytorch/

Individual models can also be grabbed from the model index.

Step 7 — Run Clair3

MODEL_NAME=r1041_e82_400bps_sup_v500
${CLAIR3_PATH}/run_clair3.sh \
  --bam_fn=input.bam \
  --ref_fn=ref.fa \
  --threads=${THREADS} \
  --platform=ont \
  --model_path=${CLAIR3_PATH}/models/${MODEL_NAME} \
  --output=${OUTPUT_DIR}

python3 ${CLAIR3_PATH}/run_clair3.py accepts the same arguments and can be used interchangeably.

Pre-trained Models

Important: v1 TensorFlow models are not compatible with Clair3 v2 (including the TF models ONT provides via Rerio). Convert your own with the Model Migration Guide, or use the pre-converted models below.
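
One way to tell which generation a model folder belongs to is to look at its checkpoint files. The sketch below is only an illustration and not part of Clair3: `classify_model_dir` is a hypothetical helper. It assumes a v2 folder holds `pileup.pt` and `full_alignment.pt` (the default prefixes), and that a v1 folder holds TensorFlow `.index` / `.data*` checkpoint files.

```python
from pathlib import Path


def classify_model_dir(model_dir):
    """Return 'pytorch', 'tensorflow', or 'unknown' for a model folder.

    Hypothetical helper for illustration; it only inspects file names.
    """
    names = [p.name for p in Path(model_dir).iterdir() if p.is_file()]
    # v2 models: one .pt checkpoint per model (default prefixes assumed).
    if {"pileup.pt", "full_alignment.pt"} <= set(names):
        return "pytorch"
    # v1 models: TensorFlow checkpoints stored as .index plus .data shards
    # (file-name pattern is an assumption).
    if any(n.endswith(".index") or ".data" in n for n in names):
        return "tensorflow"
    return "unknown"
```

A folder reported as `tensorflow` would need conversion before it can be passed to `--model_path` in v2.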

Download:

HKU-provided: https://www.bio8.cs.hku.hk/clair3/clair3_models_pytorch/
Converted ONT Rerio: https://www.bio8.cs.hku.hk/clair3/clair3_models_rerio_pytorch/

Bundled locations: /opt/models/ (Docker) · ${CONDA_PREFIX}/bin/models/ (Bioconda).

HKU-provided models

Listed at https://www.bio8.cs.hku.hk/clair3/clair3_models_pytorch/.

Model	Platform	`--platform`	Training samples / Notes	Bioconda	Docker
`r1041_e82_400bps_hac_v520_with_mv` (latest)	ONT R10.4.1 E8.2 (5 kHz), HAC	`ont`	HG001,2,5 (chr20 excluded) — signal-aware, use `--enable_dwell_time`		✓
`r1041_e82_400bps_sup_v520_with_mv` (latest)	ONT R10.4.1 E8.2 (5 kHz), SUP	`ont`	HG001,2,5 (chr20 excluded) — signal-aware, use `--enable_dwell_time`		✓
`r1041_e82_400bps_sup_v430_bacteria_finetuned`	ONT R10.4.1	`ont`	Fine-tuned on 12 bacterial genomes		✓
`r941_prom_sup_g5014`	ONT R9.4.1, Guppy5 SUP	`ont`	HG002,4,5; also usable on HAC reads (benchmarks)	✓	✓
`r941_prom_hac_g360+g422`	ONT R9.4.1, Guppy3/4 HAC	`ont`	HG001,2,4,5
`hifi_revio`	PacBio HiFi Revio	`hifi`	HG002,4	✓	✓
`hifi_sequel2`	PacBio HiFi Sequel II	`hifi`	HG001,2,4,5	✓	✓
`ilmn`	Illumina	`ilmn`	HG001,2,4,5	✓	✓

Recommendation for modern ONT R10.4.1 data: when your BAM has Dorado mv tags, use the dwell-time model (..._with_mv) for the best accuracy; otherwise, use an ONT-trained model below.

ONT-provided models (bundled)

ONT's models are fine-tuned to specific chemistries / basecallers and typically outperform the HKU baselines — we recommend using them for best results. Official PyTorch distributions from ONT are in progress; in the meantime, use the converted Rerio models below.

The following ONT-trained models are bundled with Clair3 Docker / Bioconda since v1.1.1:

Model	Chemistry	Dorado model	Bioconda	Docker
`r1041_e82_400bps_sup_v500`	R10.4.1 E8.2 (5 kHz)	v5.0.0 SUP	✓	✓
`r1041_e82_400bps_hac_v500`	R10.4.1 E8.2 (5 kHz)	v5.0.0 HAC		✓
`r1041_e82_400bps_sup_v410`	R10.4.1 E8.2 (4 kHz)	v4.1.0 SUP	✓	✓
`r1041_e82_400bps_hac_v410`	R10.4.1 E8.2 (4 kHz)	v4.1.0 HAC		✓

ONT has released newer Dorado v5.2.0 models (r1041_e82_400bps_sup_v520 / hac_v520). They are not yet bundled in Docker / Bioconda — download them from the Converted Rerio models section below.

Converted Rerio models

The full ONT Rerio catalog converted to PyTorch for Clair3 v2 is available at https://www.bio8.cs.hku.hk/clair3/clair3_models_rerio_pytorch/. A selection of recent R10.4.1 E8.2 models is listed below.

Model	Chemistry	Dorado model
`r1041_e82_400bps_sup_v520` (latest)	R10.4.1 E8.2 (5 kHz)	v5.2.0 SUP
`r1041_e82_400bps_hac_v520` (latest)	R10.4.1 E8.2 (5 kHz)	v5.2.0 HAC
`r1041_e82_400bps_sup_v500`	R10.4.1 E8.2 (5 kHz)	v5.0.0 SUP
`r1041_e82_400bps_hac_v500`	R10.4.1 E8.2 (5 kHz)	v5.0.0 HAC
`r1041_e82_400bps_sup_v430`	R10.4.1 E8.2 (5 kHz)	v4.3.0 SUP
`r1041_e82_400bps_hac_v430`	R10.4.1 E8.2 (5 kHz)	v4.3.0 HAC
`r1041_e82_400bps_sup_v410`	R10.4.1 E8.2 (4 kHz)	v4.1.0 SUP
`r1041_e82_400bps_hac_v410`	R10.4.1 E8.2 (4 kHz)	v4.1.0 HAC

For other chemistries and basecaller versions (R10.4.1 E8.2 260 bps, R10.4 E8.1, earlier Guppy g6xx / g5015, v4.0.0 / v4.2.0), browse the full model directory and pick the one matching your chemistry and basecaller (Dorado / Guppy) version.

Quick Demo

ONT with dwelling time — ONT Dwelling Time Quick Demo
Oxford Nanopore (ONT) — ONT Quick Demo
PacBio HiFi — PacBio HiFi Quick Demo
Illumina NGS — Illumina Quick Demo

Usage

General usage

Caution: Use =value for all parameters, e.g. --bed_fn=fn.bed (not --bed_fn fn.bed).

# --platform accepts one of {ont,hifi,ilmn}.
./run_clair3.sh \
  --bam_fn=${BAM} \
  --ref_fn=${REF} \
  --threads=${THREADS} \
  --platform=ont \
  --model_path=${MODEL_PREFIX} \
  --output=${OUTPUT_DIR} \
  --include_all_ctgs               ## required for non-human species

Outputs:

File	Description
`${OUTPUT_DIR}/pileup.vcf.gz`	Pileup model calls
`${OUTPUT_DIR}/full_alignment.vcf.gz`	Full-alignment model calls
`${OUTPUT_DIR}/merge_output.vcf.gz`	Final Clair3 output

By default, variants are called on chr{1..22,X,Y} and {1..22,X,Y}. Override with --include_all_ctgs, --ctg_name, or --bed_fn.
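
To see which contigs of a reference fall outside the default set, the names in its `.fai` index can be compared against the two naming schemes above. This is a minimal sketch with hypothetical helper names; it assumes exact name matching, which may differ from what Clair3 does internally.

```python
def default_contigs():
    """Default contig names: chr1..chr22, chrX, chrY and 1..22, X, Y."""
    base = [str(i) for i in range(1, 23)] + ["X", "Y"]
    return ["chr" + name for name in base] + base


def contigs_skipped_by_default(fai_path):
    """List contigs in a FASTA .fai index that are not in the default set."""
    wanted = set(default_contigs())
    skipped = []
    with open(fai_path) as fai:
        for line in fai:
            # The first tab-separated column of a .fai line is the name.
            name = line.split("\t", 1)[0].strip()
            if name and name not in wanted:
                skipped.append(name)
    return skipped
```

If the returned list contains contigs you care about (for example, everything in a bacterial reference), add `--include_all_ctgs` or name them with `--ctg_name`.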

python3 run_clair3.py is interchangeable with ./run_clair3.sh.
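
When Clair3 is launched from a script, building the argument list programmatically makes it easy to keep the `--key=value` form. The following is a minimal sketch: `build_clair3_args` is a hypothetical helper, and the file and model paths are placeholders.

```python
def build_clair3_args(script, options, flags=()):
    """Build an argv list for run_clair3.sh using the --key=value form.

    `options` maps option names to values; `flags` lists value-less
    switches. Hypothetical helper for illustration only.
    """
    args = [script]
    for key, value in options.items():
        args.append(f"--{key}={value}")  # never "--key value"
    for flag in flags:
        args.append(f"--{flag}")
    return args


args = build_clair3_args(
    "./run_clair3.sh",
    {
        "bam_fn": "input.bam",
        "ref_fn": "ref.fa",
        "threads": 8,
        "platform": "ont",
        "model_path": "models/r1041_e82_400bps_sup_v500",
        "output": "clair3_output",
    },
    flags=["include_all_ctgs"],
)
print(" ".join(args))
```

The resulting list can be handed to `subprocess.run(args, check=True)`, which avoids shell quoting issues altogether.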

Options

Required

-b, --bam_fn=FILE         Indexed BAM input.
-f, --ref_fn=FILE         Indexed FASTA reference.
-m, --model_path=STR      Folder containing pileup.pt and full_alignment.pt.
-t, --threads=INT         Max threads. Each chunk uses 4; ceil(threads/4)*3 chunks run in parallel.
-p, --platform=STR        {ont,hifi,ilmn}
-o, --output=PATH         VCF/GVCF output directory.
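
The `--threads` description implies a simple relationship between the thread budget and the number of chunks processed concurrently. A minimal sketch of that arithmetic, using a hypothetical helper name and taking the formula from the option text above:

```python
import math


def parallel_chunks(threads, threads_per_chunk=4):
    """Chunks run in parallel: ceil(threads / 4) * 3."""
    if threads < 1:
        raise ValueError("threads must be a positive integer")
    return math.ceil(threads / threads_per_chunk) * 3


for t in (4, 8, 10, 32):
    print(t, parallel_chunks(t))
```

For example, `--threads=8` gives 6 parallel chunks and `--threads=10` gives 9.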

Common options

    --bed_fn=FILE                     Call variants only in these BED regions.
    --vcf_fn=FILE                     Candidate sites VCF; only call at these sites.
    --ctg_name=STR                    Sequence(s) to process.
    --sample_name=STR                 Sample name in the output VCF.
    --qual=INT                        Variants with QUAL > $qual are PASS, else LowQual.
    --chunk_size=INT                  Chunk size for parallel processing. Default: 5000000.
    --pileup_only                     Pileup model only. Default: disable.
    --print_ref_calls                 Include 0/0 calls in the VCF. Default: disable.
    --include_all_ctgs                Call on all contigs. Default: disable (only chr{1..22,X,Y} and {1..22,X,Y} are called).
    --gvcf                            Emit GVCF. Default: disable.
    --remove_intermediate_dir         Drop intermediate files when no longer needed.

GPU / signal-aware

    --use_gpu                         Enable GPU-accelerated calling.
    --device=STR                      GPU device(s), e.g. 'cuda:0' or 'cuda:0,1'. Default: all visible GPUs.
    --enable_dwell_time               Signal-aware calling via Dorado mv tags (ONT only; C impl required).

Phasing

    --use_whatshap_for_intermediate_phasing      Default: enable.
    --use_longphase_for_intermediate_phasing     Default: disable.
    --use_whatshap_for_final_output_phasing      Default: disable.
    --use_longphase_for_final_output_phasing     Default: disable.
    --use_whatshap_for_final_output_haplotagging Default: disable.
    --enable_phasing                             Alias of --use_whatshap_for_final_output_phasing (legacy).
    --longphase_for_phasing                      Alias of --use_longphase_for_intermediate_phasing (legacy).

External binaries

    --samtools=STR     samtools >= 1.10
    --python=STR       python3 >= 3.6
    --pypy=STR         pypy3 >= 3.6
    --parallel=STR     parallel >= 20191122
    --whatshap=STR     whatshap >= 1.0
    --longphase=STR    longphase >= 1.0

Experimental / advanced

    --snp_min_af=FLOAT        Min SNP AF. Default: ont/hifi/ilmn = 0.08.
    --indel_min_af=FLOAT      Min indel AF. Default: ont=0.15, hifi/ilmn=0.08.
    --var_pct_full=FLOAT      Pct of low-quality 0/1 and 1/1 pileup calls rerun in full-alignment. Default: 0.3.
    --ref_pct_full=FLOAT      Pct of low-quality 0/0 pileup calls rerun in full-alignment. Default: 0.3 (ilmn/hifi), 0.1 (ont).
    --var_pct_phasing=FLOAT   Pct of high-quality 0/1 pileup variants used for WhatsHap phasing. Default: 0.8 (ont guppy5), 0.7 (others).
    --pileup_model_prefix=STR Pileup model prefix. Default: pileup.
    --fa_model_prefix=STR     Full-alignment model prefix. Default: full_alignment.
    --min_mq=INT              Filter reads with MAPQ < $min_mq. Default: 5.
    --min_coverage=INT        Min coverage to call a variant. Default: 2.
    --min_contig_size=INT     Skip contigs smaller than $min_contig_size. Default: 0.
    --fast_mode               Skip candidates with AF <= 0.15.
    --haploid_precise         Haploid: only 1/1 is a variant.
    --haploid_sensitive       Haploid: 0/1 and 1/1 are variants.
    --no_phasing_for_fa       Skip WhatsHap phasing in full-alignment calling.
    --call_snp_only           Skip indels.
    --enable_long_indel       Call indels > 50 bp.
    --keep_iupac_bases        Keep IUPAC bases (default: convert to N).
    --base_err=FLOAT          Estimated base error rate for GVCF. Default: 0.001.
    --gq_bin_size=INT         GQ bin size for non-variant merging in GVCF. Default: 5.
    --enable_variant_calling_at_sequence_head_and_tail
                              Call in the first/last 16 bp of a sequence (amplicon-friendly).
    --output_all_contigs_in_gvcf_header
                              List all contigs in the GVCF header.
    --disable_c_impl          Disable C implementation for tensor creation (default: enable).

Examples

Call variants on selected chromosomes

CONTIGS_LIST="[YOUR_CONTIGS_LIST]"     # e.g. "chr21" or "chr21,chr22"

docker run -it \
  -v ${INPUT_DIR}:${INPUT_DIR} \
  -v ${OUTPUT_DIR}:${OUTPUT_DIR} \
  hkubal/clair3:v2.0.0 \
  /opt/bin/run_clair3.sh \
    --bam_fn=${INPUT_DIR}/input.bam \
    --ref_fn=${INPUT_DIR}/ref.fa \
    --threads=${THREADS} \
    --platform=ont \
    --model_path=/opt/models/${MODEL_NAME} \
    --output=${OUTPUT_DIR} \
    --ctg_name=${CONTIGS_LIST}

Call variants at known sites

KNOWN_VARIANTS_VCF="[YOUR_VCF_PATH]"   # e.g. /home/user1/known_variants.vcf.gz

docker run -it \
  -v ${INPUT_DIR}:${INPUT_DIR} \
  -v ${OUTPUT_DIR}:${OUTPUT_DIR} \
  hkubal/clair3:v2.0.0 \
  /opt/bin/run_clair3.sh \
    --bam_fn=${INPUT_DIR}/input.bam \
    --ref_fn=${INPUT_DIR}/ref.fa \
    --threads=${THREADS} \
    --platform=ont \
    --model_path=/opt/models/${MODEL_NAME} \
    --output=${OUTPUT_DIR} \
    --vcf_fn=${KNOWN_VARIANTS_VCF}

Call variants in BED regions

To restrict calling to specific regions, a BED file is recommended over passing individual coordinates.

# Build a BED (0-based, "ctg start end") if needed
echo -e "${CONTIGS}\t${START_POS}\t${END_POS}" > /home/user1/tmp.bed

BED_FILE_PATH="[YOUR_BED_FILE]"        # e.g. /home/user1/tmp.bed

docker run -it \
  -v ${INPUT_DIR}:${INPUT_DIR} \
  -v ${OUTPUT_DIR}:${OUTPUT_DIR} \
  hkubal/clair3:v2.0.0 \
  /opt/bin/run_clair3.sh \
    --bam_fn=${INPUT_DIR}/input.bam \
    --ref_fn=${INPUT_DIR}/ref.fa \
    --threads=${THREADS} \
    --platform=ont \
    --model_path=/opt/models/${MODEL_NAME} \
    --output=${OUTPUT_DIR} \
    --bed_fn=${BED_FILE_PATH}
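
Because BED starts are 0-based, regions written in the more common 1-based inclusive form need their start shifted by one. The sketch below shows the conversion; the helper names are hypothetical, and it assumes the standard BED convention of a half-open interval (exclusive end).

```python
def region_to_bed_line(contig, start_1based, end_1based):
    """Convert a 1-based inclusive region to a BED line (0-based, half-open)."""
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    # BED start is 0-based; BED end is exclusive, so it equals the
    # 1-based inclusive end.
    return f"{contig}\t{start_1based - 1}\t{end_1based}"


def write_bed(path, regions):
    """Write (contig, start, end) regions, given 1-based inclusive, as BED."""
    with open(path, "w") as bed:
        for contig, start, end in regions:
            bed.write(region_to_bed_line(contig, start, end) + "\n")


print(region_to_bed_line("chr20", 100000, 300000))
```

The file written by `write_bed` can then be passed as `--bed_fn`.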

Call variants in non-diploid organisms (haploid)

# --no_phasing_for_fa disables full-alignment phasing.
# --include_all_ctgs calls variants on all contigs.
# --haploid_precise can be swapped for --haploid_sensitive.
docker run -it \
  -v ${INPUT_DIR}:${INPUT_DIR} \
  -v ${OUTPUT_DIR}:${OUTPUT_DIR} \
  hkubal/clair3:v2.0.0 \
  /opt/bin/run_clair3.sh \
    --bam_fn=${INPUT_DIR}/input.bam \
    --ref_fn=${INPUT_DIR}/ref.fa \
    --threads=${THREADS} \
    --platform=ont \
    --model_path=/opt/models/${MODEL_NAME} \
    --output=${OUTPUT_DIR} \
    --no_phasing_for_fa \
    --include_all_ctgs \
    --haploid_precise \
    --enable_variant_calling_at_sequence_head_and_tail

Advanced Topics

Dwelling Time Feature

Clair3 v2.0 introduces signal-aware variant calling for Oxford Nanopore data. Dwell time (signal duration per base) extracted from BAM mv tags is used as an additional input channel to the full-alignment model, improving accuracy.

./run_clair3.sh \
  --bam_fn=input.bam \
  --ref_fn=ref.fa \
  --threads=8 \
  --platform=ont \
  --model_path=${MODEL_PATH} \
  --output=${OUTPUT_DIR} \
  --enable_dwell_time

Requirements

BAM must contain mv (move-table) tags from Dorado with --emit-moves.
--platform=ont.
C implementation must be enabled (default; do not pass --disable_c_impl).
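
Before enabling dwell time, it can be worth confirming that the reads actually carry `mv` tags. The sketch below is a hypothetical helper, not a Clair3 feature: it scans SAM-formatted text (for example, the first records printed by `samtools view input.bam`) and reports the fraction of records with an `mv` optional field.

```python
def fraction_with_move_table(sam_lines):
    """Fraction of SAM alignment records that carry an mv tag."""
    total = with_mv = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        total += 1
        # Optional fields start at column 12 and look like TAG:TYPE:VALUE.
        if any(f.startswith("mv:") for f in fields[11:]):
            with_mv += 1
    return with_mv / total if total else 0.0
```

A value close to 0 suggests the BAM was basecalled without `--emit-moves`, or that the tags were dropped during alignment.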

See Dwelling Time Feature (full guide incl. training) and the ONT Dwelling Time Quick Demo.

Dealing with amplicon data

Use --enable_variant_calling_at_sequence_head_and_tail.
If coverage is excessively high: set --var_pct_full=1 and --ref_pct_full=1.
- Human: also set --var_pct_phasing=1.
- Non-human: add --no_phasing_for_fa.
Context: discussions #160, #240.
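
The recommendations above can be written as a small decision function. This is a sketch with a hypothetical helper name; it assumes the human / non-human settings apply only in the excessively-high-coverage case, which is how the list above is read here.

```python
def amplicon_flags(very_high_coverage=False, human=True):
    """Suggest extra Clair3 flags for amplicon data (illustrative only)."""
    flags = ["--enable_variant_calling_at_sequence_head_and_tail"]
    if very_high_coverage:
        # Send every pileup call through the full-alignment model.
        flags += ["--var_pct_full=1", "--ref_pct_full=1"]
        # Human: phase with all variants; non-human: skip phasing.
        flags.append("--var_pct_phasing=1" if human else "--no_phasing_for_fa")
    return flags


print(amplicon_flags(very_high_coverage=True, human=False))
```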

Postprocessing scripts

`SwitchZygosityBasedOnSVCalls`

Given a Clair3 VCF and a Sniffles2 SV VCF, this module re-labels Clair3 SNPs from homozygous to heterozygous when both of the following hold:

AF ≤ 0.7, and
the ±16 bp flanking region falls inside one or more SV deletions.

Two INFO tags are added: SVBASEDHET and ORG_CLAIR3_SCORE (the original Clair3 QUAL). The new QUAL is set to the highest QUAL among the overlapping SV deletions. Inspired by Philipp Rescheneder (ONT).

pypy3 ${CLAIR3_PATH}/clair3.py SwitchZygosityBasedOnSVCalls \
  --bam_fn input.bam \
  --clair3_vcf_input clair3_input.vcf.gz \
  --sv_vcf_input sniffle2.vcf.gz \
  --vcf_output output.vcf \
  --threads 8
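
The re-labelling rule can be illustrated on plain Python data. This is a simplified sketch and not the module's actual code: the record layout is hypothetical, coordinates are treated as inclusive, and it assumes the whole ±16 bp window must lie inside a single deletion.

```python
FLANK = 16      # bp on each side of the SNP
MAX_AF = 0.7    # SNPs above this allele frequency are left unchanged


def switch_zygosity(snp, deletions):
    """Return an updated copy of `snp` if the rule applies, else `snp`.

    `snp` is a dict with pos, gt, af, qual; `deletions` is a list of
    (start, end, qual) tuples on the same contig.
    """
    if snp["gt"] != "1/1" or snp["af"] > MAX_AF:
        return snp
    lo, hi = snp["pos"] - FLANK, snp["pos"] + FLANK
    # Assumption: the window must be fully contained in one deletion.
    quals = [q for start, end, q in deletions if start <= lo and hi <= end]
    if not quals:
        return snp
    updated = dict(snp)
    updated.update(gt="0/1", qual=max(quals), SVBASEDHET=True,
                   ORG_CLAIR3_SCORE=snp["qual"])
    return updated
```

A homozygous SNP at position 1000 with AF 0.6 that sits inside deletions of QUAL 40 and 55 would come back as `0/1` with QUAL 55 and its original QUAL preserved in `ORG_CLAIR3_SCORE`.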

Reference

Folder Structure and Submodules

All submodules accept -h / --help.

clair3/ — not pypy-compatible, run with python.

Submodule	Description
`CallVariants`	Call variants from a trained model and candidate tensors.
`CallVarBam`	Call variants from a trained model and a BAM.
`Train`	Train a model with AdamW (PyTorch). DDP via `torchrun`. Initial LR `1e-3` with warm-up. Takes tensor binaries from `Tensor2Bin`.

preprocess/ — pypy-compatible unless noted.

Submodule	Description
`CheckEnvs`	Validate inputs/environment; preprocess BED; `--chunk_size` sets per-job chunk size.
`CreateTensorPileup`	Generate pileup tensors for training/calling.
`CreateTensorFullAlignment`	Generate phased full-alignment tensors.
`GetTruth`	Extract variants from a truth VCF (reference FASTA required if ALT contains `*`).
`MergeVcf`	Merge pileup and full-alignment VCF/GVCF.
`RealignReads`	Local read realignment (Illumina).
`SelectCandidates`	Select pileup candidates for full-alignment calling.
`SelectHetSnp`	Select heterozygous-SNP candidates for WhatsHap phasing.
`SelectQual`	Select a quality cutoff from pileup results; variants below it go to phasing + full-alignment.
`SortVcf`	Sort a VCF file.
`SplitExtendBed`	Split BED by contig; extend by 33 bp for variant calling.
`UnifyRepresentation`	Representation unification between candidates and truth.
`MergeBin`	Merge tensor binaries.
`CreateTrainingTensor`	Create training tensor binaries (pileup or full-alignment).
`Tensor2Bin`	Combine variant/non-variant tensors into a `blosc:lz4hc` binary (not pypy-compatible; ~10–15 GB training memory).

Training Data

Pileup and full-alignment models were trained on four GIAB samples (HG001, HG002, HG004, HG005), holding out HG003. For ONT, a second model was trained on HG001, HG002, HG003, and HG005, holding out HG004. chr20 was excluded from all training, which used chr1–19, chr21, and chr22 only.

Platform	Reference	Aligner	Training samples
ONT	GRCh38_no_alt	minimap2	HG001,2,(3|4),5
PacBio HiFi Sequel II	GRCh38_no_alt	pbmm2	HG001,2,4,5
PacBio HiFi Revio	GRCh38_no_alt	pbmm2	HG002,4
Illumina	GRCh38	BWA-MEM / NovoAlign	HG001,2,4,5

Full details and download links: Training Data.

VCF/GVCF Output Formats

Clair3 uses VCF 4.2. Extra INFO tags distinguish call source:

P — called by the pileup model.
F — called by the full-alignment model.
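
These tags make it straightforward to see how many final calls came from each model. The sketch below uses only the standard library and a hypothetical function name; it assumes `P` and `F` appear as keys in the INFO column (column 8) of each record.

```python
import gzip
from collections import Counter


def count_call_sources(vcf_gz_path):
    """Count records tagged P (pileup) or F (full-alignment) in a VCF."""
    counts = Counter()
    with gzip.open(vcf_gz_path, "rt") as vcf:
        for line in vcf:
            if line.startswith("#"):
                continue  # header lines
            info = line.rstrip("\n").split("\t")[7]  # INFO is column 8
            keys = {item.split("=", 1)[0] for item in info.split(";")}
            if "P" in keys:
                counts["pileup"] += 1
            elif "F" in keys:
                counts["full_alignment"] += 1
            else:
                counts["untagged"] += 1
    return counts
```

Calling `count_call_sources("merge_output.vcf.gz")` on a finished run gives the split between the two models.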

GVCF output is GATK-compatible and passes GATK ValidateVariants. Clair3 uses <NON_REF> (same as GATK), not DeepVariant's <*>. Merge with GLNexus — a caller-specific config is available for download.

Model Training Guides

Visualization

Citation

Paper	Venue	Topic
Symphonizing pileup and full-alignment for deep learning-based long-read variant calling	Nature Computational Science · bioRxiv preprint	Original Clair3
Accelerated long-read variant calling with Clair3 for whole-genome sequencing	Bioinformatics, 2026	GPU-accelerated Clair3
Leveraging ONT move table values for signal aware variant calling	bioRxiv preprint, 2026	ONT `mv`-tag (move-table) signal-aware tuning
