Contact: Ruibang Luo, Zhenxian Zheng, Xian Yu
Email: rbluo@cs.hku.hk · zxzheng@cs.hku.hk · yuxian@connect.hku.hk
Clair3 is a germline small-variant caller for long-read sequencing. It combines two complementary models to balance speed and accuracy:
- Pileup calling — fast, handles the majority of variant candidates from summarized alignment statistics.
- Full-alignment calling — computationally intensive, resolves uncertain candidates from haplotype-resolved full alignments.
Clair3 is the third generation of the caller, succeeding Clair (second) and Clairvoyante (first).
| Use case | Tool |
|---|---|
| Germline on long-read RNA-seq | Clair3-RNA |
| Somatic, paired tumor/normal | ClairS |
| Somatic, tumor-only | ClairS-TO |
Clair-skills is a plug-in for agentic AI coding assistants (Claude Code, Cursor, Codex, …) that covers the entire Clair suite. It helps the agent pick the right tool and model, generate ready-to-run commands, and analyze results.
- Latest Updates
- Installation — Docker · Singularity · Bioconda · Step-by-step (Conda)
- Pre-trained Models
- Quick Demo
- Usage
- Advanced Topics — Dwelling time · Amplicon data · Postprocessing
- Reference — Folder structure · Training data · VCF/GVCF formats · Model training guides
- Citation
A preprint describing the performance of Clair3 v2 is available on bioRxiv.
- PyTorch migration. The deep-learning backend moved from TensorFlow to PyTorch. v1 TensorFlow models are not compatible with v2 (including the TF models ONT provides via Rerio). Use the Converted Rerio Clair3 Models (PyTorch), or convert your own with the Model Migration Guide. Pre-trained PyTorch models: download here.
- Signal-aware variant calling for ONT. Pass --enable_dwell_time on BAMs with Dorado mv tags (requires --emit-moves). See Dwelling Time Feature.
- New Python runner. run_clair3.sh was rewritten as run_clair3.py; both remain usable.
- Checkpoint format. TF .index/.data → PyTorch .pt.
Native GPU support on Linux and Apple Silicon. Clair3 on GPU runs ~5× faster than CPU. See the GPU Quick Start.
- Boundary check for an insertion immediately followed by soft-clipping (#394, @dpryan79).
- Parallel-job exit-code checking; pipeline now exits immediately on any job failure (#392, @SamStudio8).
- Fixed the malformed VCF header on AWS (#380).
- Added an R10.4.1 model fine-tuned on 12 bacterial genomes (notes, @wshropshire).
Earlier versions
v1.1.0 — Apr 8, 2025. Removed parallel version checking (#377).
v1.0.11 — Mar 19, 2025. Added --enable_variant_calling_at_sequence_head_and_tail to call variants in the first/last 16 bp of a sequence (use with caution — less reliable alignments and less context; #257). Added --output_all_contigs_in_gvcf_header (#371). Added postprocessing AddPairEndAlleleDepth (PEAD tag, Bin Guan, NEI). Fixed AF format in GVCF output (#365). Added a split-into-haplotypes calling workflow. set -o pipefail in run_clair3.sh (#368). Clarified parameter docs (#369).
v1.0.10 — Jul 28, 2024. Fixed an out-of-range bug in non-human GVCF output (#317). Faster amplicon calling via --chunk_num=-1 (#306). LongPhase bumped to 1.7.3 (#321).
v1.0.9 — May 15, 2024. Fixed VCF header (#305); updated DP FORMAT description.
v1.0.8 — Apr 29, 2024. Fixed occasional quality-score differences between VCF and GVCF output. LongPhase bumped to 1.7.
v1.0.7 — Apr 7, 2024. Memory guards for full-alignment C implementation (#286). Raised max mpileup coverage to 2^20 (#292). LongPhase bumped to 1.6.
v1.0.6 — Mar 15, 2024. Stack-overflow fix at very high coverage (#282). Reference caching for CRAM (#278). Fixed RefCall outputs when FA model calls no variant (#271). Fixed min-coverage filtering (#262). --min_snp_af / --min_indel_af default to 0.0 when --vcf_fn is set (#261).
v1.0.5 — Dec 20, 2023. Fixed multi-allelic AF at very high coverage (#241). --base_err and --gq_bin_size to reduce excess ./. in GVCF (#220).
v1.0.4 — Jul 11, 2023. Command line and reference source now in VCF header. Fixed AF for 1/2 genotypes. Added AD tag.
v1.0.3 — Jun 20, 2023. Colon : allowed in reference sequence names (#203).
v1.0.2 — May 22, 2023. Added PacBio HiFi Revio model. Fixed halt on too few variant candidates (#198).
v1.0.1 — Apr 24, 2023. WhatsHap bumped to 1.7 (~15% faster haplotagging, #193). Fixed PL when ALT is N (#191).
v1.0.0 — Mar 6, 2023. Clair3 version in VCF header (#141). NumPy int fix (#165). IUPAC → N by default, keep with --keep_iupac_bases (#153). Added --use_{longphase,whatshap}_for_intermediate_phasing / --use_{longphase,whatshap}_for_final_output_phasing / --use_whatshap_for_final_output_haplotagging (#164). Fixed Docker shell under host user mode (#175).
v0.1-r12 — Aug 19, 2022. CRAM input (#117). Python 3.9, TensorFlow 2.8, Samtools 1.15.1, WhatsHap 1.4. DP now shows raw coverage for pileup calls (#128). Illumina representation-unification fix (#110). LongPhase 1.3.
v0.1-r11 minor 2 — Apr 16, 2022. Fixed missing non-variant GVCF positions at chunk boundaries. Reduced GVCF memory footprint (#88).
v0.1-r11 — Apr 4, 2022. ~2.5× faster on ONT Q20 data with pileup and full-alignment feature generation in C. LongPhase as a phasing option (--longphase_for_phasing). --min_coverage, --min_mq, --min_contig_size. CSI index support (#90). See Notes on r11.
v0.1-r10 — Jan 13, 2022. Added the Guppy5 model r941_prom_sup_g5014 (benchmarks); applicable to sup, hac, fast reads. The older r941_prom_sup_g506 was obsoleted. Added --var_pct_phasing.
v0.1-r9 — Dec 1, 2021. --enable_long_indel for indel calls >50 bp (benchmarks, #64).
v0.1-r8 — Nov 11, 2021. --enable_phasing to emit WhatsHap-phased VCF (#63). Fixed unexpected program termination on success.
v0.1-r7 — Oct 18, 2021. ONT var_pct_full raised 0.3 → 0.7 (+~0.2% indel F1). Fall-through to next-likely variant on low coverage (#53). Streamlined training. mini_epochs in Train.py (#60). GVCF intermediates now lz4-compressed (5× smaller). --remove_intermediate_dir (#48). ONT models renamed per Medaka convention. Training-data leakage fixed (#57).
ONT-provided models — Sep 23, 2021. ONT also provides chemistry-/basecaller-specific Clair3 models via Rerio.
v0.1-r6 — Sep 4, 2021. Reduced SortVcf memory (#45). Lower ulimit -n requirement (#47). Clair3-Illumina in bioconda (#42).
v0.1-r5 — Jul 19, 2021. Training-data generator fix to avoid Tensorflow segfaults. Simplified Dockerfile. Fixed ALT output for reference calls. Fixed multi-allelic AF ([ACGT]Del). AD tag in GVCF. --call_snp_only (#40). Pileup/FA validity checks (#32, #38).
v0.1-r4 — Jun 28, 2021. Bioconda install. ONT Guppy2 model (benchmarks — must be used on Guppy2-or-earlier data). Colab notebooks. Fix on too few variant candidates (#28).
v0.1-r3 — Jun 9, 2021. ulimit -u check with auto-retry on failed jobs (#20, #23, #24). ONT Guppy5 model (benchmarks).
v0.1-r2 — May 23, 2021. BED out-of-range fix (#12). Both .bam.bai and .bai accepted (#10). Boundary and package version checks.
v0.1-r1 — May 18, 2021. Relative paths in Conda (#5). taskset CPU-core visibility fix and Singularity image (#6).
v0.1 — May 17, 2021. Initial release.
Pick the right install method for your hardware:
- CPU → Docker (Option 1), Singularity (Option 2), or Bioconda (Option 3).
- NVIDIA GPU (Linux) → Docker GPU (Option 1) or Singularity GPU (Option 2); fall back to Step-by-step (Option 4) if unsupported.
- Apple Silicon (M1/M2/M3/M4) → Step-by-step (Option 4).
See the GPU Quick Start for tuned settings.
Pre-built image: hkubal/clair3.
Use absolute paths for INPUT_DIR and OUTPUT_DIR.
INPUT_DIR="[YOUR_INPUT_FOLDER]" # e.g. /home/user1/input (absolute path)
OUTPUT_DIR="[YOUR_OUTPUT_FOLDER]" # e.g. /home/user1/output (absolute path)
THREADS="[MAXIMUM_THREADS]" # e.g. 8
MODEL_NAME="[YOUR_MODEL_NAME]" # e.g. r1041_e82_400bps_sup_v500
# --platform accepts ont, hifi, or ilmn.
docker run -it \
-v ${INPUT_DIR}:${INPUT_DIR} \
-v ${OUTPUT_DIR}:${OUTPUT_DIR} \
hkubal/clair3:v2.0.0 \
/opt/bin/run_clair3.sh \
--bam_fn=${INPUT_DIR}/input.bam \
--ref_fn=${INPUT_DIR}/ref.fa \
--threads=${THREADS} \
--platform=ont \
--model_path=/opt/models/${MODEL_NAME} \
--output=${OUTPUT_DIR}
python3 /opt/bin/run_clair3.py can replace /opt/bin/run_clair3.sh in the command above.
Image: hkubal/clair3:v2.0.0_gpu (built on CUDA 12.1).
Requirements
- NVIDIA driver ≥ 530.30.02.
- NVIDIA Container Toolkit installed on the host.
# --platform accepts ont, hifi, or ilmn.
docker run -it --gpus all \
-v ${INPUT_DIR}:${INPUT_DIR} \
-v ${OUTPUT_DIR}:${OUTPUT_DIR} \
hkubal/clair3:v2.0.0_gpu \
/opt/bin/run_clair3.sh \
--bam_fn=${INPUT_DIR}/input.bam \
--ref_fn=${INPUT_DIR}/ref.fa \
--threads=${THREADS} \
--platform=ont \
--model_path=/opt/models/${MODEL_NAME} \
--output=${OUTPUT_DIR} \
--use_gpu
Notes
- Select specific GPUs with --gpus '"device=0,1"' (Docker) and --device=cuda:0,1 (Clair3).
- If the image does not work on your setup (unsupported driver/CUDA, no NVIDIA Container Toolkit, Apple Silicon, etc.), fall back to Step-by-step (Option 4).
Use absolute paths for INPUT_DIR and OUTPUT_DIR.
conda config --add channels defaults
conda create -n singularity-env -c conda-forge singularity -y
conda activate singularity-env
singularity pull docker://hkubal/clair3:v2.0.0
# --platform accepts ont, hifi, or ilmn.
singularity exec \
-B ${INPUT_DIR},${OUTPUT_DIR} \
clair3_v2.0.0.sif \
/opt/bin/run_clair3.sh \
--bam_fn=${INPUT_DIR}/input.bam \
--ref_fn=${INPUT_DIR}/ref.fa \
--threads=${THREADS} \
--platform=ont \
--model_path=/opt/models/${MODEL_NAME} \
--output=${OUTPUT_DIR}
Requirements
- NVIDIA driver ≥ 530.30.02.
- Singularity (or Apptainer) with --nv support.
singularity pull docker://hkubal/clair3:v2.0.0_gpu
# --platform accepts ont, hifi, or ilmn.
singularity exec --nv --cleanenv --env TMPDIR=/tmp \
-B ${INPUT_DIR},${OUTPUT_DIR} \
clair3_v2.0.0_gpu.sif \
/opt/bin/run_clair3.sh \
--bam_fn=${INPUT_DIR}/input.bam \
--ref_fn=${INPUT_DIR}/ref.fa \
--threads=${THREADS} \
--platform=ont \
--model_path=/opt/models/${MODEL_NAME} \
--output=${OUTPUT_DIR} \
--use_gpu
Notes
- --nv injects the host NVIDIA driver and libraries into the container (equivalent of Docker's --gpus all); no NVIDIA Container Toolkit needed.
- --cleanenv --env TMPDIR=/tmp avoids parallel failing when the host TMPDIR points to a path not visible inside the container.
- If the image does not work on your setup, fall back to Step-by-step (Option 4).
Clair3 is available on Bioconda. The recipe bundles PyPy, samtools, parallel, whatshap, LongPhase, and the pre-trained models under ${CONDA_PREFIX}/bin/models/. See bioconda-recipes#64260 for the v2 (PyTorch) recipe.
mamba create -n clair3 -c conda-forge -c bioconda -y clair3
mamba activate clair3
MODEL_NAME="[YOUR_MODEL_NAME]" # e.g. r1041_e82_400bps_sup_v500
# --platform accepts ont, hifi, or ilmn.
run_clair3.sh \
--bam_fn=input.bam \
--ref_fn=ref.fa \
--threads=${THREADS} \
--platform=ont \
--model_path=${CONDA_PREFIX}/bin/models/${MODEL_NAME} \
--output=${OUTPUT_DIR}
Note. The Bioconda package ships a CPU-only PyTorch build. For NVIDIA GPU or Apple Silicon, use Step-by-step (Option 4).
Install Mamba or Conda from miniforge (Mamba is much faster).
Step 1 — Create and activate the environment
mamba create -n clair3_v2 -c conda-forge -c bioconda -y \
python=3.11 samtools whatshap parallel \
zstd xz zlib bzip2 automake make gcc gxx curl pigz
mamba activate clair3_v2
pip install uv
Step 2 — Install PyTorch
Pick the right build for your system from the PyTorch website.
# Example: NVIDIA CUDA 13.0
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu130
# Or: CPU only
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
Step 3 — Clone Clair3
cd ${HOME}
git clone https://github.com/HKU-BAL/Clair3.git
cd Clair3
export CLAIR3_PATH=$(pwd)
Step 4 — Install Python deps and build C sources
uv pip install numpy h5py hdf5plugin numexpr tqdm cffi torchmetrics
make PREFIX=${CONDA_PREFIX}
make compiles samtools/htslib, LongPhase, and the Clair3 C shared library (libclair3.so) used for fast pileup and full-alignment tensor generation.
Step 5 — Install PyPy3.11 (speeds up preprocessing)
wget https://downloads.python.org/pypy/pypy3.11-v7.3.20-linux64.tar.bz2
tar -xjf pypy3.11-v7.3.20-linux64.tar.bz2 && rm pypy3.11-v7.3.20-linux64.tar.bz2
ln -s $(pwd)/pypy3.11-v7.3.20-linux64/bin/pypy3 ${CONDA_PREFIX}/bin/pypy3
ln -s $(pwd)/pypy3.11-v7.3.20-linux64/bin/pypy3 ${CONDA_PREFIX}/bin/pypy
pypy3 -m ensurepip
pypy3 -m pip install mpmath==1.2.1
Step 6 — (Optional) Download pre-trained models
cd ${CLAIR3_PATH}
mkdir -p models
wget -r -np -nH --cut-dirs=2 -R "index.html*" -P ./models \
https://www.bio8.cs.hku.hk/clair3/clair3_models_pytorch/
Individual models can also be downloaded from the model index.
Step 7 — Run Clair3
MODEL_NAME=r1041_e82_400bps_sup_v500
${CLAIR3_PATH}/run_clair3.sh \
--bam_fn=input.bam \
--ref_fn=ref.fa \
--threads=${THREADS} \
--platform=ont \
--model_path=${CLAIR3_PATH}/models/${MODEL_NAME} \
--output=${OUTPUT_DIR}
python3 ${CLAIR3_PATH}/run_clair3.py accepts the same arguments and can be used interchangeably.
Important: v1 TensorFlow models are not compatible with Clair3 v2 (including the TF models ONT provides via Rerio). Convert your own with the Model Migration Guide, or use the pre-converted models below.
Download:
- HKU-provided: https://www.bio8.cs.hku.hk/clair3/clair3_models_pytorch/
- Converted ONT Rerio: https://www.bio8.cs.hku.hk/clair3/clair3_models_rerio_pytorch/
Bundled locations: /opt/models/ (Docker) · ${CONDA_PREFIX}/bin/models/ (Bioconda).
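Because v2 expects PyTorch checkpoints, a quick pre-flight check on a model folder can catch a leftover TensorFlow model before a run starts. The sketch below is not part of Clair3: the helper name check_model_dir is hypothetical, it assumes the default file names pileup.pt and full_alignment.pt, and it assumes that TensorFlow checkpoints appear as files ending in .index or containing .data- in their name.

```python
from pathlib import Path

# Default checkpoint names; --pileup_model_prefix / --fa_model_prefix can change them.
REQUIRED = ("pileup.pt", "full_alignment.pt")


def check_model_dir(model_dir):
    """Return a list of problems found in a model folder (empty list = looks fine).

    Hypothetical helper for illustration, not part of Clair3.
    """
    folder = Path(model_dir)
    if not folder.is_dir():
        return [f"{folder} is not a directory"]
    names = [p.name for p in folder.iterdir()]
    problems = [f"missing {name}" for name in REQUIRED if name not in names]
    # Assumption: v1 TensorFlow checkpoints show up as *.index and *.data-* files.
    if any(n.endswith(".index") or ".data-" in n for n in names):
        problems.append("TensorFlow checkpoint files found; v1 models need conversion")
    return problems
```

An empty list means the folder contains both expected .pt files and no obvious TensorFlow leftovers; it says nothing about whether the model matches your chemistry or basecaller.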
Listed at https://www.bio8.cs.hku.hk/clair3/clair3_models_pytorch/.
| Model | Platform | --platform | Training samples / Notes | Bioconda | Docker |
|---|---|---|---|---|---|
| r1041_e82_400bps_hac_v520_with_mv (latest) | ONT R10.4.1 E8.2 (5 kHz), HAC | ont | HG001,2,5 (chr20 excluded); signal-aware, use --enable_dwell_time | ✓ | |
| r1041_e82_400bps_sup_v520_with_mv (latest) | ONT R10.4.1 E8.2 (5 kHz), SUP | ont | HG001,2,5 (chr20 excluded); signal-aware, use --enable_dwell_time | ✓ | |
| r1041_e82_400bps_sup_v430_bacteria_finetuned | ONT R10.4.1 | ont | Fine-tuned on 12 bacterial genomes | ✓ | |
| r941_prom_sup_g5014 | ONT R9.4.1, Guppy5 SUP | ont | HG002,4,5; also usable on HAC reads (benchmarks) | ✓ | ✓ |
| r941_prom_hac_g360+g422 | ONT R9.4.1, Guppy3/4 HAC | ont | HG001,2,4,5 | | |
| hifi_revio | PacBio HiFi Revio | hifi | HG002,4 | ✓ | ✓ |
| hifi_sequel2 | PacBio HiFi Sequel II | hifi | HG001,2,4,5 | ✓ | ✓ |
| ilmn | Illumina | ilmn | HG001,2,4,5 | ✓ | ✓ |
Recommendation for modern ONT R10.4.1 data: when your BAM has Dorado mv tags, use the dwell-time model (..._with_mv) for the best accuracy; otherwise, use an ONT-trained model below.
ONT's models are fine-tuned to specific chemistries / basecallers and typically outperform the HKU baselines — we recommend using them for best results. Official PyTorch distributions from ONT are in progress; in the meantime, use the converted Rerio models below.
The following ONT-trained models are bundled with Clair3 Docker / Bioconda since v1.1.1:
| Model | Chemistry | Dorado model | Bioconda | Docker |
|---|---|---|---|---|
| r1041_e82_400bps_sup_v500 | R10.4.1 E8.2 (5 kHz) | v5.0.0 SUP | ✓ | ✓ |
| r1041_e82_400bps_hac_v500 | R10.4.1 E8.2 (5 kHz) | v5.0.0 HAC | ✓ | |
| r1041_e82_400bps_sup_v410 | R10.4.1 E8.2 (4 kHz) | v4.1.0 SUP | ✓ | ✓ |
| r1041_e82_400bps_hac_v410 | R10.4.1 E8.2 (4 kHz) | v4.1.0 HAC | ✓ | |
ONT has released newer Dorado v5.2.0 models (r1041_e82_400bps_sup_v520 / hac_v520). They are not yet bundled in Docker / Bioconda; download them from the Converted Rerio models section below.
The full ONT Rerio catalog converted to PyTorch for Clair3 v2 is available at https://www.bio8.cs.hku.hk/clair3/clair3_models_rerio_pytorch/. A selection of recent R10.4.1 E8.2 models is listed below.
| Model | Chemistry | Dorado model |
|---|---|---|
| r1041_e82_400bps_sup_v520 (latest) | R10.4.1 E8.2 (5 kHz) | v5.2.0 SUP |
| r1041_e82_400bps_hac_v520 (latest) | R10.4.1 E8.2 (5 kHz) | v5.2.0 HAC |
| r1041_e82_400bps_sup_v500 | R10.4.1 E8.2 (5 kHz) | v5.0.0 SUP |
| r1041_e82_400bps_hac_v500 | R10.4.1 E8.2 (5 kHz) | v5.0.0 HAC |
| r1041_e82_400bps_sup_v430 | R10.4.1 E8.2 (5 kHz) | v4.3.0 SUP |
| r1041_e82_400bps_hac_v430 | R10.4.1 E8.2 (5 kHz) | v4.3.0 HAC |
| r1041_e82_400bps_sup_v410 | R10.4.1 E8.2 (4 kHz) | v4.1.0 SUP |
| r1041_e82_400bps_hac_v410 | R10.4.1 E8.2 (4 kHz) | v4.1.0 HAC |
For other chemistries and basecaller versions (R10.4.1 E8.2 260 bps, R10.4 E8.1, earlier Guppy g6xx / g5015, v4.0.0 / v4.2.0), browse the full model directory and pick the one matching your chemistry and basecaller (Dorado / Guppy) version.
- ONT with dwelling time — ONT Dwelling Time Quick Demo
- Oxford Nanopore (ONT) — ONT Quick Demo
- PacBio HiFi — PacBio HiFi Quick Demo
- Illumina NGS — Illumina Quick Demo
Caution: Use =value for all parameters, e.g. --bed_fn=fn.bed (not --bed_fn fn.bed).
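When a wrapper script assembles the command, joining each option and its value with = avoids the pitfall above. The following is a minimal sketch rather than an official interface: the helper name build_args and the example values (input.bam, clair3_out, the model folder) are illustrative, and only flags already listed in this README are used.

```python
def build_args(options, flags=()):
    """Render options as --key=value tokens and flags as bare --flag tokens."""
    args = [f"--{key}={value}" for key, value in options.items()]
    args += [f"--{flag}" for flag in flags]
    return args


# Example values are placeholders; replace them with your own paths.
cmd = ["./run_clair3.sh"] + build_args(
    {
        "bam_fn": "input.bam",
        "ref_fn": "ref.fa",
        "threads": 8,
        "platform": "ont",
        "model_path": "models/r1041_e82_400bps_sup_v500",
        "output": "clair3_out",
    },
    flags=["include_all_ctgs"],
)
print(" ".join(cmd))
```

Passing cmd to subprocess.run(cmd, check=True) would launch the run; keeping the arguments as a list also sidesteps shell quoting issues.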
# --platform accepts ont, hifi, or ilmn.
# --include_all_ctgs is required for non-human species.
./run_clair3.sh \
--bam_fn=${BAM} \
--ref_fn=${REF} \
--threads=${THREADS} \
--platform=ont \
--model_path=${MODEL_PREFIX} \
--output=${OUTPUT_DIR} \
--include_all_ctgs
Outputs:
| File | Description |
|---|---|
| ${OUTPUT_DIR}/pileup.vcf.gz | Pileup model calls |
| ${OUTPUT_DIR}/full_alignment.vcf.gz | Full-alignment model calls |
| ${OUTPUT_DIR}/merge_output.vcf.gz | Final Clair3 output |
By default, variants are called on chr{1..22,X,Y} and {1..22,X,Y}. Override with --include_all_ctgs, --ctg_name, or --bed_fn.
python3 run_clair3.py is interchangeable with ./run_clair3.sh.
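To see which contigs of a reference would be left out under the default contig set, the names can be compared against that set before a run. This is an illustrative sketch only; the function names are hypothetical and the contig list in the example is made up.

```python
def default_contigs():
    """Contig names covered by default: chr1..chr22, chrX, chrY and 1..22, X, Y."""
    base = [str(i) for i in range(1, 23)] + ["X", "Y"]
    return ["chr" + name for name in base] + base


def skipped_contigs(reference_contigs):
    """Contigs in a reference that fall outside the default set."""
    keep = set(default_contigs())
    return [c for c in reference_contigs if c not in keep]


# Made-up contig names for illustration.
print(skipped_contigs(["chr1", "chrM", "chrX", "NC_000913.3"]))
# ['chrM', 'NC_000913.3']
```

If the returned list contains contigs you care about, add --include_all_ctgs or name them with --ctg_name.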
Required
-b, --bam_fn=FILE Indexed BAM input.
-f, --ref_fn=FILE Indexed FASTA reference.
-m, --model_path=STR Folder containing pileup.pt and full_alignment.pt.
-t, --threads=INT Max threads. Each chunk uses 4; ceil(threads/4)*3 chunks run in parallel.
-p, --platform=STR {ont,hifi,ilmn}
-o, --output=PATH VCF/GVCF output directory.
Common options
--bed_fn=FILE Call variants only in these BED regions.
--vcf_fn=FILE Candidate sites VCF; only call at these sites.
--ctg_name=STR Sequence(s) to process.
--sample_name=STR Sample name in the output VCF.
--qual=INT Variants with QUAL > $qual are PASS, else LowQual.
--chunk_size=INT Chunk size for parallel processing. Default: 5000000.
--pileup_only Pileup model only. Default: disable.
--print_ref_calls Include 0/0 calls in the VCF. Default: disable.
--include_all_ctgs Call on all contigs. Default: chr{1..22,X,Y}.
--gvcf Emit GVCF. Default: disable.
--remove_intermediate_dir Drop intermediate files when no longer needed.
GPU / signal-aware
--use_gpu Enable GPU-accelerated calling.
--device=STR GPU device(s), e.g. 'cuda:0' or 'cuda:0,1'. Default: all visible GPUs.
--enable_dwell_time Signal-aware calling via Dorado mv tags (ONT only; C impl required).
Phasing
--use_whatshap_for_intermediate_phasing Default: enable.
--use_longphase_for_intermediate_phasing Default: disable.
--use_whatshap_for_final_output_phasing Default: disable.
--use_longphase_for_final_output_phasing Default: disable.
--use_whatshap_for_final_output_haplotagging Default: disable.
--enable_phasing Alias of --use_whatshap_for_final_output_phasing (legacy).
--longphase_for_phasing Alias of --use_longphase_for_intermediate_phasing (legacy).
External binaries
--samtools=STR samtools >= 1.10
--python=STR python3 >= 3.6
--pypy=STR pypy3 >= 3.6
--parallel=STR parallel >= 20191122
--whatshap=STR whatshap >= 1.0
--longphase=STR longphase >= 1.0
Experimental / advanced
--snp_min_af=FLOAT Min SNP AF. Default: ont/hifi/ilmn = 0.08.
--indel_min_af=FLOAT Min indel AF. Default: ont=0.15, hifi/ilmn=0.08.
--var_pct_full=FLOAT Pct of low-quality 0/1 and 1/1 pileup calls rerun in full-alignment. Default: 0.3.
--ref_pct_full=FLOAT Pct of low-quality 0/0 pileup calls rerun in full-alignment. Default: 0.3 (ilmn/hifi), 0.1 (ont).
--var_pct_phasing=FLOAT Pct of high-quality 0/1 pileup variants used for WhatsHap phasing. Default: 0.8 (ont guppy5), 0.7 (others).
--pileup_model_prefix=STR Pileup model prefix. Default: pileup.
--fa_model_prefix=STR Full-alignment model prefix. Default: full_alignment.
--min_mq=INT Filter reads with MAPQ < $min_mq. Default: 5.
--min_coverage=INT Min coverage to call a variant. Default: 2.
--min_contig_size=INT Skip contigs smaller than $min_contig_size. Default: 0.
--fast_mode Skip candidates with AF <= 0.15.
--haploid_precise Haploid: only 1/1 is a variant.
--haploid_sensitive Haploid: 0/1 and 1/1 are variants.
--no_phasing_for_fa Skip WhatsHap phasing in full-alignment calling.
--call_snp_only Skip indels.
--enable_long_indel Call indels > 50 bp.
--keep_iupac_bases Keep IUPAC bases (default: convert to N).
--base_err=FLOAT Estimated base error rate for GVCF. Default: 0.001.
--gq_bin_size=INT GQ bin size for non-variant merging in GVCF. Default: 5.
--enable_variant_calling_at_sequence_head_and_tail
Call in the first/last 16 bp of a sequence (amplicon-friendly).
--output_all_contigs_in_gvcf_header
List all contigs in the GVCF header.
--disable_c_impl Disable C implementation for tensor creation (default: enable).
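The --threads description gives the number of chunks processed concurrently as ceil(threads/4)*3. A small sketch of that arithmetic, with a hypothetical function name, can help when sizing a job:

```python
import math


def parallel_chunks(threads, threads_per_chunk=4):
    """Number of chunks run concurrently: ceil(threads / 4) * 3."""
    return math.ceil(threads / threads_per_chunk) * 3


for t in (4, 8, 30):
    print(t, parallel_chunks(t))
# 4 3
# 8 6
# 30 24
```

For example, --threads=30 rounds up to 8 groups of 4 threads, giving 24 chunks in flight.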
CONTIGS_LIST="[YOUR_CONTIGS_LIST]" # e.g "chr21" or "chr21,chr22"
docker run -it \
-v ${INPUT_DIR}:${INPUT_DIR} \
-v ${OUTPUT_DIR}:${OUTPUT_DIR} \
hkubal/clair3:v2.0.0 \
/opt/bin/run_clair3.sh \
--bam_fn=${INPUT_DIR}/input.bam \
--ref_fn=${INPUT_DIR}/ref.fa \
--threads=${THREADS} \
--platform=ont \
--model_path=/opt/models/${MODEL_NAME} \
--output=${OUTPUT_DIR} \
--ctg_name=${CONTIGS_LIST}
KNOWN_VARIANTS_VCF="[YOUR_VCF_PATH]" # e.g. /home/user1/known_variants.vcf.gz
docker run -it \
-v ${INPUT_DIR}:${INPUT_DIR} \
-v ${OUTPUT_DIR}:${OUTPUT_DIR} \
hkubal/clair3:v2.0.0 \
/opt/bin/run_clair3.sh \
--bam_fn=${INPUT_DIR}/input.bam \
--ref_fn=${INPUT_DIR}/ref.fa \
--threads=${THREADS} \
--platform=ont \
--model_path=/opt/models/${MODEL_NAME} \
--output=${OUTPUT_DIR} \
--vcf_fn=${KNOWN_VARIANTS_VCF}
A BED file is recommended over point coordinates.
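BED intervals use a 0-based start and an exclusive end, whereas region strings such as chr21:1000-2000 are usually 1-based and inclusive, so an off-by-one error is easy to introduce. The helper below is a hypothetical illustration of the conversion, not a Clair3 utility.

```python
def to_bed_line(contig, start_1based, end_1based):
    """Convert a 1-based inclusive region to a BED line (0-based start, exclusive end)."""
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    return f"{contig}\t{start_1based - 1}\t{end_1based}"


# chr21:1000-2000 (1-based, inclusive) covers 1001 bases.
print(to_bed_line("chr21", 1000, 2000))
```

The printed line has the fields chr21, 999, and 2000 separated by tabs; only the start shifts by one, and the end stays the same.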
# Build a BED (0-based, "ctg start end") if needed
echo -e "${CONTIGS}\t${START_POS}\t${END_POS}" > /home/user1/tmp.bed
BED_FILE_PATH="[YOUR_BED_FILE]" # e.g. /home/user1/tmp.bed
docker run -it \
-v ${INPUT_DIR}:${INPUT_DIR} \
-v ${OUTPUT_DIR}:${OUTPUT_DIR} \
hkubal/clair3:v2.0.0 \
/opt/bin/run_clair3.sh \
--bam_fn=${INPUT_DIR}/input.bam \
--ref_fn=${INPUT_DIR}/ref.fa \
--threads=${THREADS} \
--platform=ont \
--model_path=/opt/models/${MODEL_NAME} \
--output=${OUTPUT_DIR} \
--bed_fn=${BED_FILE_PATH}
# --no_phasing_for_fa disables full-alignment phasing.
# --include_all_ctgs calls variants on all contigs.
# --haploid_precise can be swapped for --haploid_sensitive.
docker run -it \
-v ${INPUT_DIR}:${INPUT_DIR} \
-v ${OUTPUT_DIR}:${OUTPUT_DIR} \
hkubal/clair3:v2.0.0 \
/opt/bin/run_clair3.sh \
--bam_fn=${INPUT_DIR}/input.bam \
--ref_fn=${INPUT_DIR}/ref.fa \
--threads=${THREADS} \
--platform=ont \
--model_path=/opt/models/${MODEL_NAME} \
--output=${OUTPUT_DIR} \
--no_phasing_for_fa \
--include_all_ctgs \
--haploid_precise \
--enable_variant_calling_at_sequence_head_and_tail
Clair3 v2.0 introduces signal-aware variant calling for Oxford Nanopore data. Dwell time (signal duration per base) extracted from BAM mv tags is used as an additional input channel to the full-alignment model, improving accuracy.
./run_clair3.sh \
--bam_fn=input.bam \
--ref_fn=ref.fa \
--threads=8 \
--platform=ont \
--model_path=${MODEL_PATH} \
--output=${OUTPUT_DIR} \
--enable_dwell_time
Requirements
- BAM must contain mv (move-table) tags from Dorado with --emit-moves.
- --platform=ont.
- C implementation must be enabled (default; do not pass --disable_c_impl).
See Dwelling Time Feature (full guide incl. training) and the ONT Dwelling Time Quick Demo.
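To make "dwell time" concrete, the sketch below derives per-base signal durations from a move table. It assumes the commonly described mv layout, in which the first value is the stride (signal samples per step) and each following value is 1 where a new base starts and 0 where the current base continues. It does not show how Clair3 encodes dwell time into its tensors, and it ignores strand orientation and signal trimming; the function name is hypothetical.

```python
def dwell_times(mv):
    """Per-base dwell in signal samples from a move table.

    mv[0] is the stride (samples per step); mv[1:] holds 1 where a new base
    starts and 0 where the current base continues (assumed layout).
    """
    stride, moves = mv[0], mv[1:]
    dwells = []
    for move in moves:
        if move == 1:
            dwells.append(stride)   # a new base starts with one step
        elif dwells:
            dwells[-1] += stride    # the current base lasts one more step
    return dwells


# Stride 6; three bases lasting 2, 1, and 3 steps.
print(dwell_times([6, 1, 0, 1, 1, 0, 0]))
# [12, 6, 18]
```

With pysam, the array for a read would typically come from read.get_tag("mv"), provided the BAM was produced with move tables enabled.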
- Use --enable_variant_calling_at_sequence_head_and_tail.
- If coverage is excessively high: set --var_pct_full=1 and --ref_pct_full=1.
  - Human: also set --var_pct_phasing=1.
  - Non-human: add --no_phasing_for_fa.
- Context: discussions #160, #240.
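The recommendations above can be collected into a small helper when amplicon runs are scripted. This is a sketch under the reading that the human and non-human settings apply only in the high-coverage case; the function name is hypothetical, and what counts as excessively high coverage is left to the caller.

```python
def amplicon_flags(high_coverage, human):
    """Extra run_clair3.sh flags for amplicon data, following the list above."""
    flags = ["--enable_variant_calling_at_sequence_head_and_tail"]
    if high_coverage:
        # Send every pileup call through full-alignment calling.
        flags += ["--var_pct_full=1", "--ref_pct_full=1"]
        # Assumption: the species-specific settings belong to the high-coverage case.
        flags.append("--var_pct_phasing=1" if human else "--no_phasing_for_fa")
    return flags


print(amplicon_flags(high_coverage=True, human=False))
```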
Given a Clair3 VCF and a Sniffles2 SV VCF, this module re-labels Clair3 SNPs from homozygous to heterozygous when both:
- AF ≤ 0.7, and
- the ±16 bp flanking region falls inside one or more SV deletions.
Two INFO tags are added: SVBASEDHET and ORG_CLAIR3_SCORE (original QUAL). The new QUAL becomes the top QUAL among overlapping deletions. Inspired by Philipp Rescheneder (ONT).
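The two conditions can be written as a single predicate. The sketch below is an interpretation, not the module's code: it assumes the ±16 bp window must lie entirely within a single deletion, and it assumes 1-based inclusive coordinates on the same contig. All names are hypothetical.

```python
FLANK = 16      # bp on each side of the SNP
MAX_AF = 0.7    # SNPs above this allele frequency are left unchanged


def should_switch_to_het(pos, af, deletions):
    """True if a homozygous SNP meets both re-labelling conditions.

    pos: SNP position; deletions: (start, end) intervals on the same contig.
    Coordinate conventions (1-based, inclusive) are assumptions.
    """
    if af > MAX_AF:
        return False
    lo, hi = pos - FLANK, pos + FLANK
    return any(start <= lo and hi <= end for start, end in deletions)


print(should_switch_to_het(1000, 0.5, [(900, 1100)]))   # True
print(should_switch_to_het(1000, 0.8, [(900, 1100)]))   # False
```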
pypy3 ${CLAIR3_PATH}/clair3.py SwitchZygosityBasedOnSVCalls \
--bam_fn input.bam \
--clair3_vcf_input clair3_input.vcf.gz \
--sv_vcf_input sniffle2.vcf.gz \
--vcf_output output.vcf \
--threads 8
All submodules accept -h/--help.
clair3/ — not pypy-compatible, run with python.
| Submodule | Description |
|---|---|
| CallVariants | Call variants from a trained model and candidate tensors. |
| CallVarBam | Call variants from a trained model and a BAM. |
| Train | Train a model with AdamW (PyTorch). DDP via torchrun. Initial LR 1e-3 with warm-up. Takes tensor binaries from Tensor2Bin. |
preprocess/ — pypy-compatible unless noted.
| Submodule | Description |
|---|---|
| CheckEnvs | Validate inputs/environment; preprocess BED; --chunk_size sets per-job chunk size. |
| CreateTensorPileup | Generate pileup tensors for training/calling. |
| CreateTensorFullAlignment | Generate phased full-alignment tensors. |
| GetTruth | Extract variants from a truth VCF (reference FASTA required if ALT contains *). |
| MergeVcf | Merge pileup and full-alignment VCF/GVCF. |
| RealignReads | Local read realignment (Illumina). |
| SelectCandidates | Select pileup candidates for full-alignment calling. |
| SelectHetSnp | Select heterozygous-SNP candidates for WhatsHap phasing. |
| SelectQual | Select a quality cutoff from pileup results; variants below it go to phasing + full-alignment. |
| SortVcf | Sort a VCF file. |
| SplitExtendBed | Split BED by contig; extend by 33 bp for variant calling. |
| UnifyRepresentation | Representation unification between candidates and truth. |
| MergeBin | Merge tensor binaries. |
| CreateTrainingTensor | Create training tensor binaries (pileup or full-alignment). |
| Tensor2Bin | Combine variant/non-variant tensors into a blosc:lz4hc binary (not pypy-compatible; ~10–15 GB training memory). |
Pileup and full-alignment models were trained on four GIAB samples (HG001, HG002, HG004, HG005), excluding HG003. For ONT, a second model was trained on HG001, HG002, HG003, and HG005, excluding HG004. Chr20 was excluded from all training; only chr1–19, 21, and 22 were used.
| Platform | Reference | Aligner | Training samples |
|---|---|---|---|
| ONT | GRCh38_no_alt | minimap2 | HG001,2,(3|4),5 |
| PacBio HiFi Sequel II | GRCh38_no_alt | pbmm2 | HG001,2,4,5 |
| PacBio HiFi Revio | GRCh38_no_alt | pbmm2 | HG002,4 |
| Illumina | GRCh38 | BWA-MEM / NovoAlign | HG001,2,4,5 |
Full details and download links: Training Data.
Clair3 uses VCF 4.2. Extra INFO tags distinguish call source:
- P: called by the pileup model.
- F: called by the full-alignment model.
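As an illustration of how these tags can be used downstream, the sketch below tallies records by call source. It assumes P and F appear as bare flags in the INFO column (the eighth tab-separated field); the function name is hypothetical and the example records are made up.

```python
from collections import Counter


def count_call_sources(vcf_lines):
    """Count records by the model that produced them, using the P / F INFO flags."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        info = line.rstrip("\n").split("\t")[7].split(";")
        if "P" in info:
            counts["pileup"] += 1
        elif "F" in info:
            counts["full_alignment"] += 1
        else:
            counts["untagged"] += 1
    return counts


# Made-up records for illustration only.
records = [
    "##fileformat=VCFv4.2",
    "chr20\t100\t.\tA\tG\t20.5\tPASS\tP\tGT:GQ\t0/1:20",
    "chr20\t250\t.\tC\tT\t12.1\tPASS\tF\tGT:GQ\t1/1:12",
]
print(count_call_sources(records))
```

For a real run, the same function can read lines from gzip.open("merge_output.vcf.gz", "rt").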
GVCF output is GATK-compatible and passes GATK ValidateVariants. Clair3 uses <NON_REF> (same as GATK), not DeepVariant's <*>. Merge with GLNexus — a caller-specific config is available for download.
- Pileup model training
- Full-alignment model training
- Representation unification
- Model migration (TensorFlow → PyTorch)
| Paper | Venue | Topic |
|---|---|---|
| Symphonizing pileup and full-alignment for deep learning-based long-read variant calling | Nature Computational Science · bioRxiv preprint | Original Clair3 |
| Accelerated long-read variant calling with Clair3 for whole-genome sequencing | Bioinformatics, 2026 | GPU-accelerated Clair3 |
| Leveraging ONT move table values for signal aware variant calling | bioRxiv preprint, 2026 | ONT mv-tag (move-table) signal-aware tuning |
