Bigger on the inside

@julesjacobsen julesjacobsen released this 28 Feb 00:23

This release is big. You just won't believe how vastly, hugely, mind-bogglingly big it is. I mean, you may think it's
a long way down the road to the chemist's, but that's just peanuts to this.

-- Douglas Adams

Apologies for misquoting Douglas Adams, but this is a big release. For starters, we've fittingly managed to
coincide with Rare Disease Day this year, which is a first.

Rare Disease Day

This release has touched every part of the Exomiser CLI and core libraries. The changes should be apparent: the CLI has subtly changed and improved, the output files have either been replaced or enhanced, and a new Parquet format has been added. The documentation has been reworked to improve the user installation experience. The analysis scripts and CLI presets have been updated to improve their performance in various scenarios. Probably the biggest change of all is the logistic regression model, which now takes into account the automated ACMG assignments and has been re-trained using the solved cases from the UK's 100,000 Genomes Project.

Given all this change, we urge you to review the changelog below and the documentation, and to open an issue if you have any questions or problems. Existing pipelines will need minor changes to use this release, but the gains should be worth the effort.

CLI Changes

  • Minimum Java version is now Java 21

  • The CLI is now handled by picocli and has new analyse and batch commands.

    • The analyse command works with the same options as before, but will fail before loading resources if no samples have been provided in the command input.
    • The batch command replaces the --batch option and now has a --dry-run option to check the input commands and samples before running and will write out an error file.

    Run exomiser --help for details or see the docs about how to migrate your scripts. However, the snippet below should be enough to get you started:

    # Running the `analyse` command:
    ## Exomiser < 15.0.0
    java -jar exomiser-cli-14.1.0.jar --analysis examples/exome-analysis.yml --output-directory exomiser-results/exome-analysis --output-format HTML
    ## Exomiser 15.0.0
    java -jar exomiser-cli-15.0.0.jar analyse --analysis examples/exome-analysis.yml --output-directory exomiser-results/exome-analysis --output-format HTML
    
    # Running the `batch` command:
    ## Exomiser < 15.0.0
    java -jar exomiser-cli-14.1.0.jar --batch examples/test-analysis-batch-commands.txt
    ## Exomiser 15.0.0
    java -jar exomiser-cli-15.0.0.jar batch examples/test-analysis-batch-commands.txt

  • Updated logistic regression model which now takes into account the ACMG assignment data, leading to improved accuracy of the results. !!! WARNING - THIS SIGNIFICANTLY CHANGES THE EXOMISER COMBINED SCORES, SO IF YOU USE ANY CUTOFFS TO FILTER YOUR RESULTS IN YOUR PIPELINE, YOU WILL NEED TO RE-CALIBRATE THEM !!!

  • New alleleBalanceFilter: {} analysis step to filter variants based on allele balance (see docs for details).

  • Updated examples/preset-exome-analysis.yml and examples/preset-genome-analysis.yml to use new defaults. UPDATE YOUR SCRIPTS TO USE THESE FOR IMPROVED ACCURACY.

  • Added examples/preset-exome-analysis-human-only.yml

  • Added examples/preset-exome-analysis-with-introns.yml

  • Added examples/preset-phenotype-only-analysis.yml

  • New PARQUET output file format. This is a much more efficient format for storing results. It is an amalgamation of the TSV_VARIANT and TSV_GENE data with added fields and should be considered as a replacement for the JSON output.

  • JSON output has been replaced with JSONL output which is a line-delimited JSON format (https://jsonlines.org/). Note that the file suffix is now .jsonl rather than .json.

  • New HTML output format. This is a much more compact and readable format for displaying results.

  • Fix for issue #621 in VCF output, where ACMG categories were being concatenated with a comma (,), which broke parsers. The categories are now joined with & instead.

  • Removed use of the BS4 category in ACMG assignments as this was being applied too stringently, leading to lost diagnoses in the DDD cohort.

  • Fixed PM4 assignment to include disruptive_inframe_deletion and disruptive_inframe_insertion variants.

  • Updated the Exomiser CLI startup configuration so that the results directory is no longer written to the installation directory by default.
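
Two of the output changes listed above may need small adjustments in downstream parsers: JSON results are now line-delimited (.jsonl), and ACMG categories in the VCF output are joined with & rather than a comma. The following is a minimal Python sketch of how a consumer might handle both. It is not taken from the Exomiser code base. The record keys (geneSymbol, combinedScore), the gene names and the evidence string are made up for illustration and are assumptions, not the Exomiser output schema; check the documentation for the real field names.

```python
import io
import json


def read_jsonl(handle):
    """Yield one parsed object per non-empty line of a JSON Lines stream."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


def split_acmg_evidence(value):
    """Split an '&'-joined ACMG category string into a list of categories."""
    return [item for item in value.split("&") if item]


# Made-up records for illustration only: the keys are hypothetical and are
# NOT taken from the Exomiser output schema.
sample = io.StringIO(
    '{"geneSymbol": "GENE1", "combinedScore": 0.95}\n'
    '{"geneSymbol": "GENE2", "combinedScore": 0.42}\n'
)
records = list(read_jsonl(sample))
top = max(records, key=lambda record: record["combinedScore"])
print(top["geneSymbol"])

# Made-up evidence string. A comma is the VCF separator for multiple INFO
# values, so comma-joined categories would be read as separate values;
# '&' keeps them together in a single value.
print(split_acmg_evidence("PVS1&PM2&PP4"))
```

Because each line is a complete JSON document, a results file can be streamed one record at a time (for example by passing an open file handle to read_jsonl) rather than being loaded in full as with the old .json output.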

Under the hood changes

New Java record classes have been added to the core module to represent the immutable data structures used in the analysis.
These make the API much less 'getty', as the traditional Java bean getter conventions have been replaced with terser record accessors.

Data Release

This update also includes a new 2512 data release. See the data-release discussions for links.

Full Changelog: 14.1.0...15.0.0