
FilterLabel

License: Mixed | Python | R

FilterLabel is an archival snapshot of the filtering workflow used for part of Chapter 2 of the doctoral thesis by Enes Kemal Ergin.

The repository preserves the Python and R scripts used to validate N-terminal biotinylation labels in spectral libraries, along with supporting notebooks and example inputs. It is shared as a citable research artifact for understanding and lightly verifying the thesis workflow, not as a generalized proteomics package.

What This Repository Is

  • A thesis-analysis snapshot centered on N-terminal biotinylation filtering.
  • A record of the Python and R implementations used around that workflow.
  • A small set of example inputs for sanity-checking the scripts.
  • A set of HeLa notebooks with frozen outputs from the thesis analysis.

What This Repository Is Not

  • A packaged software library.
  • A general-purpose framework for arbitrary modification-label processing.
  • A fully self-contained rerun of all thesis notebooks from a fresh clone.

Method Overview

The filtering algorithm validates N-terminal biotin labels by cross-referencing fragment ion annotations with lysine positions and their modification status in each peptide:

  1. Select N-terminally labeled peptides: keep only peptides whose modified sequence begins with a UniMod annotation such as (UniMod:3), the UniMod entry for biotin.
  2. Locate lysine residues in the stripped peptide sequence.
  3. Classify lysines as modified or unmodified based on the modification annotations.
  4. Validate fragment evidence:
    • b-ions: lysines before the fragment boundary must be consistent with labeled positions.
    • y-ions: unmodified lysines must remain within the covered y-ion region.
  5. No-K peptides pass immediately once they are N-terminally labeled.
  6. Spectronaut-specific check: measured intensity must be greater than predicted intensity divided by 10.

Precursors passing these checks are retained; the rest are removed from the library.
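The retention step and the Spectronaut-specific intensity rule (step 6) can be illustrated in plain Python. This is a minimal sketch, not the code in python/filter.py: it assumes fragment rows are dictionaries with hypothetical keys PrecursorId, measured and predicted, and the "every fragment must pass" aggregation used in the example is an illustrative assumption rather than a documented rule.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List

# Hypothetical row shape: one dict per fragment, e.g.
# {"PrecursorId": "...", "measured": 120.0, "predicted": 300.0}
Row = Dict[str, object]


def passes_intensity_check(measured: float, predicted: float, factor: float = 10.0) -> bool:
    """Step 6 rule: measured intensity must exceed predicted intensity / 10."""
    return measured > predicted / factor


def filter_library(
    rows: Iterable[Row],
    precursor_passes: Callable[[List[Row]], bool],
    key: str = "PrecursorId",  # hypothetical column name
) -> List[Row]:
    """Keep every fragment row of precursors that pass; drop the rest."""
    groups: Dict[object, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    kept: List[Row] = []
    for fragments in groups.values():  # insertion order is preserved
        if precursor_passes(fragments):
            kept.extend(fragments)
    return kept


def all_fragments_pass(fragments: List[Row]) -> bool:
    # Illustrative aggregation (an assumption): every fragment must pass.
    return all(passes_intensity_check(f["measured"], f["predicted"]) for f in fragments)


rows = [
    {"PrecursorId": "A", "measured": 50.0, "predicted": 100.0},
    {"PrecursorId": "A", "measured": 20.0, "predicted": 100.0},
    {"PrecursorId": "B", "measured": 5.0, "predicted": 100.0},
]
kept = filter_library(rows, all_fragments_pass)
print([r["PrecursorId"] for r in kept])  # ['A', 'A']
```

Passing the precursor-level decision in as a callable keeps the grouping and retention logic separate from whichever per-fragment checks a caller combines.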

Current Scope And Assumptions

  • The current workflow is built around N-terminal biotinylation as represented in the thesis data.
  • The implementation currently assumes the label appears as a UniMod annotation at the peptide N-terminus.
  • Supported library sources are Spectronaut and MSFragger exports in tabular form.
  • The scripts are preserved to document the thesis workflow, so portability and generalization were not the primary design goals.
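Under the UniMod-at-N-terminus assumption above, steps 1, 2, 3 and 5 of the method overview reduce to parsing a modified sequence string. The sketch below is illustrative only: the tag notation (a "(UniMod:<id>)" tag placed after the residue it modifies, with the N-terminal tag before the first residue) is an assumption and not confirmed for either export, and the helper names are hypothetical. The covered_positions helper only encodes the standard b/y ion coverage that step 4 relies on, not the exact step 4 rule.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumed notation (illustrative, not confirmed for either export): a
# "(UniMod:<id>)" tag follows the residue it modifies, and an N-terminal
# tag is written before the first residue, e.g. "(UniMod:3)PEPK(UniMod:3)TIDEK".
MOD_TAG = re.compile(r"\(UniMod:(\d+)\)")
BIOTIN_ID = "3"  # UniMod accession 3 is biotin


@dataclass
class ParsedPeptide:
    stripped: str
    n_term_labeled: bool
    modified_k: List[int]    # 1-based positions of lysines carrying the label
    unmodified_k: List[int]  # 1-based positions of lysines without it


def parse_modified_sequence(modified: str, label_id: str = BIOTIN_ID) -> ParsedPeptide:
    n_term_labeled = modified.startswith(f"(UniMod:{label_id})")  # step 1
    residues: List[str] = []
    labeled_positions = set()
    i = 0
    while i < len(modified):
        tag = MOD_TAG.match(modified, i)
        if tag:
            # A tag before any residue is the N-terminal one; otherwise it
            # belongs to the residue appended just before it.
            if residues and tag.group(1) == label_id:
                labeled_positions.add(len(residues))
            i = tag.end()
        else:
            residues.append(modified[i])
            i += 1
    stripped = "".join(residues)
    lysines = [p for p, aa in enumerate(stripped, start=1) if aa == "K"]  # step 2
    modified_k = [p for p in lysines if p in labeled_positions]  # step 3
    unmodified_k = [p for p in lysines if p not in labeled_positions]
    return ParsedPeptide(stripped, n_term_labeled, modified_k, unmodified_k)


def passes_without_fragment_check(peptide: ParsedPeptide) -> bool:
    # Step 5: an N-terminally labeled peptide with no lysine passes directly.
    return peptide.n_term_labeled and "K" not in peptide.stripped


def covered_positions(ion_type: str, n: int, length: int) -> range:
    # Standard fragment coverage: b_n spans residues 1..n,
    # y_n spans the last n residues of the peptide.
    if ion_type == "b":
        return range(1, n + 1)
    if ion_type == "y":
        return range(length - n + 1, length + 1)
    raise ValueError(f"unsupported ion type: {ion_type}")


print(parse_modified_sequence("(UniMod:3)PEPK(UniMod:3)TIDEK"))
# ParsedPeptide(stripped='PEPKTIDEK', n_term_labeled=True, modified_k=[4], unmodified_k=[9])
```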

Supported Input Formats

Source        Description
Spectronaut   Spectronaut report exports (.tsv)
MsFragger     MSFragger spectral library files (.tsv)

Repository Structure

FilterLabel/
├── python/               # Python CLI implementation of the filter
│   ├── filter.py
│   └── README.md
├── r/                    # R CLI implementation of the filter
│   ├── filter.R
│   └── README.md
├── src/                  # Helper modules used by the notebooks
│   ├── utils.py
│   └── plots.py
├── example/              # Small example input files for verification
│   ├── MSFragger_input.tsv
│   └── Spectronaut_input.tsv
├── HeLa/                 # Thesis notebooks with frozen outputs
│   ├── DDA.ipynb
│   ├── DIA_directDIA.ipynb
│   ├── DIA_filteredLibrary.ipynb
│   ├── DIA_unfilteredLibrary.ipynb
│   └── Comparison.ipynb
├── LICENSE
├── LICENSE-CODE
├── LICENSE-CONTENT
├── CITATION.cff
└── README.md

Minimal Setup

Python Dependencies

Requires Python 3.8+.

pip install numpy pandas

For the notebook helper modules and notebooks:

pip install matplotlib seaborn biopython jupyter

R Dependencies

Requires R 4.0+.

install.packages(c("dplyr", "readr", "readxl", "writexl", "stringr", "optparse"))

Quickstart

Python

python python/filter.py MSFragger_input.tsv MsFragger example/ --verbose
python python/filter.py Spectronaut_input.tsv Spectronaut example/ --verbose

R

Rscript r/filter.R -f MSFragger_input.tsv -s MsFragger -m example/ -v
Rscript r/filter.R -f Spectronaut_input.tsv -s Spectronaut -m example/ -v
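To run both example inputs in one go, the Quickstart commands can be wrapped in a small Python driver. This is a hypothetical convenience sketch: it only reproduces the argument order shown above (input file, source, directory, verbose flag), substitutes sys.executable for the bare python command, and assumes it is run from the repository root.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# (input file, source) pairs taken from the Quickstart commands.
EXAMPLES = [
    ("MSFragger_input.tsv", "MsFragger"),
    ("Spectronaut_input.tsv", "Spectronaut"),
]


def python_command(input_file: str, source: str, directory: str = "example/") -> List[str]:
    # Same positional order as the Quickstart: input file, source, directory.
    return [sys.executable, "python/filter.py", input_file, source, directory, "--verbose"]


def r_command(input_file: str, source: str, directory: str = "example/") -> List[str]:
    # Same flags as the Quickstart: -f file, -s source, -m directory, -v verbose.
    return ["Rscript", "r/filter.R", "-f", input_file, "-s", source, "-m", directory, "-v"]


def run_examples(repo_root: Path, use_r: bool = False) -> None:
    """Run both example inputs; raises CalledProcessError if a run fails."""
    build = r_command if use_r else python_command
    for input_file, source in EXAMPLES:
        subprocess.run(build(input_file, source), cwd=repo_root, check=True)
```

From a clone, run_examples(Path(".")) runs the Python filter and run_examples(Path("."), use_r=True) runs the R filter.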

See python/README.md and r/README.md for full argument details.

Reproducibility Notes

  • The example/ directory is the simplest way to sanity-check the scripts.
  • The notebooks in HeLa/ are preserved with frozen outputs as part of the thesis record.
  • The full notebook workflows expect local data/raw and data/processed inputs that are not included in this repository.
  • As a result, the repository is best understood as a documented analysis snapshot with lightweight verification paths, not as a one-command full rerun of the entire thesis chapter.
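Before attempting a notebook rerun, a quick check for the local input directories can save time. This is a minimal sketch: resolving data/raw and data/processed against the repository root is an assumption, since the notebooks' expected working directory is not stated here.

```python
from pathlib import Path
from typing import List

# Directories named in the notes; resolving them against the repository
# root is an assumption.
NOTEBOOK_INPUT_DIRS = ("data/raw", "data/processed")


def missing_notebook_inputs(repo_root: Path) -> List[str]:
    """Return the expected input directories that do not exist locally."""
    return [d for d in NOTEBOOK_INPUT_DIRS if not (repo_root / d).is_dir()]


missing = missing_notebook_inputs(Path("."))
if missing:
    print("Notebook reruns need local inputs; missing:", ", ".join(missing))
```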

HeLa Analysis Notebooks

The HeLa/ directory contains notebooks associated with the thesis analysis of biotin-labeled HeLa cell lysates.

Notebook                       Description
DDA.ipynb                      Data-dependent acquisition analysis
DIA_directDIA.ipynb            DIA analysis using direct-DIA search
DIA_unfilteredLibrary.ipynb    DIA analysis using an unfiltered spectral library
DIA_filteredLibrary.ipynb      DIA analysis using a FilterLabel-processed spectral library
Comparison.ipynb               Cross-comparison of the four acquisition or library settings

These notebooks import helper functions from src/utils.py and src/plots.py and should be interpreted as archival research artifacts unless otherwise noted.

Limitations

  • The workflow is intentionally narrow and reflects the analysis conditions used in the thesis.
  • The current implementation assumes N-terminal UniMod-based labeling conventions.
  • The repository does not currently ship all raw or intermediate notebook data required for full notebook reruns.
  • Python and R implementations are both preserved, but their exact parity should be verified explicitly when using them for future work.
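One way to check parity explicitly is to compare the two implementations' output tables row by row on a chosen set of key columns. The sketch below is a hypothetical helper, not part of either implementation; the output filenames and column names in the usage comment are placeholders to replace with whatever the scripts actually write.

```python
import csv
from collections import Counter
from typing import Iterable, Sequence, Tuple


def row_counts(lines: Iterable[str], columns: Sequence[str]) -> Counter:
    """Count rows of a TSV, projected onto the given columns."""
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(tuple(row[c] for c in columns) for row in reader)


def parity_report(python_lines, r_lines, columns) -> Tuple[Counter, Counter]:
    """Return (rows only in the Python output, rows only in the R output)."""
    py = row_counts(python_lines, columns)
    r = row_counts(r_lines, columns)
    return py - r, r - py


# Hypothetical usage; filenames and column names are placeholders:
# with open("python_output.tsv", newline="") as py_f, open("r_output.tsv", newline="") as r_f:
#     only_py, only_r = parity_report(py_f, r_f, ("ModifiedPeptide", "PrecursorCharge"))
```

Comparing values as raw strings is deliberately strict; numeric columns may need normalizing first, since Python and R can format the same number differently.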

License

This repository uses a split-license model:

  • Source code in python/, r/, and src/ is licensed under the MIT License. See LICENSE-CODE.
  • Original non-code content in this repository, including README prose, issue-draft text, and thesis-oriented notebooks or frozen outputs authored for this repository, is licensed under Creative Commons Attribution-NonCommercial 4.0 International. See LICENSE-CONTENT.
  • Any third-party names, software exports, or upstream materials remain subject to their original terms where applicable.

See LICENSE for the repository-level summary.

Citation

If you use this repository or reuse the archived workflow in your work, please cite it:

@software{ergin2024filterlabel,
  author = {Ergin, Enes K.},
  title = {{FilterLabel: Validation of N-Terminal Biotinylation Labels in Spectral Libraries}},
  year = {2024},
  url = {https://github.com/eneskemalergin/FilterLabel}
}

References


Biotin bound tight,
Fragment ions tell the truth,
Only labeled stay.

About

Chemical labeling-based filtering for selecting confidently N-terminal biotinylated precursors in Spectronaut and MSFragger spectral libraries.
