Problems calculating peptide-level FDP in a benchmarking study with different DL models in FragPipe

[DBGear.java](https://github.com/user-attachments/files/27567485/DBGear.java)

Hello,

First of all, thank you very much for developing FDRBench. I have been testing it for entrapment-based FDP estimation in semi-enzymatic proteomics workflows and I would like to ask for your advice regarding a potential issue I encountered.

I modified the FDRBench source code to support a custom Tryp-N/LysN-like digestion strategy for semi-enzymatic searches. I am attaching:

- the modified source code (DBGear.java, with the Tryp-N rule added),
- the protein-level entrapment FASTA used for the MSFragger searches,
- the commands used for:
  - protein-level entrapment FASTA generation,
  - peptide entrapment mapping (.txt) generation,
  - paired FDP calculation.

The workflow was:

1. Generate paired entrapment proteins using FDRBench.
2. Use the generated protein FASTA for MSFragger searches.
3. Generate the peptide-level entrapment mapping file (.txt).
4. Run paired FDP estimation at peptide/precursor level.

The commands used were:

Protein-level entrapment FASTA generation:

java -jar fdrbench-0.0.4.jar \
    -level protein \
    -db 2026-05-04-reviewed-contam-UP000005640-spikein.fas \
    -o UP000005640_entrapment_paired.fasta \
    -a \
    -fix_nc c \
    -enzyme 9 \
    -miss_c 2 \
    -minLength 7 \
    -maxLength 50 \
    -check \
    -ns \
    -uniprot \
    -I2L \
    -export_db \
    -decoy

Peptide entrapment mapping generation:

java -jar fdrbench-0.0.4.jar \
    -I2L \
    -level peptide \
    -db UP000005640_entrapment_cleaned.fasta \
    -o UP000005640_entrapment_pep.txt \
    -uniprot \
    -fix_nc c \
    -enzyme 9 \
    -miss_c 2 \
    -minLength 7 \
    -maxLength 50

Paired FDP calculation:

java -jar fdrbench-0.0.4.jar \
    -i FDRBench_exports/WF_C_1A.tsv.remove_invalid_peptides \
    -fold 1 \
    -pep UP000005640_entrapment_pep.txt \
    -level peptide \
    -o WF_C_1A_fdp.csv \
    -score "score:1"

The searches themselves complete correctly and I obtain target, decoy, and entrapment identifications. However, during FDP calculation I repeatedly observe messages such as:

The number of entrapment hits is larger than k: 3 > k=1
Target: RVVVVDC
Entrapment: RVDVVVC

The process finishes without an explicit Java error, but no final FDP output (.csv/.tsv) is generated. Instead, an intermediate file such as:

WF_C_1A.tsv.remove_invalid_peptides

is created, and I am not entirely sure whether this indicates:

- an issue in the paired peptide mapping logic,
- instability of paired FDP estimation in highly redundant semi-enzymatic peptide spaces,
- or a problem caused by my custom enzyme implementation.
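
To help separate the first possibility from the other two, the target/entrapment pairs can be tallied independently of FDRBench. The following is a minimal Python sketch, not FDRBench's own logic. It assumes that the warning means a single target peptide is associated with more distinct entrapment peptides than k, and that k corresponds to the -fold value. The function names are hypothetical, and only the first pair in the toy data comes from the log above; the remaining pairs are invented for illustration.

```python
from collections import defaultdict


def entrapments_per_target(pairs):
    """Group distinct entrapment peptides by the target they are paired with."""
    groups = defaultdict(set)
    for target, entrapment in pairs:
        groups[target].add(entrapment)
    return groups


def targets_exceeding_k(pairs, k=1):
    """Return {target: sorted entrapments} for targets with more than k entrapments.

    Assumption: k plays the role of the -fold value used in the FDP command.
    """
    groups = entrapments_per_target(pairs)
    return {t: sorted(e) for t, e in groups.items() if len(e) > k}


# Toy data. Only the first pair is taken from the log message;
# the other three pairs are hypothetical.
example_pairs = [
    ("RVVVVDC", "RVDVVVC"),
    ("RVVVVDC", "RVVDVVC"),  # hypothetical
    ("RVVVVDC", "RDVVVVC"),  # hypothetical
    ("KAAAAEC", "KAEAAAC"),  # hypothetical
]

flagged = targets_exceeding_k(example_pairs, k=1)
for target, entrapments in flagged.items():
    print(target, len(entrapments), entrapments)
```

In practice the pairs would be read from the peptide mapping file or from the search results; the column layout of those files is not assumed here, so the sketch takes plain (target, entrapment) tuples.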

My main question is:

Is paired peptide/precursor-level FDP estimation expected to become problematic for semi/non-specific searches with large redundant peptide spaces, even if the entrapment FASTA generation itself is correct?
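
As a rough illustration of the redundancy in question, the Python sketch below compares the number of fully specific and semi-specific peptides generated from a toy sequence. It is only an illustration under stated assumptions: the cleavage rule (cut before every K or R) is my simplification of a Tryp-N-like enzyme and is not taken from the modified DBGear.java, the function names are hypothetical, and the toy sequence is arbitrary. The length and missed-cleavage defaults mirror the -minLength, -maxLength and -miss_c values in the commands above.

```python
import re


def cleavage_sites(protein):
    """Cut positions for an assumed Tryp-N-like rule: before every K or R."""
    sites = {0, len(protein)}
    sites.update(m.start() for m in re.finditer(r"[KR]", protein))
    return sorted(sites)


def specific_peptides(protein, missed=2, min_len=7, max_len=50):
    """Peptides with both ends at cleavage sites and at most `missed` internal sites."""
    sites = cleavage_sites(protein)
    peptides = set()
    for i, start in enumerate(sites[:-1]):
        for end in sites[i + 1 : i + 2 + missed]:
            if min_len <= end - start <= max_len:
                peptides.add(protein[start:end])
    return peptides


def semi_specific_peptides(protein, missed=2, min_len=7, max_len=50):
    """Peptides with at least one end at a cleavage site (prefixes and suffixes of parents)."""
    # Parents may exceed max_len, because their shorter prefixes/suffixes can still qualify.
    parents = specific_peptides(protein, missed, min_len, len(protein))
    peptides = set()
    for parent in parents:
        for n in range(min_len, min(max_len, len(parent)) + 1):
            peptides.add(parent[:n])   # enzymatic N-terminus, free C-terminus
            peptides.add(parent[-n:])  # free N-terminus, enzymatic C-terminus
    return peptides


toy = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # arbitrary toy sequence
full = specific_peptides(toy)
semi = semi_specific_peptides(toy)
print(len(full), "fully specific peptides;", len(semi), "semi-specific peptides")
```

Every fully specific peptide is also contained in the semi-specific set, so the semi-specific space can only be larger. Many of its members are nested prefixes or suffixes of one another, which is the kind of redundancy the question refers to.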

Thank you very much for your time and for the excellent tool.
