Small bug with sample renaming #74

@cjfields

Description of the bug

There is a small bug in the DADA2_POOLED_INFER step, where sample names are inferred from the names of their sequence files, in this line:

names(filts) <- gsub("(_${trimmode})?.trim.fastq.gz", "", filts)

This was designed to handle paired-end reads, whose sequence file names look like <SAMPLE>_1.trimmed.fastq.gz. However, it causes problems when a sample name itself has an ending that unintentionally matches this regexp (e.g. _1 or _2), particularly in single-end runs and especially with PacBio data. The data themselves are unaffected downstream, but the sample names are altered, and in rare cases two samples can collide on the same name (like sample1 and sample1_1), which causes errors downstream when read tracking or other steps are performed.
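
A minimal R sketch of the collision described above. The file names and the value "1" standing in for the interpolated ${trimmode} are assumptions for illustration only; this report does not show the actual value the pipeline substitutes. The final lines show one possible stricter pattern for single-end data, which is not necessarily the fix the pipeline would adopt.

```r
# Hypothetical single-end file names (assumed for illustration).
filts <- c("sample1.trim.fastq.gz", "sample1_1.trim.fastq.gz")

# Assumed stand-in for the value interpolated into ${trimmode}.
trimmode <- "1"

# Same shape as the pattern in the report: optional "_<trimmode>",
# followed by the suffix with unescaped dots (each "." matches any character).
pattern <- paste0("(_", trimmode, ")?.trim.fastq.gz")
sample_names <- gsub(pattern, "", filts)
print(sample_names)

# Both files collapse to the same sample name.
has_collision <- anyDuplicated(sample_names) > 0
print(has_collision)

# One possible stricter alternative for single-end runs (an assumption,
# not the pipeline's fix): strip only the literal suffix, anchored at the end.
strict_names <- sub("\\.trim\\.fastq\\.gz$", "", filts)
print(strict_names)
```

A check such as anyDuplicated() on the inferred names, run right after the renaming, would surface the collision at this step rather than later during read tracking.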

Command used and terminal output

Relevant files

No response

System information

No response


Labels

bug (Something isn't working)
