Description of the bug
There is a small bug in the DADA2_POOLED_INFER step, where sample names are inferred from the names of their sequence files, in this line:
names(filts) <- gsub("(_${trimmode})?.trim.fastq.gz", "", filts)
This was designed to handle paired-end reads, whose sequence files have names like <SAMPLE>_1.trimmed.fastq.gz. However, it causes problems when a sample name itself ends in something that unintentionally matches this regexp (e.g. _1 or _2), particularly in single-end runs and especially with PacBio data. Apart from the sample names, downstream data are not affected. In rare cases, though, the stripped names collide (e.g. sample1 and sample1_1), which causes errors downstream when read tracking or other steps are performed.
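The collision can be reproduced outside the pipeline with a few lines of R. This is only a minimal sketch: the file names and the value substituted for `${trimmode}` are hypothetical, chosen to show the behaviour, and are not taken from an actual run.

```r
# Hypothetical value standing in for the interpolated ${trimmode}.
trimmode <- "1"

# Hypothetical single-end file names; the second sample is genuinely
# named "sample1_1", so its "_1" is not a read-direction suffix.
filts <- c("sample1.trim.fastq.gz", "sample1_1.trim.fastq.gz")

# Same pattern as the line quoted above, built with paste0() here
# because there is no template interpolation outside the pipeline.
pattern <- paste0("(_", trimmode, ")?.trim.fastq.gz")
inferred <- gsub(pattern, "", filts)

print(inferred)
# [1] "sample1" "sample1"

# A simple check that would surface the collision at this step
# rather than later during read tracking.
if (anyDuplicated(inferred) > 0) {
  message("Inferred sample names are not unique: ",
          paste(unique(inferred[duplicated(inferred)]), collapse = ", "))
}
```

Both files end up with the name sample1, which is the collision described above. A uniqueness check like the one at the end is one possible way to fail early with a clearer message; it does not fix the name inference itself.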
Command used and terminal output
Relevant files
No response
System information
No response
Code reference: TADA/modules/local/dadainfer.nf, line 54 in 5b5fe27