
ultraplex does not remove the barcode from the other read (when the insert is short); ultraplex destroys the read ID #39

@algaebrown


Hi! Thanks for making this amazing tool.

I noticed two problems:

  1. ultraplex does not remove the barcode from the other read (when the insert is short).
  2. ultraplex removes the space between the read ID and the rest of the read name, so when I then try to remove the barcode from the other read with cutadapt, cutadapt cannot verify that the two reads are properly paired (the read ID is concatenated with the rest of the name, causing problems).

In particular:

Problem 1:
For a library with barcode ATGCGCAG, the reverse-read output still contains the reverse complement of the barcode at the end of the read (CTGCGCAT is the reverse complement of ATGCGCAG):

```
zcat ultraple_demux_somthing_Rev.fastq.gz | grep -v '@' | grep CTGCGCAT

GACGTCGTGCTCTCCCCCTGCGCAT
CCCCCCGCGGGGGCGCGCCGGTTCTGCGCAT
CTCCCGGGGCTACGCCTGTCTGAGCGTCGCTATCTGCGCAT
GAAAGTCGGAGACCTGCGCAT
TCCCGGGGCTACGCCTGTCTGAGCGTCGCTTTACTGCGCAT
GGCGGCGTCCGGTGAGCTCTCGCTGGCCTTCTGCGCAT
GGCTACGCCTGTCTGAGCGTCGCTTGTCTGCGCAT
GTCCTGGGAAACGGGGCGCGGCCGGCCCTGCGCAT
TGGTGACCACGGGTGACGGGGAAGCTGCGCAT
GACCCGCCGGGCAGCTTCCGGGAAACCAAAATCTGCGCATA
GGTTCGATTCCGGTTGCGTCCACCCACTGCGCAT
```

These reads will have trouble mapping because of the leftover barcode bases at their 3' end.
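Until this is handled in ultraplex, one possible workaround is to cut the leftover barcode off the reverse reads after demultiplexing. The sketch below is a minimal illustration, not ultraplex behaviour: it assumes exact matches only and allows a couple of extra bases after the barcode (as in the `...CTGCGCATA` read above). `reverse_complement` and `rc_barcode_cut` are hypothetical helper names, and `max_tail` is an assumed parameter. The function returns a cut position rather than a trimmed string so the same cut can be applied to the quality line, keeping the FASTQ record consistent.

```python
# Complement table for uppercase DNA bases; N is its own complement.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def rc_barcode_cut(seq: str, barcode: str, max_tail: int = 2) -> int:
    """Hypothetical workaround, not part of ultraplex.

    Return the index at which to cut `seq` so that the reverse-complemented
    barcode (plus at most `max_tail` bases after it) is removed from the 3' end.
    Returns len(seq) when no exact match is found near the end.
    """
    rc = reverse_complement(barcode)
    pos = seq.rfind(rc)  # last exact occurrence of the reverse complement
    if pos != -1 and len(seq) - (pos + len(rc)) <= max_tail:
        return pos
    return len(seq)


if __name__ == "__main__":
    barcode = "ATGCGCAG"
    print(reverse_complement(barcode))  # CTGCGCAT
    # Two of the reverse reads listed above.
    for read in ["GACGTCGTGCTCTCCCCCTGCGCAT",
                 "GACCCGCCGGGCAGCTTCCGGGAAACCAAAATCTGCGCATA"]:
        cut = rc_barcode_cut(read, barcode)
        print(read[:cut])  # apply the same cut to the quality string
```

Exact matching keeps the sketch simple; cutadapt's 3' adapter trimming (`-a`) additionally handles partial matches at the very end of the read and configurable error rates.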

Problem 2:
The output read ID is concatenated with the rest of the read name, which causes a problem with cutadapt:

```
# cutadapt error
ERROR: Error in sequence file at unknown line: Reads are improperly paired. Read name 'A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT1:N:0:ATTACTCG+AGGATAGGrbc:' in file 1 does not match 'A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT2:N:0:ATTACTCG+AGGATAGGrbc:' in file 2.
```

The problem is that cutadapt should be checking whether A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT == A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT, but because the space after the ID is gone, it now compares the full strings and concludes that 'A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT1:N:0:ATTACTCG+AGGATAGGrbc:' != 'A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT2:N:0:ATTACTCG+AGGATAGGrbc:'.
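A simplified sketch of this kind of pairing check, assuming (as the error message suggests) that only the part of the header before the first whitespace is compared. This is not cutadapt's actual code: `read_id` and `names_match` are illustrative names, and the space-separated headers `r1`/`r2` are reconstructed from the error message for illustration only.

```python
def read_id(header: str) -> str:
    """Return the read ID: the header minus a leading '@', up to the first whitespace."""
    if header.startswith("@"):
        header = header[1:]
    return header.split(None, 1)[0]


def names_match(header1: str, header2: str) -> bool:
    """Simplified pairing check (assumption): mates must share the same read ID."""
    return read_id(header1) == read_id(header2)


# Headers with the space intact: the IDs match and the comments are ignored.
r1 = "@A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT 1:N:0:ATTACTCG+AGGATAGG"
r2 = "@A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT 2:N:0:ATTACTCG+AGGATAGG"
# Headers as written by ultraplex: the read number is glued onto the ID.
u1 = "@A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT1:N:0:ATTACTCG+AGGATAGGrbc:"
u2 = "@A00475:502:HJLHHDRX2:1:2102:24542:2284:TGTACATAAT2:N:0:ATTACTCG+AGGATAGGrbc:"

print(names_match(r1, r2))  # True
print(names_match(u1, u2))  # False
```

Without the space, the whole header becomes the "ID", so the differing read numbers (`1:` vs `2:`) make otherwise identical mates look unpaired.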

Output from ultraplex:

```
(base) [hsher@tscc-login1 fastqs]$ zcat ultraplex_demux_PUM2_Fwd.fastq.gz | head
@A00475:502:HJLHHDRX2:1:2101:1199:1000:NCCCATTCAG2:N:0:ATTACTCG+AGGCTATArbc:
AGTTGGGGAAATCGCAGGGGTCAGCACATCCGGAGTGCAATG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF
```

Notice that the space between @A00475:502:HJLHHDRX2:1:2101:1199:1000:NCCCATTCAG and 2:N:0:ATTACTCG+AGGCTATArbc: is gone.

Input to ultraplex:

```
(base) [hsher@tscc-login1 fastqs]$ zcat all.Tr.umi.fq2.trim.gz | head
@A00475:502:HJLHHDRX2:1:2101:1090:1000:NCACCACCTG 2:N:0:ATTACTCG+AGGCTATA
CGTAGTAAACTCTCCCCGGGGCTCCCGCCGGCTTCTCCGGG
+
FFFFFFFF:FFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF
```
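As a stopgap, the separator could be put back into the ultraplex output before running cutadapt. This is a minimal sketch, assuming the comment always starts with the Illumina `<read>:<is filtered>:<control number>:` fields (e.g. `2:N:0:`, as in the input header above) and that this pattern does not occur earlier in the ID. `restore_space` and `COMMENT_START` are hypothetical names, not part of ultraplex or cutadapt.

```python
import re

# Start of an Illumina comment field, e.g. "2:N:0:" (read number, filter flag, control number).
# Assumption: this pattern first appears exactly where the space was removed.
COMMENT_START = re.compile(r"([12]:[YN]:\d+:)")


def restore_space(header: str) -> str:
    """Hypothetical repair: re-insert a space before the first Illumina comment field."""
    if any(c.isspace() for c in header):
        return header  # separator still present, nothing to do
    return COMMENT_START.sub(r" \1", header, count=1)


broken = "@A00475:502:HJLHHDRX2:1:2101:1199:1000:NCCCATTCAG2:N:0:ATTACTCG+AGGCTATArbc:"
print(restore_space(broken))
# @A00475:502:HJLHHDRX2:1:2101:1199:1000:NCCCATTCAG 2:N:0:ATTACTCG+AGGCTATArbc:
```

Only the header line of each FASTQ record would need rewriting; the sequence, `+`, and quality lines stay untouched, and the trailing `rbc:` ends up in the comment, which the pairing check ignores.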
