Data pre-processing

The sciReptor pipeline expects reads that cover the complete amplicon. However, Illumina, the most frequently used NGS platform, cannot deliver single reads that provide full-length coverage of Ig/TCR V regions. Illumina MiSeq 2x300 bp paired-end sequencing spans the amplicon physically, but the read pairs must be assembled before the pipeline can use them. These pre-processing steps are described below.

Requirements

The pre-processing described below assumes that the following tools are installed on your computer and are in the PATH.

  • pandaseq

  • bbmap
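Whether the required tools are reachable can be checked before starting. The following is a minimal sketch; the executable names are assumptions (the BBMap suite, for example, is commonly installed as shell wrappers such as `bbmap.sh`) and should be adapted to the local installation.

```python
import shutil

# Executable names are assumptions; adjust them to the local installation
# (the BBMap suite is commonly installed as wrappers such as bbmap.sh).
REQUIRED_TOOLS = ["pandaseq", "bbmap.sh"]


def missing_tools(tools):
    """Return the tool names that cannot be found in the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found in PATH: " + ", ".join(missing))
    else:
        print("All required tools found.")
```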

It further assumes that the sequence data have passed basic integrity tests and experimental quality controls (e.g., with fastqc). The median quality of the first read of a read pair should be above Q20 over its entire length; for the second read, it should not drop below Q20 before 250 bp.
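The Q20 criterion can also be checked programmatically. The sketch below is an illustration, not part of sciReptor: it assumes uncompressed FASTQ files with Phred+33 quality encoding, and all function names are hypothetical. It keeps every quality value in memory, so it is only suitable for a subsample of reads; fastqc remains the recommended tool for full data sets.

```python
import statistics


def read_quality_lines(fastq_path):
    """Yield the quality line (every 4th line) of an uncompressed FASTQ file."""
    with open(fastq_path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 3:
                yield line.rstrip("\n")


def median_quality_per_position(quality_strings, offset=33):
    """Return the median Phred score at each read position.

    Phred+33 encoding is assumed. Reads shorter than a given position
    do not contribute to the median at that position.
    """
    columns = []
    for qual in quality_strings:
        for pos, char in enumerate(qual):
            if pos == len(columns):
                columns.append([])
            columns[pos].append(ord(char) - offset)
    return [statistics.median(col) for col in columns]


def first_position_below(medians, cutoff=20):
    """Return the 1-based position of the first median below cutoff, or None."""
    for pos, value in enumerate(medians, start=1):
        if value < cutoff:
            return pos
    return None
```

For the first read of a pair, `first_position_below(median_quality_per_position(read_quality_lines(path)))` should return `None`; for the second read, it should return `None` or a position of at least 250.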

Paired-read assembly

PandaSeq settings (NULL indicates that the option is not set on the command line)

species  locus  length_max  length_min  overlap_min  threshold_overlap  reference
-------  -----  ----------  ----------  -----------  -----------------  --------------
mouse    A      560         250         50           0.8                [Ludwig_2019]
mouse    B      560         250         50           0.8                [Ludwig_2019]
mouse    H      NULL        300         50           0.8                [Busse_2014]
mouse    K      NULL        300         50           0.8                [Busse_2014]
mouse    L      NULL        300         50           0.8                [Busse_2014]
human    A      550         300         50           0.8                [Wahl_2021]
human    B      550         300         50           0.8                [Wahl_2021]
human    H      550         320         NULL         NULL               [Murugan_2015]
human    K      550         320         NULL         NULL               [Murugan_2015]
human    L      550         320         NULL         NULL               [Murugan_2015]
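As an illustration of how the settings above translate into a call, the following sketch builds a pandaseq argument list for a given species and locus. The mapping of the table columns to the pandaseq options `-L` (maximum length), `-l` (minimum length), `-o` (minimum overlap) and `-t` (threshold) is an assumption, as are the function name and the choice of FASTA output via `-w`. NULL entries are represented as `None` and the corresponding option is omitted.

```python
# Settings transcribed from the table above; None stands for NULL (option not set).
# Columns: species, locus, length_max, length_min, overlap_min, threshold_overlap
_ROWS = [
    ("mouse", "A", 560, 250, 50, 0.8),
    ("mouse", "B", 560, 250, 50, 0.8),
    ("mouse", "H", None, 300, 50, 0.8),
    ("mouse", "K", None, 300, 50, 0.8),
    ("mouse", "L", None, 300, 50, 0.8),
    ("human", "A", 550, 300, 50, 0.8),
    ("human", "B", 550, 300, 50, 0.8),
    ("human", "H", 550, 320, None, None),
    ("human", "K", 550, 320, None, None),
    ("human", "L", 550, 320, None, None),
]

# Assumed mapping of table columns to pandaseq command-line options.
_FLAGS = ("-L", "-l", "-o", "-t")

PANDASEQ_SETTINGS = {(row[0], row[1]): row[2:] for row in _ROWS}


def pandaseq_command(species, locus, forward_fastq, reverse_fastq, output_fasta):
    """Build the pandaseq argument list for one read pair (hypothetical helper)."""
    values = PANDASEQ_SETTINGS[(species, locus)]
    command = ["pandaseq", "-f", forward_fastq, "-r", reverse_fastq, "-w", output_fasta]
    for flag, value in zip(_FLAGS, values):
        if value is not None:  # NULL: leave the option unset
            command += [flag, str(value)]
    return command


if __name__ == "__main__":
    # File names are placeholders.
    print(" ".join(pandaseq_command("mouse", "H", "R1.fastq", "R2.fastq", "merged.fasta")))
```

The returned list can be passed to `subprocess.run(command, check=True)` once the option mapping has been verified against the installed pandaseq version.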