Whole genome sequencing
Eleven μL of RNA were used as template for reverse transcription using
Invitrogen SuperScript IV reverse transcriptase (ThermoFisher
Scientific, Massachusetts, USA) and random hexamers (ThermoFisher
Scientific, Massachusetts, USA). Whole genome amplification of the
coronavirus was performed with the Artic_nCov-2019_V3 primer panel
(Integrated DNA Technologies, Inc., Coralville, Iowa, USA)
(artic.network/ncov-2019) and the Q5 Hot Start DNA polymerase (New
England Biolabs, Ipswich, Massachusetts, USA). Libraries were prepared
using the Nextera Flex DNA Library Preparation Kit (Illumina Inc,
California, USA) following the manufacturer's instructions.
Libraries were quantified with the Quantus™ Fluorometer (Promega,
Wisconsin, USA) before being pooled at equimolar concentrations (4 nM).
They were then sequenced on the MiSeq system (Illumina Inc, California,
USA), either in pools of up to 17 libraries with the MiSeq Reagent Micro
kit v2 (2x151 bp) or in pools of up to 96 libraries with the MiSeq
Reagent kit (2x201 bp).
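The equimolar pooling step relies on converting each fluorometric reading (ng/µL) into a molar concentration before diluting to 4 nM. The following is a minimal sketch of that arithmetic, not the laboratory's documented procedure: it assumes the usual approximation of 660 g/mol per base pair of double-stranded DNA, and the mean fragment length, the example concentration and the final volume are hypothetical values that do not come from the text.

```python
# Sketch of the ng/uL -> nM conversion commonly used before equimolar pooling.
# Assumption: 660 g/mol is the average molar mass of one dsDNA base pair.
BP_MOLAR_MASS = 660.0


def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a mass concentration (ng/uL) into a molar concentration (nM)."""
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment length > 0")
    return conc_ng_per_ul * 1e6 / (BP_MOLAR_MASS * mean_fragment_bp)


def dilution_volumes(conc_nm: float, target_nm: float = 4.0,
                     final_volume_ul: float = 10.0) -> tuple:
    """Return (library uL, diluent uL) needed to reach target_nm.

    final_volume_ul is a hypothetical working volume chosen for illustration.
    """
    if conc_nm < target_nm:
        raise ValueError("library is already below the target concentration")
    library_ul = target_nm * final_volume_ul / conc_nm
    return library_ul, final_volume_ul - library_ul


if __name__ == "__main__":
    # Hypothetical library: 2.64 ng/uL with a 500 bp mean fragment length.
    molarity = library_molarity_nm(2.64, 500)
    lib_ul, diluent_ul = dilution_volumes(molarity)
    print(f"{molarity:.2f} nM -> mix {lib_ul:.2f} uL library + {diluent_ul:.2f} uL diluent")
```

Once every library has been brought to the same molarity, equal volumes of each can be combined to form the pool.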
FASTQ files above the GISAID thresholds were deposited at GISAID
(accession IDs EPI_ISL_654287, EPI_ISL_654203, EPI_ISL_654284,
EPI_ISL_654176 and EPI_ISL_1173765). The sequencing reads were analysed
with an in-house pipeline, which can be accessed at
https://github.com/pedroscampoy/covid_multianalysis. Briefly, the
pipeline comprises the following steps: 1) removal of human reads
with Kraken
[https://genomebiology.biomedcentral.com/articles/10.1186/gb-2014-15-3-r46];
2) pre-processing and quality assessment of FASTQ files using fastp
[https://academic.oup.com/bioinformatics/article/34/17/i884/5093234]
v0.20.1 (arguments: --cut_tail, --cut_window_size,
--cut_mean_quality, --max_len1, --max_len2) and FastQC v0.11.9
[S. Andrews, “FastQC: a quality control tool for high throughput
sequence data,” Babraham Institute,
http://www.bioinformatics.babraham.ac.uk/projects/, 2010.]; 3) mapping
with bwa v0.7.17 [H. Li and R. Durbin, “Fast and accurate short read
alignment with Burrows-Wheeler transform,” Bioinformatics, vol. 25, no.
14, pp. 1754–1760, 2009.] and variant calling using iVar v1.2.3
[https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1618-7]
with the Wuhan-1 sequence (NC_045512.2) as the reference; 4) recalibration
of isolated low-coverage positions using joint variant calling. When
necessary, informative non-covered positions were analysed by standard
Sanger sequencing with the corresponding flanking primers from the ARTIC
set.
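To make steps 2 and 3 more concrete, the sketch below assembles the corresponding command lines and flags positions whose depth falls below a cutoff, which are the candidates for recalibration or Sanger sequencing. It is an illustration under stated assumptions, not the code of the covid_multianalysis pipeline: the function names, file names, reference path and every numeric value (window size, quality, maximum length, thresholds, minimum depth) are hypothetical, since the text lists the fastp arguments without their values. The commands are only built, not executed.

```python
from typing import List, Tuple

# Assumed local FASTA copy of the Wuhan-1 reference (NC_045512.2),
# already indexed with `bwa index`.
REFERENCE = "NC_045512.2.fasta"


def fastp_cmd(r1: str, r2: str, prefix: str, window: int = 10,
              mean_q: int = 20, max_len: int = 150) -> List[str]:
    """Step 2: paired-end trimming with the fastp arguments named in the text.

    The numeric values are placeholders, not the pipeline's settings.
    """
    return [
        "fastp", "-i", r1, "-I", r2,
        "-o", f"{prefix}_R1.trim.fastq.gz", "-O", f"{prefix}_R2.trim.fastq.gz",
        "--cut_tail",
        "--cut_window_size", str(window),
        "--cut_mean_quality", str(mean_q),
        "--max_len1", str(max_len),
        "--max_len2", str(max_len),
    ]


def bwa_mem_cmd(r1: str, r2: str, threads: int = 4) -> List[str]:
    """Step 3a: map trimmed reads; bwa mem writes SAM to standard output."""
    return ["bwa", "mem", "-t", str(threads), REFERENCE, r1, r2]


def ivar_variants_cmds(bam: str, prefix: str, min_qual: int = 20,
                       min_freq: float = 0.03) -> Tuple[List[str], List[str]]:
    """Step 3b: samtools mpileup piped into `ivar variants`.

    Returns both halves of the pipe; the thresholds are hypothetical.
    """
    mpileup = ["samtools", "mpileup", "-aa", "-A", "-d", "0", "-B", "-Q", "0",
               "--reference", REFERENCE, bam]
    variants = ["ivar", "variants", "-p", prefix, "-q", str(min_qual),
                "-t", str(min_freq), "-r", REFERENCE]
    return mpileup, variants


def low_coverage_positions(depth_tsv: str, min_depth: int = 20) -> List[int]:
    """Parse `samtools depth -a` output (chrom, 1-based pos, depth) and
    return the positions whose depth is below min_depth (assumed cutoff)."""
    positions = []
    for line in depth_tsv.splitlines():
        if not line.strip():
            continue
        _chrom, pos, depth = line.split("\t")[:3]
        if int(depth) < min_depth:
            positions.append(int(pos))
    return positions


if __name__ == "__main__":
    print(" ".join(fastp_cmd("s1_R1.fastq.gz", "s1_R2.fastq.gz", "s1")))
    print(" ".join(bwa_mem_cmd("s1_R1.trim.fastq.gz", "s1_R2.trim.fastq.gz")))
    left, right = ivar_variants_cmds("s1.sorted.bam", "s1")
    print(" ".join(left), "|", " ".join(right))
```

Keeping the commands as argument lists means they could be passed to `subprocess.run` without shell quoting issues; the SAM output of bwa would still need to be sorted and indexed before the mpileup step.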