Whole genome sequencing
Eleven μL of RNA were used as template for reverse transcription using Invitrogen SuperScript IV reverse transcriptase (ThermoFisher Scientific, Massachusetts, USA) and random hexamers (ThermoFisher Scientific, Massachusetts, USA). Whole genome amplification of the coronavirus was performed with the Artic_nCov-2019_V3 primer panel (Integrated DNA Technologies, Inc., Coralville, Iowa, USA) (artic.network/ncov-2019) and the Q5 Hot Start DNA polymerase (New England Biolabs, Ipswich, Massachusetts, USA). Libraries were prepared using the Nextera Flex DNA Library Preparation Kit (Illumina Inc, California, USA) following the manufacturer's instructions.
Libraries were quantified with the Quantus™ Fluorometer (Promega, Wisconsin, USA) and then pooled at equimolar concentrations (4 nM). They were sequenced on the MiSeq system (Illumina Inc, California, USA), either in pools of up to 17 libraries with the MiSeq Reagent Micro kit v2 (2x151 bp) or in pools of up to 96 libraries with the MiSeq Reagent kit (2x201 bp).
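The equimolar pooling step relies on converting each fluorometric mass concentration into a molar concentration and then diluting to the 4 nM target. The following is a minimal sketch of that arithmetic, not the authors' worksheet: it assumes double-stranded DNA with an average mass of 660 g/mol per base pair, and the function names, the mean fragment length, and the 20 µL final volume are hypothetical values chosen for illustration.

```python
def library_molarity_nm(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM).

    Assumes an average mass of 660 g/mol per base pair.
    """
    if mean_fragment_bp <= 0:
        raise ValueError("mean fragment length must be positive")
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def dilution_volumes_ul(stock_nm: float,
                        target_nm: float = 4.0,
                        final_volume_ul: float = 20.0) -> tuple:
    """Return (library uL, diluent uL) needed to reach target_nm.

    Uses C1 * V1 = C2 * V2; the 20 uL final volume is a hypothetical default.
    """
    if stock_nm < target_nm:
        raise ValueError("library is already below the target molarity")
    library_ul = target_nm * final_volume_ul / stock_nm
    return library_ul, final_volume_ul - library_ul


if __name__ == "__main__":
    # Hypothetical library: 2.64 ng/uL with a 500 bp mean fragment length.
    molarity = library_molarity_nm(2.64, 500)
    library_ul, diluent_ul = dilution_volumes_ul(molarity)
    print(f"{molarity:.2f} nM; mix {library_ul:.1f} uL library "
          f"with {diluent_ul:.1f} uL diluent")
```

With these hypothetical inputs the library comes out at roughly 8 nM, so equal volumes of library and diluent bring it to the 4 nM target before pooling.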
FastQ files that met the GISAID thresholds were deposited in GISAID under accession numbers EPI_ISL_654287, EPI_ISL_654203, EPI_ISL_654284, EPI_ISL_654176 and EPI_ISL_1173765. Sequencing reads were analysed with an in-house pipeline, available at https://github.com/pedroscampoy/covid_multianalysis. Briefly, the pipeline comprises the following steps: 1) removal of human reads with Kraken [https://genomebiology.biomedcentral.com/articles/10.1186/gb-2014-15-3-r46]; 2) pre-processing and quality assessment of fastq files using fastp [https://academic.oup.com/bioinformatics/article/34/17/i884/5093234] v0.20.1 (arguments: --cut_tail, --cut_window_size, --cut_mean_quality, --max_len1, --max_len2) and FastQC v0.11.9 [S. Andrews, “FastQC: a quality control tool for high throughput sequence data,” Babraham Institute, http://www.bioinformatics.babraham.ac.uk/projects/, 2010.]; 3) mapping with bwa v0.7.17 [H. Li and R. Durbin, “Fast and accurate short read alignment with Burrows-Wheeler transform,” Bioinformatics, vol. 25, no. 14, pp. 1754–1760, 2009.] and variant calling with iVar v1.2.3 [https://genomebiology.biomedcentral.com/articles/10.1186/s13059-018-1618-7], using the Wuhan-1 sequence (NC_045512.2) as reference; 4) recalibration of isolated low-coverage positions using joint variant calling. When necessary, informative positions without coverage were analysed by standard Sanger sequencing with the corresponding flanking primers from the ARTIC set.
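Identifying the positions that need recalibration or Sanger confirmation amounts to scanning per-position read depth for stretches below a cut-off. The sketch below illustrates that idea only and is not taken from the pipeline repository: the function name and the default threshold of 20 reads are hypothetical, and the input is assumed to be a list of depths ordered by genome position (for example, parsed from a per-position depth table).

```python
from typing import List, Sequence, Tuple


def low_coverage_intervals(depths: Sequence[int],
                           min_depth: int = 20) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) intervals with depth < min_depth.

    min_depth=20 is a hypothetical cut-off; the source does not state one.
    """
    intervals: List[Tuple[int, int]] = []
    start = None  # start of the low-coverage run currently being extended
    for pos, depth in enumerate(depths, start=1):
        if depth < min_depth:
            if start is None:
                start = pos
        elif start is not None:
            # Depth recovered: close the run at the previous position.
            intervals.append((start, pos - 1))
            start = None
    if start is not None:
        # The run extends to the last position of the reference.
        intervals.append((start, len(depths)))
    return intervals


if __name__ == "__main__":
    # Toy depth profile for an 8-position reference (illustrative values).
    example = [30, 25, 5, 0, 3, 40, 50, 10]
    for start, end in low_coverage_intervals(example):
        print(f"low coverage: {start}-{end}")
```

Intervals reported this way could then be compared against the ARTIC amplicon coordinates to choose the flanking primers for Sanger sequencing.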