ABSTRACT
The CRISPR-Cas system of
Prokaryotes is an adaptative immune defense mechanism to protect
themselves from invading genetic elements (e.g. phages and plasmids).
Studies that describe the genetic organization of these prokaryotic
systems have mainly reported on the Enterobacteriaceae family
(now reorganized within the order Enterobacteriales ). For some
genera, data on CRISPR-Cas systems remain poor, as in the case ofSerratia (now part of the Yersiniaceae family) where data
are limited to a few genomes of the species marcescens . This
study describes the detection, in silico , of CRISPR loci
in 146 Serratia complete genomes and 336 high-quality assemblies
available for the species ficaria , fonticola ,grimesii , inhibens, liquefaciens , marcescens ,nematodiphila , odorifera , oryzae, plymuthica ,proteomaculans , quinivorans , rubidaea ,symbiotica, and ureilytica . Apart from subtypes I-E and
I-F1 which had previously been identified in marcescens , we
report that of I-C and the variants I-ES1, I-ES2 and I-F1S1. Analysis of
the genomic contexts for CRISPR loci revealed mdtN -phnP as
the region mostly shared (grimesii , inhibens ,marcescens , nematodiphila , plymuthica ,rubidaea, and Serratia sp.). Three new contexts detected
in genomes of rubidaea and fonticola (puugenes-mnmA ) and rubidaea (osmE -soxG andampC -yebZ ) were also found. Plasmid and/or phage origin of
spacers was also established.
INTRODUCTION
The prokaryotic system CRISPR-Cas
(Clustered Regularly Interspaced
Short Palindromic Repeats and CRISPR-associated proteins) is a defense
mechanism for bacteria and archaea against
the invasion of bacteriophages
and selfish genetic elements such as plasmids. Since their discovery
around 15 years ago [1-3], CRISPR-Cas systems have been the object
of many studies and functions, other than adaptative immunity, as
regulation of bacteria virulence and stress response have been reported
[4,5]. Based on a census of complete genomes (CGs), it is now
reckoned that these systems are distributed mainly in archaea
(~82,5%) and, to a lesser extent, bacteria
(~40%) [6]. CRISPR-Cas systems are composed of
CRISPR arrays and adjacent CRISPR-associated (cas ) genes. The
former is composed of direct repeats interspaced by spacers; the latter
coding for proteins involved in the immune response and DNA repair. This
ever-expanding knowledge of the composition and architecture ofcas gene clusters has led to an updated classification of
CRISPR-Cas systems where two classes, six types, and various subtypes
(some of which are further divided into different variants) are now
reported [6,7]. Class 1 includes the types I, III, and IV, which are
divided into seven subtypes I (I-A to I-G), six subtypes III (III-A to
III-F), and three subtypes IV (IV-A to IV-C), respectively. Class 2
includes the types II, V, and VI; they are also divided into subtypes:
three subtypes II (II-A to II-C), eleven subtypes V (V-A to V-K and
V-U), and four subtypes VI (VI-A to VI-D), respectively. While Class 2
is found mainly in Bacteria, Class 1 is present both in Bacteria and
Archaea. Studies on CRISPR-Cas systems have been performed on genomes of
different bacteria families, with that of the Enterobacteriaceaebeing one of the most investigated [8-10]. This family was unique in
the Enterobacterales order until 2016 when Adeolu and colleagues
[11] reclassified the order by adding six new families
(Budviciaceae, Erwiniaceae, Hafniaceae, Morganellaceae,
Pectobacteriaceae, Yersiniaceae ). Despite this reclassification, data
on CRISPR-Cas systems remain mainly limited to genera of theEnterobacteriaceae family [12-15].
The genus Serratia , a Gram-negative rod, is now part of the
family Yersiniaceae . Serratia species can be found in
different environments (e.g. water, soil) and hosts (e.g. humans,
insects, plants, vertebrates) where they may play different roles
ranging from opportunistic pathogens to symbionts [16-18]. AmongSerratia species, marcescens is undoubtedly the most
studied mainly for its role played as a symbiont associated with insects
and nematodes [19] or as a human opportunistic pathogen (currently
reported as one of the most important bacteria responsible for acquired
hospital infections) [20]. A growing number of marcescensgenomes have then been sequenced with a pangenome allele database
available for different studies ranging from virulence, and antibiotic
resistance to the identification of CRISPR systems [21]. Several
studies, additionally to marcescens , have also been reported for
other Serratia species that play different roles in human and
insect pathogenesis[22]. Although the CRISPR systems represent a
valuable substrate for diagnostic, epidemiologic, and evolutionary
analyses [4], data on CRISPR-Cas systems in the genus are scarce and
limited to the detection of subtypes I-E and I-F1 in genomes of the
species marcescens [9,23-25].
In this study, 146 Serratia CGs and 336
High-Quality Assemblies
(HQAs) were available for the
species ficaria , fonticola , grimesii ,inhibens , liquefaciens , marcescens ,nematodiphila , odorifera , oryzae, plymuthica ,proteomaculans , quinivorans , rubidaea ,symbiotica, and ureilytica were explored for the presence
and type of cas gene clusters and/or CRISPR arrays. Apart from
subtypes I-E and I-F1, the results presented in this study show the
presence (first detected in Serratia ) of subtype I-C and that of
the variants I-ES1, I-ES2, and I-F1S1. Moreover, this study extends the
previously reported mdtN -phnP CRISPR-genomic context,
identified in marcescens , to the species grimesii ,inhibens , nematodiphila , plymuthica , andrubidaea , reporting three new possible shared contexts. One,puu genes-mnmA , was detected in genomes of rubidaeaand fonticola , and two (osmE -soxG andampC -yebZ ) in genomes of rubidaea . Spacers’ content
was also assessed to establish the plasmid and/or phage origin of the
matched protospacers. The discovery of CRISPR-Cas systems has allowed
the development of new technology tools in the bioengineering field
[26]. A clear example is represented by gene editing strategies
based on CRISPR/Cas9 technique successfully used in agriculture,
nutrition, and human health [27]. The development of new
CRISPR-based applications also relies on the continuous update of CRISPR
systems data and knowledge. Our study, in providing more comprehensive
data on CRISPR in Serratia , has undoubtedly contributed to an
expanded knowledge of these systems.
MATERIALS AND METHODS
Genomes analyzed
One hundred and forty-six Serratia CGs were considered in this
study. The set of genomes encompasses the 15 S. marcescens CGs we
previously analyzed [25] and those of the genus Serratiaavailable at the CRISPR-Cas++ database
(https://crisprcas.i2bc.paris-saclay.fr/MainDb/StrainList) up to
12/12/2020 [28,29] (Table S1). Among genome sequences available at
the assembly level of scaffolds or contigs available at the
National Center for Biotechnology
Information database (NCBI)
(https://www.ncbi.nlm.nih.gov/assembly) up to 12/12/2020, we selected
the HQAs (N50>50kb). Species attribution and strain details
(name, place, date of isolation) were recovered (when available) from
GenBank or related articles. Serratia strains FGI94 (NC_020064),
FS14 (NZ_CP005927), SCBI (NZ_CP003424), YD25 (NZ_CP016948), and
DSM21420 (GCA_000738675) were reclassified as reported by
Sandner-Miranda et al ., 2018 [30]. We also included sequences
with the accessions MK507743, MK507744, MK507745, and MK507746 referring
to contigs (N50 ranging from 228817 and 291462) harboring CRISPR loci in
genome assemblies (unpublished) of 4 S. marcescens strains
reported as secondary symbionts in the
Red Palm Weevil
(RPW) Rhynchophorus
ferrugineus (Olivier, 1790) (Coleoptera: Curculionidae) [25,31]
(Supporting Table S1), an alien invasive pest now threatening South
America [32].
Detection of CRISPR-Cas loci.
Details about the detection of the cas gene cluster and/or CRISPR
array(s) for CGs were retrieved from the CRISPR-Cas++ database. CRISPR
array(s) recorded by CRISPR-Cas++ were assigned to levels 1 to 4 based
on criteria required to select the minimal structure of putative CRISPR
as reported by Pourcel et al . [28]. Level 1 is the lowest
level of confidence. Levels 2 to 4 were assigned based on the
conservation of repeats (which must be high in a real CRISPR array) and
on the similarity of spacers (it must be low). Level 4 CRISPRs were
defined as the most reliable ones. Levels 1 to 3 may correspond to false
CRISPRs. In our study, only CRISPR array(s) recorded with level 4, were
considered. CRISPR arrays without a complete set of cas genes in
the host genome were defined as “orphans”. Genomes harboringcas gene clusters were then submitted to the CRISPRone analysis
suite (http://omics.informatics.indiana.edu/CRISPRone/) [33] to
graphically visualize the architecture of each cluster. The same suite
was used to search and visualize cas gene clusters in the HQAs. A
subtype of cas gene clusters was assigned according to the recent
classification update for CRISPR-Cas systems [6].
In silico analyses of consensus of direct repeats.
A consensus of Direct Repeats
(CDRs) from CRISPR arrays was
clustered by BLAST similarity. Some CDRs were manually trimmed when just
a few terminal nucleotides were the only difference from the other
members of the same cluster. The CDRs were used as input for CRISPRBank
(http://crispr.otago.ac.nz/CRISPRBank/index.html) and CRISPR-Cas++ to
assign, based on identity with known CDRs [28,29,34], a specific CDR
type to CRISPR arrays. CRISPR arrays whose CDR type was consistent with
the subtype of the cas gene set harbored in the same genome were
defined as “canonical”. While those not consistent with the subtype of
the cas gene set harbored in the same genome were defined as
“alien”. A schematic diagram of an alien, canonical and orphan array
is shown in Figure 1. CDRs and the number of repeats of the CRISPR
arrays in the HQAs of Serratia sp strains DD3, Ag1, and Ag2 were
recovered from the CRISPRone output. Spacers’ analysis for duplications
(spacers of Ag1, Ag2, and DD3 included) was performed through the
CRISPRCasdb spacer database at the CRISPRCas++ site
(https://crisprcas.i2bc.paris-saclay.fr/MainDbQry/Index). Phagic
and/or plasmidic origin of matching protospacers were searched at the
CRISPRTarget site
(http://crispr.otago.ac.nz/CRISPRTarget/crispr_analysis.html)
[34].
Genomic contexts of CRISPR positive genomes.
Analysis of CRISPR positive CGs and HQAs was performed to better
characterize the genomic context surrounding the cas gene set(s)
and/or CRISPR array(s). HQAs with at least 4kb flanking the casgene set(s) were considered. These regions were annotated by Prokka
(https://github.com/tseemann/prokka) [35]. Synteny was established
by either the Mauve algorithm (http://darlinglab.org/mauve/mauve.html)
[36] or visual inspection of annotated proteins.
Phylogenetic analyses.
The evolutionary relationship of Serratia strains found positive
for cas genes set(s) was established and graphically depicted by
the Cas3 sequence tree. All the protein sequences were aligned by the
MUSCLE algorithm (https://www.ebi.ac.uk/Tools/msa/muscle/) [37,38].
The 16S rRNA gene tree was also drawn for comparison. Dendrograms were
generated by the Neighbour-Joining clustering method and average
distance trees with JalView (https://www.jalview.org/) [39]. For the
16S rRNA gene tree, the multiple sequence alignment was obtained by
retrieving from 1 to 7 full gene sequences (CGs) or truncated 16S rRNA
gene sequences (HQAs). A phylogenetic tree was obtained by multiple
alignments of all retrieved 16S rRNA genes; an abbreviated tree was
constructed by using one sequence from each genome.
RESULTS
CRISPR positive genomes.
A collection of 146 Serratia CGs was explored for the presence ofcas gene cluster and/or CRISPR array(s). Most of the genomes
(134) were reported as known species: ficaria (1),fonticola (7), grimesii (1), inhibens (1),liquefaciens (7), marcescens (87), nematodiphila(1), plymuthica (11), proteomaculans (2),quinivorans (2), rubidaea (8), symbiotica (4),ureilytica (2). The remaining 12 genomes were of unidentified
species and, from here on, they will be referred to as Serratiasp. (Supporting Table S1). cas gene cluster and/or CRISPR
array(s) were detected in 35 CGs (24%) of which 17 harbored a singlecas gene cluster associated with one or more arrays, while 18
harbored orphan array(s). Some CGs records were assigned to the same
genome being characterized by the same cas gene set subtype and
identical numbers of both CRISPR arrays and spacers (Table 1). All
detected cas gene clusters were of Class 1. Nine were canonical
and distributed as follows: 2 subtypes I-C (rubidaea ) (Figure
2A), 1 I-E (plymuthica ), and 6 I-F1 (1 fonticola , 3marcescens , 1 inhibens, and 1 rubidaea ) (Figures 2B
and 2C). The remaining 8 clusters were found atypical and assigned, in
this study, to I-ES1 (3 marcescens and 1 plymuthica ) and
I-F1S1 (1 marcescens , 2 rubidaea, and 1 Serratiasp.) as variants of subtypes I-E and I-F1, respectively.
The variant I-ES1 had the cas3-cas8e genes spaced by
~600 nt while the variant I-F1S1 had thecas3 -cas8f1 genes separated from each other by
~400 nt (Figures 1B and 1C). Since the I-ES1 and I-F1S1cas gene clusters have never been reported in Serratia ,
their presence was further explored among 336 Serratia HQAs. The
assemblies were distributed as follows: ficaria (1),fonticola (6), grimesii (2), liquefaciens (3),marcescens (295), nematodiphila (2), odorifera (2),oryzae (1), plymuthica (4), proteomaculas (1),rubidaea (2), symbiotica (1), ureilytica (1) andSerratia sp (15) (Supporting Table S1). Of the 336 analyzed
genomes, 46 (13.7%) were positive for the presence of cas gene
clusters. Twenty-six were subtype I-F1 (21 marcescens , 1fonticola, and 4 Serratia sp.) (Figure 1C), 2 subtype I-C
(rubidaea ) (Figure 1A), and 3 subtype I-E (marcescens )
(Figure 2B) (Supporting Table S2). The variant I-ES1 was detected in 2
genomes of marcescens , the I-F1S1 in 8 genomes ofmarcescens, and 1 of grimesii . In 3 genomes ofSerratia sp. (strains Ag1, Ag2, and DD3) an additional variant of
the subtype I-E, here named I-ES2, was detected (Figure 2B). The variant
I-ES2 was characterized by the translocation of cas6e betweencas7 and cas11 , and the presence (upstream of cas3 )
of a gene harboring the WYL domain and encoding for a potential
functional partner of the CARF (CRISPR–Cas Associated Rossmann Fold)
superfamily proteins [6]. Proteins containing the WYL domain (name
standing for the three conserved amino acids tryptophan, tyrosine, and
leucine, respectively) have been reported for subtypes I-D and VI-D
[40,41]. The distribution of CRISPR-positive genomes, over the total
analyzed among Serratia species, is shown in Figure 3.
Coexistence in the same genome of different sets of cas genes was
also detected: subtypes I-E and I-F1 were found in the single HQA oforyzae , while subtypes I-ES2 and I-F1were detected in 2 HQAs ofSerratia sp (strains Ag1 and Ag2) (Supporting Table S2).
CDRs and spacers.
The 35 CRISPR-positive CGs harbored 78 CRISPR arrays of which 48 were
canonical. The latter were distributed as follows: fonticola (4),inhibens (1), marcescens (19), plymuthica (5),rubidaea (15), and Serratia sp (4). Twenty-three arrays
were orphan and detected in genomes of marcescens (8),plymuthica (4), symbiotica (1), nematodiphila (1),rubidaea (5), and Serratia sp (4) (Table 1 and Figure 1).
Alien arrays (8) were only detected in the species rubidaea . For
a comprehensive analysis, arrays in the 3 HQAs Ag1, Ag2, and DD3 were
included (Supporting Table S2). All disclosed CRISPR arrays were
assigned, by comparative sequence analyses, to CDR types I-C, I-E, or
I-F (Table 1). The association between CDR types and cas gene
sets (canonical and variant) is reported in Table 2. Based on their
nucleotide identity, the CDRs identified for subtype I-E and variants
I-ES1 and I-ES2 could be arranged into two clusters named CDR-I and
CDR-II. CDR-I was composed of 6 CDRs (identity from 83 to 96%) and
linked to the cas gene sets I-E and I-ES1. CDR-II was composed of
2 CDRs (identity of about 96%) and linked to the cas gene set
I-ES2. When the CDRs of the two clusters were compared to each other,
the nucleotide identity dropped to 55-62%.
The architecture of the cas gene set I-ES2 has previously been
reported as I-E* for Klebsiella and I-E variant for Vibrio
cholerae [14,42]. We then compared the CDRs sequences I-E* and I-E
variant with those of CDR-II and the identity was found between 82 to
96%. This association has further been confirmed by results obtained
from the analysis of the cas gene clusters identified in 99
genomes retrieved from CRISPRBank and by searching for the presence of
CDRs I-ES2. Results showed that 95 of these genomes had a casgene architecture identical to that of I-ES2. The remaining 4 genomes
harbored a truncated set of cas genes. The overall of these data
linked specifically CDR-II to the cas gene set I-ES2.
A total of 1391 spacers were identified. Identical arrays were shared byrubidaea strains FDAARGOS_926 and NCTC12971. Likewise, different
sets of identical arrays were shared by plymuthica strains AS9,
AS12, and AS13; marcescens strains KS10 and EL1;marcescens strains CAV1761 and CAV1492 (Supporting Table S3).
These findings confirmed multiple records of the same genome for each
group of strains and the total number of spacers was estimated at 1290
of which 1219 were unique and 330 matched protospacers with the
following origin: 131 phage, 132 plasmid, and 67 phage/plasmid
(Supporting Table S3).
Phylogenetic trees.
The phylogenetic tree generated by multiple alignments of the amino acid
sequences of Cas3 showed a clusterization of the subtypes I-C, I-E, and
I-F1 into 3 distinct branches (Figure 4). The variants I-ES1 and I-F1S1
were randomly distributed among the I-E and I-F1 respectively, while the
variant I-ES2 appears to group within a sub-lineage of I-E. Within the
I-C, I-E, and I-F1 branches, strains belong to the same group of
species. A phylogenetic tree based on multiple alignments of the 16S
rRNA gene sequences was generated for comparison (Figure 5 and
Supporting Figure S1). The 16S rRNA gene trees showed, as expected,
nesting of the strains belonging to the same species. The phylogenetic
distribution of Serratia species in the Cas3 tree may suggest a
possible independent intra-species evolutionary pathway. However, being
that the number of available CRISPR-positive genomes is too low for mostSerratia species such a hypothesis needs to be validated by
future studies. The position of strains TEL in the clustermarcescens and JUb9 in the cluster rubidaea shown in the
Cas3 phylogenetic tree was confirmed by the 16S rRNA gene tree, which
might suggest a species assignment for these strains.
CRISPR genomic contexts.
The 35 CRISPR-positive CGs and 28 of the 46 CRISPR-positive HQAs were
analyzed to identify the possible shared genomic context(s). Eight
different genomic contexts, named from A to H, were identified. Contexts
A to D (Figure 6) were shared by different genomes, while those from E
to H were identified in single genomes. The genomic context A
(mdtN -phnP ) has previously been described in S .marcescens strains isolated as a secondary symbiont of RPW and in
other marcescens CGs available in the NCBI database [25]
becoming the most commonly shared in this study being identified in 55
genomes distributed as follows: 35 marcescens , 1 grimesii ,
1 inhibens , 1 nematodiphila , 6 plymuthica, 6rubidaea, and 5 Serratia sp. Contexts B (puugenes-mnmA ), C (osmE -soxG ), and D
(ampC -yebZ ) were shared by 11, 4, and 6 genomes
respectively; context B by genomes of species fonticola (2),rubidaea (7) and Serratia sp. (2); C and D only byrubidaea genomes. For context D, assignment to rubidaeawas assumed for the strain JUb9 (see above). The contexts E
(nrdG -bglH ) and F (sucD -vasK ) were both
identified in the single genome of S . oryzae strain J11-6;
while G (gntR -cda ) and H (gutQ -queA ) in
genomes of the Serratia sp. Ag1 and S . symbioticaCWBI-2.3, respectively (Table 3). Distribution of the genomic contexts
by subtypes of cas gene sets and/or CDR types is reported in the
supporting Table S4. Genomes of species rubidaea were
characterized by the presence of multiple CRISPR contexts (A, B, C, D)
with the context C associated with the cas gene set of subtype
I-C.
DISCUSSION
Bacteria of the genus Serratia are ubiquitous and have been
isolated from soil, water, plant roots, insects, and the
gastrointestinal tract of animals [16-18]. This broad range of
environments exposes Serratia strains to exogenous genetic
elements such as plasmids, phages, and chromosomal fragments of other
bacteria. Some of them may represent a life threat (e.g. phages) or a
metabolic burden (e.g. plasmids). To overcome this, defense mechanisms
such as CRISPR-Cas have been developed during bacterial evolution.
Studying the presence/absence of CRISPR-Cas systems and their features
in different genera of families is a relatively new scientific approach
of investigation to gain data on the evolution of these systems and
their role played during bacteria lifetime [43]. The average
percentage of CRISPR distribution among Bacteria is the outcome of
processes and/or factors that play different ecological roles within a
genus/species. Among these processes/factors noteworthy is the balance
between protection provided by CRISPR systems and their possible
deleterious effects (e.g. self-targeting spacers), the role played by
exogenous genetic elements (e.g. plasmids, phages, etc.) in bacteria
evolution, and the horizontal transfer of CRISPR systems.
Data on CRISPR loci in Serratia are limited to CGs of S .marcescens strains [9,23-25]. In the present study, along
with the species marcescens , we extended data on CRISPR loci to
14 additional Serratia species. CRISPRs were detected in 24% of
the CGs and about 14% of the HQAs analyzed. The percentage of detection
is lower than that reported for Bacteria (about 40%) [6]. However,
whether the lower percentage of detection in Serratia reflects a
distinguishing feature of the genus (particularly for the most
representative analyzed marcescens species where the percentage
was 12.6%) or a misrepresentative distribution of the available genomes
in databases, remains to be established.
Most of the loci identified in this study were located within the
genomic context mdtN -phnP previously reported in the
species marcescens and now further extended to those ofgrimesii , inhibens , nematodiphila ,plymuthica , and rubidaea . Three new possible contexts were
also identified: one (puu genes-mnmA ) shared by genomes ofrubidaea and fonticola ; and two (osmE -soxGand ampC -yebZ ) detected in those of rubidaea . The
context osmE – soxG might be closely linked to thecas gene set of subtype I-C (Supporting Table S4). Due to the low
number of CRISPR-positive genomes of rubidaea andfonticola and genomes positive for the cas gene set I-C,
further analyses are required to confirm this hypothesis.
A previous comprehensive study on the distribution of CRISPR-Cas systems
in genomes of the Enterobacteriaceae family (now reorganized
within the Enterobacteriales order) showed the predominant
presence of subtype I-E and the rare coexistence of subtypes I-E and
I-F1 in the same genome [9]. Our data show the prevalence of subtype
I-F1 (39,5%), followed by subtypes I-E (about 5%) and I-C (about 5%).
Detection of subtype I-C is, to the best of our knowledge, the first
report in Serratia . The prevalence of the subtype I-F1 in our
subset of CRISPR-positive genomes is consistent with both the new
reorganized Enterobacteriales order [11] and data produced by
Medina-Aparicio et al . [9]. Indeed, in the aforementioned
study subtype I-F1 was found prevalent in genera Yersinia ,Rahnella, and Serratia which are now part of the newYersiniaceae family. On the other hand, the subtype I-E remains
predominant within the Enterobacteriaceae family. Moreover, the
finding of two distinct cas -gene sets (I-E/I-F1 or I-ES2/I-F1) in
only 3 Serratia genomes, confirms that the coexistence of these
subtypes is not frequent.
Six different cas -gene set architectures were identified of which
those reported as I-ES1 (characterized by a 0.6kbcas3 /cas8e intergenic sequence), I-ES2 (characterized by
the cas6e translocation between cas7 and cas11 ),
and I-F1S1 (characterized by 0.4kb cas3 /cas8f1 intergenic
sequence) are, to the best of our knowledge, the first ever detected inSerratia . Similar or identical architectures of I-ES1, I-ES2, and
I-F1S1 have been reported for other bacteria genera: a similar
architecture to I-ES1 has been described in Escherichia coli(IGLB fragment) where the cas3 /cas8e intergenic sequence
was ~0,4kb [44,45]; an identical architecture of
I-ES2 has already been detected in Klebsiella (I-E*) andVibrio (I-E variant) strains [14,42]; a similar architecture
to I-F1S1 was reported in V. cholerae (I-FV1), where thecas3 /cas8f1 intergenic sequence was ~0.1kb
[42].
This study also supplies data on the presence/number of CRISPR arrays
and their CDRs sequences in Serratia . Apart from canonical arrays
(61.5% of the total disclosed arrays), orphans (29.4%) and aliens
(10.2%) arrays were also detected (Table 1 and Figure 1). Orphan arrays
might represent remnants of previous complete CRISPR-Cas systems
[33]. The presence of alien arrays found only in rubidaea CGs
is, as far as we know, the first report in bacteria CRISPR-positive
genomes. Its detection might be explained as traces of ancient complete
CRISPR-Cas systems I-E/I-F1 or I-C/I-E/I-F1 coexistent within the same
genome (Table 1). Alternatively, the aliens might result from single
horizontal gene transfer events. Further analyses could unveil their
genetic origin and the entity of their distribution among
CRISPR-positive bacteria genomes. Detection of more alien arrays might
unveil that the presence of multiple subtypes in a genome is more
frequent than has been reported so far. Furthermore, CDRs specifically
associated with the cas gene set variant I-ES2 were also first
described (Table 2).
Finally, the phylogenetic tree generated by multiple alignments of the
Cas3 sequences showed a potential sub-lineage (variant I-ES2) within the
I-E branch and thus might represent and/or anticipate a distinct clonal
expansion of an I-E sub-population (Figure 4).
Knowledge of CRISPR-Cas systems is constantly expanding due to
studies on newly available genomic sequences or genomic sequences not
yet explored. The CRISPR-Cas systems classification is thus continuously
updating also in the light of their possible applications. Indeed, the
CRISPR-Cas technology has undoubtedly revolutionized systems of genome
editing with a wide range of potential industrial and biomedical
applications. Other, more recent genome-editing tools are based on
methods that make use of the Cas9 protein [46]. However, the
expression of foreign proteins with DNA-binding and editing activity
appears toxic for many bacteria. Harness of endogenous CRISPR systems is
a recent and promising new line of approach for bacteria genome editing
[47,48].
Our study has contributed to expanding knowledge on the variability and
distribution of CRISPR systems in the Serratia genus. Data here
presented might be exploitable for native CRISPR effectors of this genus
that includes species (e.g. marcescens ) relevant in environmental
and clinical fields. Moreover, detection of the same subtype ofcas -gene sets in different Serratia species and other
genera highlights the open question on the molecular mechanism(s) yet to
be identified that have been allowed intra- and inter-species spread.