Materials and Methods
Alate queen (gyne) collection. Young Solenopsis invicta gynes exhibit distinct rates of sexual and reproductive maturation depending on social chromosome genotype (Nipitwattanaphon, Wang, Dijkstra, & Keller, 2013); these gynes do not attempt to leave their natal nest on mating flights until they are fully sexually mature. We collected gynes embarking on mating flights in the field to achieve physiological matching across genotypes, colonies, and forms based on maturity rather than strict age-matching (Figure 1), although, based on the colony phenologies and weather conditions at the collection locale in spring of 2015 in Athens-Clarke Co., Georgia, USA, it is doubtful that any sampled gynes differed by more than a couple of weeks in age from one another. Alate (winged) gynes were aspirated from the tops of mounds and frozen on dry ice in the field in and around Athens-Clarke Co. on days of mating flights in April 2015. Alate gynes were collected from polygyne nests at three sites (four nests from 33°55’18”N 83°21’2”W, three nests from 33°54’48”N 83°20’46”W, and one nest from 33°54’36”N 83°20’6”W), and from eight monogyne nests at one site (33°42’17”N 83°16’53”W). All samples were stored at -80°C pending further processing. We processed a total of eight gynes from as many monogyne colonies and three gynes each (one of each genotype) from eight polygyne colonies.
Sample genotyping. RNAlater -ICE (Thermo Fisher Scientific, MA) was added to tubes holding the frozen gynes, which were submerged for >24 hours at -20°C. Gynes from polygyne colonies then were sorted by size in a Petri dish on top of dry ice. The smallest gynes were selected as prospective Sb/Sb , medium asSB/Sb , and the largest as SB/SB , based on prior data on the association of gyne mass with Gp-9 genotype (C. DeHeer et al., 1999). Legs were removed for DNA extraction and genotype confirmation, then bodies were stored at -80°C until dissection and RNA extraction. DNA was extracted using a Puregene DNA isolation kit (Qiagen, Valencia, CA). To obtain the social chromosome genotype of each gyne from presumed polygyne colonies, a Gp-9 PCR assay was performed (Valles & Porter, 2003). The presence of gynes withSB/Sb and Sb/Sb genotypes also confirmed colony social form of polygyne colonies because only polygyne colonies contain individuals with the Sb haplotype (for which theGp-9b allele is fully diagnostic). To confirm social form of the monogyne colonies from which gynes were sampled, 15-20 individuals (mix of workers and alate gynes) were collected from each colony and the same Gp-9 PCR assay was performed. All gynes sampled from polygyne colonies were genotyped at 9-13 previously described polymorphic microsatellite loci (Ascunce, Bouwma, & Shoemaker, 2009). These microsatellite data revealed that no sampled individuals were triploid (Krieger, Ross, Chang, & Keller, 1999) and that no sampled nestmates were siblings (Supplementary Data 1).
Organ dissection. Gynes were dissected under an Olympus SZ61 stereomicroscope. After making an opening in the cuticle on the back of the head, a minutin probe was used to free the brain from the head capsule. The majority of tracheae and glands were removed from each brain; however, some traces of tracheae and possibly residual gland material remained. The ovaries were extricated from the hindgut, and any Malpighian tubules present were removed, along with some excess fat body. Due to the extensive tracheal intrusion in the ovaries, removing all traces of tracheae and fat body from the ovaries was not feasible. Tissues were stored at -20°C in RNAlater -ICE until RNA extraction.
RNA extraction, library preparation, and sequencing. RNA was extracted from brains using the RNeasy Micro Kit and from ovaries using the RNeasy Mini Kit, both with DNase treatment (Qiagen, Valencia, CA). Extracted total RNA integrity and concentration were evaluated for every sample on an Agilent 2100 bioanalyzer (Supplementary Data 1). Illumina sequencing libraries were prepared following the Smart-seq2 protocol (Picelli et al., 2014), which was developed for low input RNA sequencing applications. Based on bioanalyzer readings, approximately 1.2 ng of total RNA was used to make each brain library and approximately 3.6 ng of total RNA was used to make each ovary library (Supplementary Data 1). Samples were barcoded and pooled prior to sequencing at the Georgia Genomics and Bioinformatics Core (Athens, GA) on an Illumina NextSeq instrument to produce 75 bp single-end reads. We sequenced all brain samples on one high-output flow cell and all ovaries on a second high-output flow cell.
Quality control and read alignment. Most rRNA presumably was removed during RNA-Seq library preparation by poly-A mRNA isolation in the Smart-seq2 protocol. Nonetheless, any reads in our data that aligned to Myrmecia croslandi (Australian bull ant) 18S rRNA, 5.8S rRNA, or 28S rRNA genes were removed using BLAT (version 3.5). The remaining reads were trimmed with Trimmomatic version 0.32 (TRAILING:3, LEADING:3, SLIDINGWINDOW:4:15, MINLEN:36) (Bolger, Lohse, & Usadel, 2014). Reads were aligned to S. invicta assembly gnG (NCBI accession GCF_000188075.1, generated from an SB male) using STAR version 2.5.3a (Dobin et al., 2013; Wurm et al., 2011). We utilized previously published linkage mapping information to determine which scaffolds mapped to each linkage group as well as to the Sb supergene (Pracana, Priyam, et al., 2017; Wang et al., 2013). One SB/SBpolygyne ovary sample (104DO) was removed from further analysis due to excessively low mapping (~0.9% uniquely mapped reads) (Supplementary Data 1). In addition, we resequenced two libraries that had lower than expected numbers of sequenced reads (239AB and 1EB). After alignment, we had between 10 million and 31 million uniquely mapped reads available for each sample (Supplementary Data 1).
Differential Expression Analysis. Aligned reads were mapped to the NCBI annotation release 100 RefSeq gene models using the FeatureCounts function from the RSubread package (Liao, Smyth, & Shi, 2013). Genes were filtered from the analysis if fewer than 8 libraries had a Counts-per-million (CPM) > 10/9 at that locus (more than 10 reads aligned to a gene in the majority of libraries). This decreased the possibility of inferring differential expression from structured background noise. The data were then analyzed using two statistical approaches in edgeR (Robinson, McCarthy, & Smyth, 2009) – a ‘multifactorial model’ and a ‘pairwise model’. The multifactorial-model parameters included colony identity, social form, organ type, social chromosome genotype, and a covariance term for organ and genotype. We used this model to ask broad questions about our data while making use of all of the information from all of our samples. The pairwise comparisons simply treated each sample type (e.g., polygyneSB/SB brain, polygyne SB/Sb brain, etc.) as an independent treatment for analyzing specific differences between two types. For each approach, genes were considered differentially expressed if they exhibited an FDR-corrected P-value < 0.01. Each comparison was visualized using volcano plots with a custom script using the ggplot2 R package (Villanueva & Chen, 2019).
Overlap of significant differences for various comparisons was visualized using UpSetR (Conway, Lex, & Gehlenborg, 2017). Next, candidate gene categories were further refined by classifying genes as ‘ovary-specific,’ ‘brain-specific,’ or ‘Sb/Sb -specific’ if they exhibited an FDR-corrected P-value < 0.01 in the relevant comparisons (e.g., ovary SB/SB versus SB/Sb and ovarySB/SB versus Sb/Sb in the ovary-specific category) while also exhibiting an FDR-corrected P-value > 0.1 in the other categories (e.g., brain SB/SB versus SB/Sb and brainSB/SB versus Sb/Sb in the ovary-specific category). We note that genes that exhibit brain- or ovary-specific expression differences in our study, as well as those that do not, may exhibit differential expression in unsampled tissues. Visualization of the genomic location of differentially expressed genes (DEGs) was performed using KaryoPloteR (Gel & Serra, 2017). Non-random spatial distribution of DEGs, with respect to higher or lower expression of Sb -bearing samples, was assessed using a two-sided Wald-Wolfowitz runs test (Wald & Wolfowitz, 1940) after significant DEGs were converted into binary representation based on the directionality of differential expression.
Allele-specific Expression Analysis. To perform allele-specific expression analysis, we generated a high-confidence set of haplotype-specific SNPs from our RNA-seq data following a previously described protocol (Auwera et al., 2013). In short, we performed a 2-pass alignment of the reads using STAR (Dobin et al., 2013), formatted the reads for GATK using Picard (“Picard toolkit,” 2018), then identified variants using the GATK HaplotypeCaller (Auwera et al., 2013). We leveraged our homozygous samples (SB/SB andSb/Sb ) to retain only those variants that consistently differentiated SB and Sb haplotypes. We removed Sbvariants that were found as polymorphisms in the SB population as well as those that were found in fewer than three Sb/Sb samples. A total of 3994 SNPs diagnostic for the Sb and SBhaplotypes remained following this procedure (Supplementary Data 2). We then n-masked the genome for these variants using Bedtools (Quinlan & Hall, 2010) to eliminate any alignment bias and re-aligned the heterozygous SB/Sb samples to the masked genome using the 2-pass method in STAR (Dobin et al., 2013). The GATK ASEReadCounter function was used to generate a per-SNP count matrix for SB/Sb samples (Auwera et al., 2013), which were mapped to genes using Bedtools intersect. Both the SNP-level and gene-level allele-specific expression counts were next passed to edgeR and fit to a model with variance terms for colony identity, organ, allele, and a covariance term for organ and allele effects (McCarthy, Chen, & Smyth, 2012). Genes and SNPs were considered to display significant allele-specific expression if the FDR-corrected P < 0.01. Visualization of the genomic location of SNPs displaying significant allele-specific expression was performed using KaryoPloteR (Gel & Serra, 2017). The overlap between genes displaying differential expression and allele-specific expression results was visualized using UpSetR (Conway et al., 2017).