Genome size estimation based on NGS data
The HTQC package (Xi Yang et al., 2013) was used to filter low-quality bases and reads. Briefly, the NGS data were cleaned in three steps. First, adapter sequences were removed from the reads; second, reads containing more than 10% N bases were eliminated; and third, reads in which more than 50% of the bases were of low quality (quality score ≤ 5) were discarded. After filtering, we obtained 42.3 Gb (~86× coverage) of cleaned data for the k-mer-based analysis. We also randomly selected 10,000 read pairs and searched them with BLAST against the NCBI non-redundant nucleotide (nt) database to check for obvious sample contamination.
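The second and third filtering criteria can be illustrated with a minimal Python sketch. This is not the HTQC implementation; the function names, the constants, and the Phred+33 quality encoding are assumptions made for illustration only. Adapter removal and the handling of read pairs (for example, whether a pair is dropped when only one mate fails) are not described in enough detail above to be modeled here.

```python
from typing import List, Sequence

# Thresholds taken from the text; the names are hypothetical.
MAX_N_FRACTION = 0.10         # discard reads with more than 10% N bases
MAX_LOW_QUAL_FRACTION = 0.50  # discard reads with more than 50% low-quality bases
LOW_QUAL_THRESHOLD = 5        # a base counts as low quality if its score is <= 5


def phred33_to_scores(quality_string: str) -> List[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(char) - 33 for char in quality_string]


def passes_filters(seq: str, quals: Sequence[int]) -> bool:
    """Return True if a read survives the N-content and low-quality filters."""
    # Reject empty reads and reads whose quality list does not match the sequence.
    if not seq or len(seq) != len(quals):
        return False

    # Step 2: fraction of ambiguous (N) bases.
    n_fraction = seq.upper().count("N") / len(seq)
    if n_fraction > MAX_N_FRACTION:
        return False

    # Step 3: fraction of bases at or below the low-quality threshold.
    low_fraction = sum(q <= LOW_QUAL_THRESHOLD for q in quals) / len(quals)
    return low_fraction <= MAX_LOW_QUAL_FRACTION
```

For example, `passes_filters("ACGTNNACGT", [30] * 10)` rejects the read because 20% of its bases are N, whereas a read with exactly 10% N bases is kept, since the criterion is "more than 10%".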