RNA-seq contamination as metatranscriptomic data for screening of plant pests and symbionts
- Authors: Zykin P.A. (1), Andreeva E.A. (1, 2), Tsvetkova N.V. (1), Bulanov A.N. (2), Voylokov A.V. (2)
- Affiliations:
  - (1) Saint Petersburg State University
  - (2) Institute of General Genetics
- Issue: Vol 23, No 3 (2025)
- Pages: 235-247
- Section: Ecosystems metagenomics
- URL: https://journal-vniispk.ru/ecolgenet/article/view/361849
- DOI: https://doi.org/10.17816/ecogen642484
- EDN: https://elibrary.ru/RMHFWD
- ID: 361849
Abstract
Background: Transcriptome sequencing data can contain up to 30% contaminating reads. These may originate from laboratory contamination or from biologically relevant sources; the latter are amenable to metatranscriptomic analysis.
Aim: To evaluate the utility of contaminating reads for large-scale screening of plant pests and symbionts.
Methods: We analyzed RNA-seq data from rye (Secale cereale L.), comprising five in-house accessions and 50 public datasets from the NCBI SRA. Reads that mapped well to the rye genome were filtered out, and the remaining putative contaminants were retained for downstream analysis.
Results: After removing laboratory contaminants, we compared reads from aphids, symbiotic fungi, bacteria, and viruses across accessions. Symbiome-derived reads were reproducible in biological replicates and varied by location, condition, and plant species, enabling post-hoc metatranscriptomic analysis.
Conclusions: Contaminating reads correlated with field-observed species or expected symbionts. Distribution patterns across accessions support repurposing existing and future sequencing data to screen for plant pests, monitor symbiotic organisms, and plan eradication strategies amid global climate change.
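The filtering step described in Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Read` record, the 0.95 identity threshold, the taxon labels, and the per-million scaling are all assumptions made for illustration, and the toy reads are invented rather than taken from the study.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Read:
    """Hypothetical record: one read after alignment to the host genome."""
    read_id: str
    host_identity: Optional[float]  # best alignment identity; None if unmapped
    taxon: Optional[str] = None     # assumed label from a later classification step


def putative_contaminants(reads, min_identity=0.95):
    """Keep reads that are unmapped or map below the (assumed) identity threshold."""
    return [r for r in reads
            if r.host_identity is None or r.host_identity < min_identity]


def counts_per_million(reads, total_reads):
    """Tally labelled reads per taxon, scaled to counts per million sequenced reads."""
    counts = Counter(r.taxon for r in reads if r.taxon is not None)
    return {taxon: n * 1_000_000 / total_reads for taxon, n in counts.items()}


# Toy input, invented for illustration only.
reads = [
    Read("r1", 0.99),
    Read("r2", None, "aphid"),
    Read("r3", 0.80, "aphid"),
    Read("r4", None, "fungus"),
    Read("r5", 0.97),
]
kept = putative_contaminants(reads)
cpm = counts_per_million(kept, total_reads=len(reads))
```

Scaling by the total number of sequenced reads, rather than by the number of retained reads, keeps values comparable between samples with different sequencing depth; whether the study normalized this way is not stated in the abstract.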
About the authors
Pavel A. Zykin
Saint Petersburg State University
Email: pavel.zykin@spbu.ru
ORCID iD: 0000-0003-1624-6163
SPIN-code: 2730-5890
Cand. Sci. (Biology)
Russian Federation, 7–9 Universitetskaya emb., Saint Petersburg, 199034
Elena A. Andreeva
Saint Petersburg State University; Institute of General Genetics
Email: e.a.andreeva@spbu.ru
ORCID iD: 0000-0002-9326-3170
SPIN-code: 7269-8240
Cand. Sci. (Biology)
Russian Federation, 7–9 Universitetskaya emb., Saint Petersburg, 199034; 3 Gubkina st, Moscow, 117971
Natalia V. Tsvetkova
Saint Petersburg State University
Email: n.tswetkowa@spbu.ru
ORCID iD: 0000-0002-7353-1107
SPIN-code: 1687-5757
Cand. Sci. (Biology)
Russian Federation, 7–9 Universitetskaya emb., Saint Petersburg, 199034
Andrey N. Bulanov
Institute of General Genetics
Email: an.bulanov20002014@gmail.com
ORCID iD: 0009-0003-8092-9978
SPIN-code: 3791-9700
Russian Federation, 3 Gubkina st, Moscow, 117971
Anatoly V. Voylokov
Institute of General Genetics
Author for correspondence.
Email: av_voylokov@mail.ru
ORCID iD: 0000-0003-3423-8511
Dr. Sci. (Biology)
Russian Federation, 3 Gubkina st, Moscow, 117971
