A Universal Probe Set for Targeted Sequencing of 353 Nuclear Genes from Any Flowering Plant Designed Using k-Medoids Clustering

dc.contributor.author
Johnson, Matthew G.
dc.contributor.author
Pokorny, Lisa
dc.contributor.author
Dodsworth, Steven
dc.contributor.author
Rodríguez Botigué, Laura
dc.contributor.author
Cowan, Robyn S.
dc.contributor.author
Devault, Alison
dc.contributor.author
Eiserhardt, Wolf L.
dc.contributor.author
Epitawalage, Niroshini
dc.contributor.author
Forest, Félix
dc.contributor.author
Kim, Jan T.
dc.contributor.author
Leebens-Mack, James H.
dc.contributor.author
Leitch, Ilia J.
dc.contributor.author
Maurin, Olivier
dc.contributor.author
Soltis, Douglas E.
dc.contributor.author
Soltis, Pamela
dc.contributor.author
Wong, Gane Ka-Shu
dc.contributor.author
Baker, William J.
dc.contributor.author
Wickett, Norman J.
dc.date.issued
2018
dc.identifier
https://ddd.uab.cat/record/210790
dc.identifier
urn:10.1093/sysbio/syy086
dc.identifier
urn:oai:ddd.uab.cat:210790
dc.identifier
urn:pmid:30535394
dc.identifier
urn:pmcid:PMC6568016
dc.identifier
urn:pmc-uid:6568016
dc.identifier
urn:articleid:1076836Xv68p594
dc.identifier
urn:scopus_id:85062685400
dc.identifier
urn:altmetric_id:52477558
dc.identifier
urn:wos_id:000493314500003
dc.identifier
urn:oai:pubmedcentral.nih.gov:6568016
dc.description.abstract
Sequencing of target-enriched libraries is an efficient and cost-effective method for obtaining DNA sequence data from hundreds of nuclear loci for phylogeny reconstruction. Much of the cost of developing targeted sequencing approaches is associated with the generation of preliminary data needed for the identification of orthologous loci for probe design. In plants, identifying orthologous loci has proven difficult due to a large number of whole-genome duplication events, especially in the angiosperms (flowering plants). We used multiple sequence alignments from over 600 angiosperms for 353 putatively single-copy protein-coding genes identified by the One Thousand Plant Transcriptomes Initiative to design a set of targeted sequencing probes for phylogenetic studies of any angiosperm group. To maximize the phylogenetic potential of the probes, while minimizing the cost of production, we introduce a k-medoids clustering approach to identify the minimum number of sequences necessary to represent each coding sequence in the final probe set. Using this method, 5-15 representative sequences were selected per orthologous locus, representing the sequence diversity of angiosperms more efficiently than if probes were designed using available sequenced genomes alone. To test our approximately 80,000 probes, we hybridized libraries from 42 species spanning all higher-order groups of angiosperms, with a focus on taxa not present in the sequence alignments used to design the probes. Out of a possible 353 coding sequences, we recovered an average of 283 per species and at least 100 in all species. Differences among taxa in sequence recovery could not be explained by relatedness to the representative taxa selected for probe design, suggesting that there is no phylogenetic bias in the probe set. 
Our probe set, which targeted 260 kbp of coding sequence, achieved a median recovery of 137 kbp per taxon in coding regions, a maximum recovery of 250 kbp, and an additional median of 212 kbp per taxon in flanking non-coding regions across all species. These results suggest that the Angiosperms353 probe set described here is effective for any group of flowering plants and would be useful for phylogenetic studies from the species level to higher-order groups, including the entire angiosperm clade itself.
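The representative-selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the p-distance measure, the deterministic initialisation, the simple alternating update, the `max_dist` coverage threshold, and all function names are assumptions made for illustration. It clusters aligned sequences with k-medoids and returns the smallest k for which every sequence lies within a chosen distance of a representative.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences, skipping gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise p-distances."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = p_distance(seqs[i], seqs[j])
    return d


def assign(d, medoids):
    """Map each medoid index to the indices of the sequences nearest to it."""
    clusters = {m: [] for m in medoids}
    for i in range(len(d)):
        clusters[min(medoids, key=lambda m: d[i][m])].append(i)
    return clusters


def k_medoids(d, k, max_iter=100):
    """Alternate assignment and medoid update until the medoids stop changing.

    Assumes the input sequences are distinct (deduplicate them first).
    """
    medoids = list(range(k))  # illustrative deterministic start, not a tuned one
    for _ in range(max_iter):
        clusters = assign(d, medoids)
        updated = sorted(
            min(members, key=lambda c: sum(d[c][j] for j in members)) if members else m
            for m, members in clusters.items()
        )
        if updated == medoids:
            break
        medoids = updated
    return medoids


def smallest_k(d, max_dist, k_max):
    """Smallest k whose medoids cover every sequence within max_dist (hypothetical criterion)."""
    medoids = []
    for k in range(1, k_max + 1):
        medoids = k_medoids(d, k)
        if all(min(d[i][m] for m in medoids) <= max_dist for i in range(len(d))):
            return k, medoids
    return k_max, medoids


# Toy alignment with two clearly separated groups of sequences.
seqs = [
    "AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT",
    "CCCCCCCCCC", "CCCCCCCCCG", "CCCCCCCCGG",
]
k, representatives = smallest_k(distance_matrix(seqs), max_dist=0.3, k_max=5)
print(k, [seqs[i] for i in representatives])
```

Because a medoid is always one of the input sequences, each cluster's representative is a real sequence that could serve directly as a probe template, which is the property that distinguishes k-medoids from k-means in this setting.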
dc.format
application/pdf
dc.language
eng
dc.publisher
dc.relation
Systematic Biology ; Vol. 68 (December 2018), p. 594-606
dc.rights
open access
dc.rights
This document is subject to a Creative Commons license. Total or partial reproduction, distribution, public communication of the work, and the creation of derivative works are permitted, even for commercial purposes, provided that the authorship of the original work is acknowledged.
dc.rights
https://creativecommons.org/licenses/by/4.0/
dc.subject
Angiosperms
dc.subject
Hyb-Seq
dc.subject
K-means clustering
dc.subject
K-medoids clustering
dc.subject
Machine learning
dc.subject
Nuclear genes
dc.subject
Phylogenomics
dc.subject
Sequence capture
dc.subject
Target enrichment
dc.title
A Universal Probe Set for Targeted Sequencing of 353 Nuclear Genes from Any Flowering Plant Designed Using k-Medoids Clustering
dc.type
Article

