Trinity genome-guided transcriptome assembly

https://doi.org/10.1093/bioinformatics/btp324. Oral insecticidal activity of plant-associated pseudomonads. a, Evolution of chromosome numbers in Poaceae, from n = 10 in sorghum to n = 8 in S. spontaneum. f, Heatmap of 41 shared TE copy numbers. The LD decay rate was measured as the physical distance at which the average pairwise r² dropped to 0.2. d, FISH confirms the inversion on chromosome 3C observed in the hexaploid oat genome. The input for this second step was generated by aligning the RNA-seq reads to the reference genome using HISAT2 [99] v2.1.0. Background: Transposable elements (TEs) have been likened to genomic parasites that reproduce and move ceaselessly within the host, continuously enlarging the host genome. Using the gene families identified by OrthoFinder, 2,237 one-to-one orthologous gene sets were identified for the 23 subgenomes of 16 grass species. 2008;95(9):859-67. We used deletion sequences to evaluate the correlation between SVs and the involvement of repetitive sequences. piRNA abundance was normalized as reads per million (RPM), and the heatmap was plotted using log2(RPM) with the scale = row parameter. Fresh megagametophytes of Cycas panzhihuaensis, cultivated in the garden of the Kunming Institute of Botany, Chinese Academy of Sciences, were collected for genome sequencing. Provided by the Springer Nature SharedIt content-sharing initiative, Applied Microbiology and Biotechnology (2022). https://doi.org/10.1126/science.1253435. and Y.L. Extended Data Fig. We first used an E-value cut-off (< 1e-20) to filter candidates and then filtered the candidates by functional annotation. 5b). Peter, J. et al. Materials for the RNA-Seq workshop on Trinity and Tuxedo, covering de novo and genome-guided transcript assembly and downstream analysis. Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
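The LD decay criterion above (the distance at which mean pairwise r² falls to 0.2) can be computed from a table of SNP-pair distances and r² values. The following is a minimal sketch, assuming the pairwise r² values have already been computed (for example with PLINK's `--r2` option). The 1-kb bin width, the use of bin midpoints and the linear interpolation between bins are illustrative choices, not the authors' settings, and `ld_decay_distance` is a hypothetical helper name.

```python
from collections import defaultdict


def ld_decay_distance(pairs, bin_size=1000, threshold=0.2):
    """Distance (bp) at which the binned mean r^2 first falls to `threshold`.

    pairs: iterable of (distance_bp, r2) tuples, one per SNP pair.
    Returns None if the mean r^2 never falls to the threshold.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for dist, r2 in pairs:
        b = int(dist // bin_size)  # distance bin index
        sums[b] += r2
        counts[b] += 1

    prev = None  # (bin midpoint, mean r^2) of the previous non-empty bin
    for b in sorted(sums):
        mid = (b + 0.5) * bin_size
        mean_r2 = sums[b] / counts[b]
        if mean_r2 <= threshold:
            if prev is None:
                return mid
            x0, y0 = prev
            # Linear interpolation between the two bracketing bin midpoints.
            return x0 + (y0 - threshold) * (mid - x0) / (y0 - mean_r2)
        prev = (mid, mean_r2)
    return None


pairs = [(100, 0.8), (200, 0.6), (1500, 0.4), (1600, 0.4), (2500, 0.1)]
print(round(ld_decay_distance(pairs), 1))
```

Binning before averaging keeps the estimate stable when SNP pairs are unevenly spread across distances. Tools built for this purpose may fit a decay curve instead of interpolating.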
The variant sites were annotated as SNPs and indels and further classified as intergenic or genic (including synonymous, nonsynonymous, intronic, upstream and downstream variants). The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. 4a-d). Opin. de Souza, N. The ENCODE project. PLINK: a tool set for whole-genome association and population-based linkage analyses. Plant 13, 59-71 (2020). TE transcripts were annotated with Domain-based ANnotation of Transposable Elements (DANTE) (https://repeatexplorer-elixir.cerit-sc.cz/galaxy). Because dnaPipeTE uses Repbase to annotate the repeats it finds, and Repbase contains the repeats of L. migratoria but not those of A. rhodopa, the annotation results favor L. migratoria and show fewer unknown repeats for that species. 2007;318(5851):761-4. Jackson S, Chen ZJ. Biol. Genome Biol. Each low-expressing allele is compared with the high-expressing allele that has the most similar sequence (across all promoter sequences analysed from the 1,011 strains; \(e_{\mathrm{TF},A_{\mathrm{high}}} - e_{\mathrm{TF},A_{\mathrm{low}}}\)). Biol. S. spontaneum has a broad natural range extending throughout Asia, the Indian subcontinent, the Mediterranean and Africa [52], and natural populations display a wide range of phenotypic, genetic and ploidy-level diversity. ), the European Bioinformatics Institute (BP2012OO2J17 to R.M. Tangled up in two: a burst of genome duplications at the end of the Cretaceous and the consequences for plant evolution. and Yang Liu conceived the study. Several methods are now available for estimating transcript abundance without a reference genome: alignment-based methods, which align reads to the transcript assembly, and alignment-free methods, which typically examine k-mer abundances in the reads and in the resulting assemblies.
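The k-mer idea behind alignment-free abundance estimation can be illustrated with a toy example: count the k-mers in the reads, then score each assembled transcript by the average read count of its own k-mers. This is a minimal sketch for intuition only. It is not the algorithm of any specific tool: real alignment-free quantifiers also handle reverse complements, sequencing errors, and k-mers shared between transcripts (typically with an EM step), none of which is done here. Both function names are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def mean_kmer_abundance(transcript, read_kmers, k):
    """Average read-derived count of the transcript's k-mers (0.0 if shorter than k)."""
    kmers = [transcript[i:i + k] for i in range(len(transcript) - k + 1)]
    if not kmers:
        return 0.0
    return sum(read_kmers[km] for km in kmers) / len(kmers)


reads = ["ACGTAC", "ACGTAC", "TTTT"]
kmer_counts = count_kmers(reads, k=3)
print(mean_kmer_abundance("ACGTA", kmer_counts, k=3))
```

A `Counter` returns 0 for unseen k-mers without inserting them, so transcripts with no read support score 0.0.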
AnimalTFDB 3.0: a comprehensive resource for annotation and prediction of animal transcription factors. The reference-guided approach requires the genome of the organism, or of a closely related species, as input. Genes Dev. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. https://doi.org/10.1093/bioinformatics/btu033. CNLs are widely expanded in both gymnosperms and angiosperms, whereas the TNL family tends to be more expanded in gymnosperms than in most angiosperms, indicating different evolutionary patterns of plant resistance (R) genes in these two lineages. & Seelig, G. A generative neural network for maximizing fitness and diversity of synthetic DNA and protein sequences. XC performed the experiments. Colour represents values from low (blue) to high (red). Mol. f, g, Simulation and validation of expression trajectories under SSWM in defined medium (SD-Uracil). Regions of S. spontaneum with larger-scale chromosomal rearrangements relative to sorghum have higher genetic diversity (higher π) than non-rearranged regions and may have undergone much stronger balancing selection (Supplementary Table 22 and Supplementary Fig. We found that the large-genome grasshopper has more copies of end extensions. Nucleic Acids Res. We compared the gene families among Aveneae, Triticeae and Lolieae and found 6,425 gene families common to the three tribes and 1,608 gene families specific to Aveneae (Fig. Genome size in arthropods: different roles of phylogeny, habitat and life history in insects and crustaceans.
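In Trinity's genome-guided mode, the reference is supplied indirectly as a coordinate-sorted BAM file of RNA-seq reads aligned to the genome. The following is a minimal Python sketch that only builds the three command lines (HISAT2 alignment, samtools sort, Trinity) and does not run them. It assumes paired-end reads and a prebuilt HISAT2 index. The names `genome_idx`, `R1.fq.gz`, `R2.fq.gz` and `sample` are hypothetical placeholders, and the thread, memory and maximum-intron defaults are illustrative rather than values taken from the text. Check every flag against the documentation for your installed versions.

```python
import shlex


def genome_guided_commands(index, r1, r2, prefix="sample", threads=8,
                           max_intron=10000, max_memory="20G"):
    """Return argument lists for HISAT2 -> samtools sort -> genome-guided Trinity."""
    sam = f"{prefix}.sam"
    bam = f"{prefix}.coordSorted.bam"  # genome-guided Trinity needs a coordinate-sorted BAM
    return [
        ["hisat2", "-p", str(threads), "-x", index, "-1", r1, "-2", r2, "-S", sam],
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        ["Trinity", "--genome_guided_bam", bam,
         "--genome_guided_max_intron", str(max_intron),
         "--max_memory", max_memory, "--CPU", str(threads)],
    ]


if __name__ == "__main__":
    for cmd in genome_guided_commands("genome_idx", "R1.fq.gz", "R2.fq.gz"):
        print(shlex.join(cmd))
```

Keeping the commands as argument lists means each one can be passed directly to `subprocess.run(cmd, check=True)` without shell quoting issues.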
b, Large C-to-A and C-to-D translocations are supported by mapping reads from the C-genome diploid to the hexaploid reference genome; blue arrows indicate C-to-D and C-to-A intergenomic translocations. ISSN 2055-0278 (online). The different landscape patterns in the two species illustrate that TEs are subject to different dynamics and resistance as they expand. and L.W. Cycads are often referred to as living fossils; they originated in the mid-Permian and dominated terrestrial ecosystems during the Mesozoic, a period called the age of cycads and dinosaurs [1]. c-f, Evolvability space captures the evolutionary properties of regulatory sequences. Evol. 2011;108(10):4069-74. a, Sequence similarities of reads from different Avena species that were uniquely mapped to the A, C and D subgenomes of Sanfensan. 2006;7(11):847-59. The results revealed the high flexibility of multi-copy genes during intraspecific diversification. BMC Genomics 13, 142 (2012). Acad. 2c and 4f). Additionally, the genic regions (gene bodies plus 2-kb flanking sequences) of the FSGs harbored significantly more LTR-RTs than those of the CSGs (P < 2.2e-16) (Fig. Aravin AA, Hannon GJ, Brennecke J. Based on the alignments of 18 genomes, we obtained a set of 87,032 nonredundant SVs (insertions and deletions; size ≥ 50 bp) and constructed an integrated graph-based genome using Chiifu as the reference. 3c). The genome sequences of the diploid A. longiglumis (Al genome) and the tetraploid A. insularis (CD genome) were divided into 100-bp nonoverlapping fragments, which were then aligned to the hexaploid Sanfensan reference genome. BMC Bioinformatics 9, 18 (2008). The average gene density in the individual genome was significantly lower than that of the inferred B. rapa ancestral genome (P = 0, Fig. https://doi.org/10.1073/pnas.0506758102. Nvwa data can be accessed at http://bis.zju.edu.cn/nvwa/. Abascal, F., Zardoya, R. & Telford, M. J.
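Splitting a genome into 100-bp nonoverlapping fragments, as described above, takes only a few lines. The following is a minimal sketch, assuming sequences are already loaded as strings. The `name:start-end` fragment IDs, the 1-based coordinates and the decision to drop a trailing piece shorter than 100 bp are assumptions of this sketch; the text does not say how the authors named fragments or handled the last partial piece. Both function names are hypothetical.

```python
def split_nonoverlapping(name, seq, size=100, keep_partial=False):
    """Yield (fragment_id, fragment) for consecutive, non-overlapping windows.

    Fragment IDs use 1-based inclusive coordinates, e.g. "chr1A:1-100".
    A trailing piece shorter than `size` is dropped unless keep_partial=True.
    """
    for start in range(0, len(seq), size):
        frag = seq[start:start + size]
        if len(frag) < size and not keep_partial:
            break
        yield f"{name}:{start + 1}-{start + len(frag)}", frag


def to_fasta(fragments):
    """Format (id, sequence) pairs as FASTA text, ready for a short-read aligner."""
    return "".join(f">{fid}\n{frag}\n" for fid, frag in fragments)


print(to_fasta(split_nonoverlapping("chr1A", "ACGT" * 60)))
```

Encoding the source coordinates in each fragment ID means the alignments can later be traced back to positions in the donor genome.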
TranslatorX: multiple alignment of nucleotide sequences guided by amino acid translations. 20, 1297-1303 (2010). Nowoshilow S, Schloissnig S, Fei J-F, Dahl A, Pang AW, Pippel M, et al. Second, GeneWise (version 2.4.1) [80] with default parameters was used to predict homology-based gene models. & Maere, S. Inference of genome duplications from age distributions revisited. 2007;128(6):1089-103. In the phylogenomic analysis, Triticeae (wheat, barley and rye) and Aveneae (oat) clustered together, with Oryzoideae (rice) as the outgroup. The difference in LTR elements between the two species is significant: LTR elements account for 17.21% of the A. rhodopa genome but only 10.06% of the L. migratoria genome (Additional file 2: Table S1). A 500-gene sliding window with an increment of two genes was used to calculate the gene density in a and b. Based on the subgenome information, we counted single-, two- and three-copy genes in B. rapa (Additional file 2: Figure S34). 47, D33-D38 (2019). Genome 59, 209-220 (2016). (c) The phylogenetic tree of CSLE and CSLG genes. 2019;20(1):277. https://doi.org/10.1186/s13059-019-1911-0. In addition, the Nei-Gojobori method [101], as implemented in the yn00 program [91] of the PAML package, was used to estimate the number of synonymous substitutions per synonymous site (KS) for pairwise comparisons of paralogous genes located in syntenic blocks. Resistance genes are seven times more likely to be located in the four rearranged regions than in other chromosomes or regions (P < 2.2e-16, Fisher's exact test; Supplementary Table 20). Biol. 4 Signatures of stabilizing selection on gene expression detected from regulatory DNA across natural populations. USA 117, 9451-9457 (2020). Genome Res. Chromosomes are represented with color codes to illustrate the evolution of segments from a common ancestor with 5 chromosomes. 2005;33(2):511-8. P.S.S., Y.V.d.P., D.E.S., B.G., X.-Q.W., J.H., E.C.S., E.W. https://doi.org/10.1093/nar/gkm286. Annu. Peng, Y., Yan, H., Guo, L. et al.
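A sliding window counted in genes (500 genes wide, advancing two genes at a time) differs from a base-pair window in that every window contains the same number of ancestral gene slots. The following is a minimal sketch, assuming the input is a 0/1 flag per gene of the inferred ancestral genome in chromosomal order. Treating 1 as "retained in the individual genome" and reporting the windowed fraction as the density is an interpretation, not a definition given in the text, and `windowed_fraction` is a hypothetical name.

```python
def windowed_fraction(flags, window=500, step=2):
    """Fraction of flagged genes in sliding windows counted in genes, not bp.

    flags: 0/1 per gene in ancestral-genome order (1 = retained, by assumption).
    Returns (start_index, fraction) for every full window; none if too short.
    """
    out = []
    for start in range(0, len(flags) - window + 1, step):
        win = flags[start:start + window]
        out.append((start, sum(win) / window))
    return out


# Toy example with a 2-gene window stepping by 2 genes.
print(windowed_fraction([1, 1, 0, 0, 1, 1], window=2, step=2))
```

For the 500-gene, step-2 setting, each window overlaps the previous one by 498 genes, so the resulting curve is strongly smoothed.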
2021; https://www.ncbi.nlm.nih.gov/bioproject/PRJNA730930. Genomics 95, 185-195 (2010). Chromosome reduction in Miscanthus was caused by the fusion of one set of chromosomes homologous to SbChr04 and SbChr07 [27]. The head, thorax and legs of each sex were pooled into one sample as body tissue for RNA extraction. 2014;30(15):2114-20. Natl Acad. Chalhoub, B. et al. These filtering strategies reduced the raw unfiltered set of variants (SNPs and indels) to a working set of 68,911 variants. Assembly of the non-heading pak choi genome and comparison with the genomes of heading Chinese cabbage and the oilseed yellow sarson. 2018;5(1):50. https://doi.org/10.1038/s41438-018-0071-9. Nature 571, 349-354 (2019). The nomenclature system of OT3098 v2, which is consistent with that approved by the International Oat Nomenclature Committee, was adopted for naming the chromosomes of Sanfensan. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. 1), including BAC pools sequenced with Illumina HiSeq 2500, whole-genome shotgun sequencing with PacBio RS II and Hi-C reads, followed by polishing with Illumina short reads. Insertions larger than 50 bp were identified with Assemblytics [79], a web-based SV analytics tool, and then inserted into the reference genome. Despite the prevalence and recurrence of polyploidization in the speciation of flowering plants, its impact on crop intraspecific genome diversification is largely unknown. Vaishnav, E.D., de Boer, C.G., Molinet, J. et al. (b) Barplot of the Nvwa and single-cell ATAC cell-type-specific motifs for mouse. https://doi.org/10.1093/nar/gkn785. This is consistent with the transcriptome analysis: the more active TEs in L. migratoria correspond to a higher abundance of piRNAs. JASPAR 2020: update of the open-access database of transcription factor binding profiles.
Without allele-specific annotation, the number of P450 genes in this genome would be 1,465, not 387. 85) was used to infer the maximum likelihood trees with an initial partition scheme of codon positions, combining ModelFinder, tree search and ultrafast bootstrap. Nature 521, 344-347 (2015). Sun, M. et al. PIWI-interacting RNAs: small RNAs with big functions. We therefore analyzed the effect of piRNAs on post-transcriptional silencing of TEs. Yu, G., Wang, L. G. & He, Q. Y. ChIPseeker: an R/Bioconductor package for ChIP peak annotation, comparison and visualization. Fu, Y. Whole-genome sequencing, the de novo assembly of transcriptomes and complete plastid genomes show agreement on conclusion. 5a,b). The first mutations in increasing- and decreasing-expression trajectories respectively increase or decrease the affinity of this site. Biol. Outer dense fibres are unique accessory structures that maintain the structural integrity of flagella and are vital for flagellar function [53]. A high-contiguity Brassica nigra genome localizes active centromeres and defines the ancestral Brassica genome. De novo assemblies, when necessary, were obtained with Trinity for transcriptomes or with SPAdes v. 3.14.1 for genomes and single amplified genomes (SAGs). 33, 831-838 (2015). The central line of each box plot indicates the median. The data are presented as mean ± s.d. The piRNA pathway is considered an adaptive defense in the transposon arms race [31]. Sci. Chen, F., Tholl, D., Bohlmann, J. Nat. 2010;13(2):153-9. CAS performed bioinformatics analysis. PLoS Comput. 3 The transformer sequence-to-expression model generalizes reliably and characterizes sequence trajectories under different evolutionary regimes. 4d,e and Extended Data Fig. designed the experiments; H.T. Clean reads were assembled with Trinity [89], and the longest transcripts were selected and translated with TransDecoder (https://github.com/TransDecoder). 11, R87 (2010).
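Selecting the longest transcript per gene from a Trinity assembly depends on Trinity's identifier convention, in which isoforms of one gene share a prefix and differ only in the trailing `_i<number>` (for example `TRINITY_DN1_c0_g1_i2`). The following is a minimal pure-Python sketch, assuming that naming scheme and an uncompressed FASTA. The function names and the toy records are hypothetical, and ties in length keep the first isoform seen. The selected sequences would then go to TransDecoder as described above.

```python
import re


def parse_fasta(lines):
    """Yield (record_id, sequence); the ID is the first word of the header."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def longest_isoform_per_gene(records):
    """Map each Trinity gene ID to its longest (isoform_id, sequence)."""
    best = {}
    for tid, seq in records:
        gene = re.sub(r"_i\d+$", "", tid)  # TRINITY_DN1_c0_g1_i2 -> TRINITY_DN1_c0_g1
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (tid, seq)
    return best


fasta = """>TRINITY_DN1_c0_g1_i1 len=4
ACGT
>TRINITY_DN1_c0_g1_i2 len=8
ACGTACGT
>TRINITY_DN2_c0_g1_i1 len=2
GG""".splitlines()
for gene, (tid, seq) in longest_isoform_per_gene(parse_fasta(fasta)).items():
    print(gene, tid, len(seq))
```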
De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Thompson, D. A. et al. Each subplot shows the in silico mutagenesis effects, that is, how the expression level (colour) changes when each position (x axis) of each sequence (subplots) in the trajectories is mutated to each of the four bases (y axis). Mol. Sci Rep. 2017;7:42229. https://doi.org/10.1038/srep42229. Should evolutionary geneticists worry about higher-order epistasis? To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. Of them, 307 have AS variants (mean 3.10 AS variants per gene), significantly more than their homeologs in the C subgenome (235 genes with AS variants, mean 2.77 per gene; P = 0.018, Student's t test, df = 540) and the D subgenome (246 genes with AS variants, mean 3.70 per gene; P = 0.0036, Student's t test, df = 551). Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. The bottom and top edges of the box indicate the first and third quartiles, respectively, and the whiskers extend 1.5 times the interquartile range beyond the edges of the box. Szendro, I. G., Franke, J., de Visser, J. RNA-seq and small RNA-seq data of Acrididae species. Melnikov, A. et al. ), the 863 program (2013AA102604 to J.Z. 35, 33-39 (2015). (a) t-SNE visualization of 95,020 single cells from whole bodies of earthworm, colored by cell type (left) and cell lineage (right). 7), including Ty1/copia and Ty3/gypsy superfamilies that occurred between 0.72 and 2.9 million years ago. 2g, Extended Data Fig. Using Chiifu as the reference, 79.81-87.49% of the genes were identified as syntenic genes in the other B. rapa genomes (Additional file 3: Table 1 and Additional file 3: Table S9). 110, 462-467 (2005). 4f,g and Supplementary Table 21). 2020;37:49-56.
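The box-plot convention in the caption above (quartile box, whiskers at 1.5 × IQR) can be made concrete. The following is a minimal sketch using only the standard library. It assumes the common plotting convention in which each whisker stops at the most extreme observation that lies within 1.5 × IQR of the box, and it uses `statistics.quantiles(..., method="inclusive")` for the quartiles. Other tools may compute quartiles slightly differently, and `box_stats` is a hypothetical name.

```python
from statistics import quantiles


def box_stats(values, whis=1.5):
    """Quartiles, whisker ends and outliers for one box in a box plot."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": med, "q3": q3,
        # Whiskers stop at the most extreme observations inside the fences.
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

In this example the value 100 lies beyond Q3 + 1.5 × IQR, so it is drawn as an outlier and the upper whisker ends at 8.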
Whether this low level of piRNA silencing is unique to grasshopper species with gigantic genomes or reflects an evolutionary process in Acrididae insects remains to be determined with data from more species. Nat Commun. 2c). Deciphering eukaryotic gene-regulatory logic with 100 million random promoters. 2d, and the scaled profiles of the remaining 31 shared TEs are shown in Additional file 1: Fig. Front. Genome Res. Two points need to be clarified when comparing piRNA silencing levels across species. and L.-Q.C. Yuan H, Huang Y, Mao Y, Zhang N, Nie Y, Zhang X, et al. Genes Dev. (b) Comparison of intron components across the selected plants. Lomsadze, A., Burns, P. D. & Borodovsky, M. Integration of mapped RNA-Seq reads into automatic training of eukaryotic gene finding algorithm. g, Motifs enriched within sequences evolved for competing objectives in different environments. A translocation from chromosome 1C to 1A (>100 Mb) and an inversion in 3C (>130 Mb) subsequently occurred in the hexaploid Sanfensan genome; six of these structural rearrangements were further confirmed by FISH (fluorescence in situ hybridization) assays (Fig. Commun. For centromere identification, we used a method similar to that described for the Oropetium thomaeum genome [64]. Rev. Plant Cell Physiol. 42, e119 (2014). https://doi.org/10.4161/fly.19695. 26, 990-999 (2016). https://doi.org/10.1038/nrg.2017.26. Agarwal, V. & Shendure, J. 2010;19(3):337-46. Furthermore, some studies have shown that many protein components of the piRNA pathway show signatures of adaptive evolution [102,103,104,105]. Vaswani, A. et al. Wittkopp, P. J. The SV was previously reported to occur only in oil-type B. rapa and to contribute to variation in flowering time [46]. Li, H. et al. Keren, L. et al. Crop Sci. Locusta migratoria genome sequencing and assembly. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. 2022;20(1):119. 1988;52(3):223-35. Bioinformatics 30, 2114-2120 (2014).
The Trinity package also includes a number of Perl scripts for generating statistics to assess assembly quality and for wrapping external tools for downstream analyses. Biol. Cytogenet. ISSN 1476-4687 (online). Science 356, 92-95 (2017). Mol Biol Evol. TE divergence landscapes and scaled profiles. Among these, 106 of the new orthogroups and 55 of the expanded orthogroups are associated with seed development in Arabidopsis [37], including the regulation of early embryogenesis, seed dormancy and germination, and seed coat formation, as well as seed immunity and stress responses (Supplementary Note 6). SNPs were filtered using the following criteria: (1) SNPs were hard-filtered with GATK VariantFiltration using QD < 2.0 || FS > 60.0 || MQ < 40.0 || SOR > 3.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0, and indels with QD < 2.0 || FS > 200.0 || SOR > 10.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0; (2) variants with a total depth < 80 or > 1,300 were removed; (3) variants with more than two alleles were removed; (4) variants with a missing rate > 10% or a minor allele frequency < 0.1 were removed; and (5) linkage disequilibrium pruning was performed with PLINK (v.1.9) using a window size of 10 kb, a step size of one SNP and an r² threshold of 0.5, resulting in a pruned set of 4.65 million SNPs for association analysis of sex differentiation. For the phylogenetic analysis of gene families, the amino acid sequences of each gene family were first aligned with MAFFT [96], and PAL2NAL [97] was then used to construct the corresponding nucleotide sequence alignments. Additionally, subgenome dominance has been observed in Brassiceae species, suggesting that the dominant subgenome was formed before speciation [43, 55, 56, 58]. Theor. In the tree of life, species with gigantic genomes (larger than 10 Gb) account for only a tiny fraction, including lungfishes [4], salamanders [5, 6], deep-sea crustaceans [7, 8] and orthopteran insects [9, 10].
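The hard-filter and site-level criteria listed above can be expressed as two small predicates over a VCF record. The following is a minimal sketch that works directly on the INFO string and on precomputed per-site summaries (total depth, allele count, missing rate, MAF). It does not reproduce GATK or PLINK. Treating a missing annotation as passing is an assumption of this sketch, and the rule tables and function names are hypothetical. LD pruning, step (5), is left to PLINK.

```python
# Hard-filter thresholds as stated in the text (record fails if ANY rule matches).
SNP_RULES = {"QD": ("<", 2.0), "FS": (">", 60.0), "MQ": ("<", 40.0),
             "SOR": (">", 3.0), "MQRankSum": ("<", -12.5),
             "ReadPosRankSum": ("<", -8.0)}
INDEL_RULES = {"QD": ("<", 2.0), "FS": (">", 200.0), "SOR": (">", 10.0),
               "MQRankSum": ("<", -12.5), "ReadPosRankSum": ("<", -8.0)}


def parse_info(info):
    """Parse numeric KEY=VALUE pairs from a VCF INFO string; flags are ignored."""
    values = {}
    for field in info.split(";"):
        key, sep, raw = field.partition("=")
        if not sep:
            continue
        try:
            values[key] = float(raw)
        except ValueError:
            pass  # non-numeric or multi-valued annotation
    return values


def fails_hard_filter(info, is_indel):
    """True if any annotation crosses its threshold; missing annotations pass."""
    ann = parse_info(info)
    for key, (op, cutoff) in (INDEL_RULES if is_indel else SNP_RULES).items():
        if key in ann and ((op == "<" and ann[key] < cutoff) or
                           (op == ">" and ann[key] > cutoff)):
            return True
    return False


def passes_site_filters(total_depth, n_alleles, missing_rate, maf):
    """Steps (2)-(4): depth 80-1,300, biallelic, <=10% missing, MAF >= 0.1."""
    return (80 <= total_depth <= 1300 and n_alleles == 2
            and missing_rate <= 0.10 and maf >= 0.1)


print(fails_hard_filter("DP=120;QD=1.5;FS=10.2", is_indel=False))
```

Note that the rank-sum thresholds are negative (-12.5 and -8.0). Dropping the minus sign would reject nearly every site.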
Orange and green bars represent genes with and without InterPro domain annotations, respectively. Unique transposon landscapes are pervasive across Drosophila melanogaster genomes. Genome Biol. Food Chem. a, Phylogenetic analysis of proteins containing the TcdA/TcdB pore-forming domain shows that the genes encoding four cytotoxin proteins of Cycas were likely acquired from fungi through an ancient horizontal gene transfer event. Zhang K, Wang XW, Cheng F. Plant polyploidy: origin, evolution, and its influence on crop domestication. 3b). On average, approximately 12 Gb (~25×) of PacBio SMRT reads and 43 Gb (~90×) of Illumina reads per accession were used for draft genome assembly with MaSuRCA (version 3.2.6) [72] using default parameters. One thousand plant transcriptomes and the phylogenomics of green plants. We also found that the slope of the fitted line was always greater for L. migratoria than for A. rhodopa (Fig. See Supplementary Note 3 for details on transcriptome, organelle genome and small RNA sequencing. BUSCO [20] version 3 was used to evaluate annotation completeness. Nucleic Acids Res. Koo, P. K., Majdandzic, A., Ploenzke, M., Anand, P. & Paul, S. B. Pedersen, B. S. & Quinlan, A. R. Mosdepth: quick coverage calculation for genomes and exomes. 2018;4(11):879-87. 177, 671 (2018). Rev. https://doi.org/10.1093/nar/gkg770. 2017;8(1):2184. https://doi.org/10.1038/s41467-017-02292-8. The identity distribution clearly showed that the Al/As diploid species and the C and D subgenomes of A. insularis have the highest similarity to the A, C and D subgenomes of hexaploid oat, respectively (Fig. The TE scaled profiles show differences in the accumulation of TE copies; overall, read coverage depth is lower in L. migratoria than in A. rhodopa (Fig.
Ge Y, Ramchiary N, Wang T, Liang C, Wang N, Wang Z, et al. Pseudo-chromosomes of 12 accessions with relatively high contig N50 values were constructed from Hi-C data using the 3D-DNA pipeline (version 180419) [50]. ), the Program for New Century Excellent Talents in Fujian Province (J.Z. Nature Genetics (Nat Genet). Waterhouse RM, Seppey M, Simao FA, Manni M, Ioannidis P, Klioutchnikov G, et al. 2), together with the markers from the hexaploid oat consensus map [15] and a high level of synteny with the other hexaploid reference genome, OT3098 v2 (https://wheat.pw.usda.gov/jb?data=/ggds/oat-ot3098v2-pepsico) (Extended Data Fig. Extended Data Fig. 16, 962-972 (2006). 29, 1059-1070 (2012). BMC Biol. diploid species. Mol. Therefore, many genes that could help explain the complex mechanism of leafy head formation remain unexplored. Code is available on GitHub at https://github.com/1edv/evolution and on CodeOcean at https://codeocean.com/capsule/8020974/tree. Genome Biology prepared materials. Opin. f, Phylogeny of MADS-Y homologues across land plants. Nucleic Acids Res. 35, W265-W268 (2007). However, the Piwi-interacting RNA (piRNA) pathway defends animal genomes against the harmful consequences of TE invasion by imposing small-RNA-mediated silencing. 1), validating the high quality and accuracy of the AP85-441 genome assembly. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Beneficial mutations and the dynamics of adaptation in asexual populations. Nat Genet. Top left: Pearson's r and associated two-tailed P values. Nature 583, 296-302 (2020). Both genes are expressed, as evidenced by our long-read transcriptome data. ISRN Mol. Science 326, 289-293 (2009). In addition, we identified genes with large-effect mutations using the same method as described in Sun et al.
It is assumed that these shared TEs existed in the common ancestor of the two species and have undergone the same period of evolution in different hosts. Dominant subgenomes are widespread in allopolyploid species [43, 56, 60,61,62]. 2011;28(2):1033-42. 9 Two MADS-box transcription factor genes differentially expressed in reproductive organs of. Increased expression of core C4 enzymes played a major role in the evolution of C4 photosynthesis [34]. Nat. A robust transposon-endogenizing response from germline stem cells. 14, 2938-2943 (2000). Breed. Mol. The low level of piRNA silencing in the large-genome grasshopper species disrupts the original balance between TEs and piRNAs, allowing some TEs to escape control and continue to expand. We found that 43.59-53.51% of the genomic sequence of each accession was annotated as repeat elements (Additional file 3: Table S7), and repeat content was positively correlated with genome assembly size (R = 0.99, P = 3.8e-16) (Additional file 2: Figure S3). Combining ab initio, homology-based annotations and RNA-seq reads (Additional file 3: In the cellulose synthase (CESA/CSL) superfamily [46], we discovered putative ancestral cellulose synthase-like B/H (CSLB/H) and CSLE/G groups that are specifically shared by gymnosperms; both gene groups originated before the divergence of CSLB and CSLH in angiosperms (Extended Data Fig. The DNA sequence is indicated above each wild-type subplot (indicated with WT at left).
Using millions of randomly sampled promoter DNA sequences and their measured expression levels in the yeast Saccharomyces cerevisiae, we learn deep neural network models that generalize with excellent prediction performance and enable sequence design for expression engineering. e,f, Morphologies of Plutella xylostella (e) and Helicoverpa armigera (f) after PBS and cytotoxin treatments. Rev. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. was an Investigator of the HHMI. 5). 202105 to Y.G. The Orthoptera.repeatmasker.lib is available in the figshare database (https://doi.org/10.6084/m9.figshare.21256878). Kidner CA, Timmermans MCP. BMC Bioinformatics 7, 62 (2006). The largest repeat arrays were identified and clustered as centromeres. Even transposition-inactive TEs (sleeping TEs) can serve as substrates for ectopic recombination with other related TE insertions scattered throughout the genome [24,25,26]. Genet. Here we compare the TE activity of two Acrididae grasshopper species with different genome sizes (Locusta migratoria manilensis, 1C = 6.60 pg; Angaracris rhodopa, 1C = 16.36 pg) to ascertain the influence of piRNAs. This work was funded in part by the DOE Center for Advanced Bioenergy and Bioproducts Innovation (US DOE, Office of Science, Office of Biological and Environmental Research under Award Number DE-SC 18420 to M.H. Front. All raw data for the other 14 deep-sequenced accessions, including eight diploids, five tetraploids and one hexaploid, are available under the project numbers listed in Supplementary Table 1. Open Access. For functional annotation, the gene models were searched with BLAST against the UniProt, TrEMBL, KEGG, KOG and NR databases. This study identified four additional genes that might be involved in leafy head formation. e, The expression levels of core, softcore and dispensable genes in the Chiifu genome. Morex.
The Poaceae family includes many agronomically important species, commonly known as cereals, which fall into three subfamilies: Oryzoideae (rice), Panicoideae (maize, sorghum) and Pooideae (Triticeae: wheat, barley and rye; Aveneae: oat). c, DiscoVista species tree analysis: rows correspond to the nine hypothetical groups tested (see Supplementary Note 5 for details), and columns correspond to results derived from different datasets and methods. Kelley, D. R. et al. 2c, event 3) and the bottom of SsChr5C (Fig. Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, et al. For instance, the genes encoding egg cell-secreted proteins that prevent the attraction of multiple pollen tubes [48] originated in the MRCA of living seed plants. Significantly overrepresented GO terms in each group were identified using the R package topGO (https://www.bioconductor.org/packages/release/bioc/html/topGO.html). Plant Biol. Acad. Theor. Satija, R., Farrell, J. Bioinformatics 21(Suppl. Our phylogenetic analyses of separate nuclear (Fig. 2005;15(6):589-94. Parameter standard errors were estimated by bootstrapping (bootstrap = 200) in the admixture analyses. (a) Heatmap of 1,971 genes differentially expressed between male and female organs. Interchromosomal exchanges between A. insularis and Sanfensan after polyploidization were analyzed by separately mapping reads from A. longiglumis and A. eriantha to the A. insularis reference genome, and reads from A. longiglumis, A. eriantha and A. insularis to the Sanfensan reference genome. In addition, the RNA-seq reads were mapped to the AP85-441 genome using HISAT2 [75] version 2.10 and reassembled using StringTie [76] version 1.3.4, a reference-based RNA assembler. Persistence of subgenomes in paleopolyploid cotton after 60 my of evolution. b, The density of fractionated genes during the formation of the Chiifu genome. Ann N Y Acad Sci.
The high-molecular-weight DNA embedded in agarose was partially digested with HindIII. In addition, oats are a widely grown cool-season annual forage species and a major source of high-quality forage for livestock globally [2]. GigaScience 5, 49 (2016). performed all the experiments. Kim D, Langmead B, Salzberg SL. Dierckxsens, N., Mardulyn, P. & Smits, G. NOVOPlasty: de novo assembly of organelle genomes from whole genome data. d, Repeat profiles of the top 10 shared TEs present in the two genomes. We thank M. Chern, Department of Plant Pathology and the Genome Center, University of California, Davis, for improving the writing of this article. Divergence times were estimated under the independent-rates clock model and the Jukes-Cantor 1969 (JC69) substitution model using the MCMCTree program in the PAML (v4.7) package. Nucleic Acids Res. The y-axis indicates the ratio of fractionated genes to the genes in each bin of the inferred ancestral genome during the formation of the Chiifu genome. Nucleic Acids Res. T.W., S.L., X.W. 122, 110-115 (2018). J Horticultural Sci Biotechnol. Molecular and epigenetic regulations and functions of the LAFL transcriptional regulators that control seed development. The dark gray arrow and line indicate the inversion in chromosome 3C between A. insularis and Sanfensan. Q. Jones, D. L. Cycads of the World: Ancient Plants in Today's Landscape 2nd edn (Smithsonian Institution Press, 2002). c,d, Mapping short reads of the A-genome diploid A. longiglumis (c) and the CD-genome tetraploid A. insularis (d) onto the hexaploid Sanfensan genome reveals additional large C-to-A intergenomic translocations. For the quantitative detection of phytohormones (auxin, cytokinins, ethylene, abscisic acid, jasmonic acid, gibberellin, salicylic acid and brassinolide), tissue samples were collected from primary roots, precoralloid roots, coralloid roots, unpollinated ovules, early-stage pollinated ovules, late-stage pollinated ovules, fertilized ovules and mature embryos. Swaminathan, K. et al.
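The JC69 model named above assumes equal base frequencies and a single substitution rate. Under these assumptions, the observed proportion of differing sites p corrects to an evolutionary distance d = -(3/4) ln(1 - 4p/3). The following is a minimal sketch of that distance correction only. It does not reproduce MCMCTree's Bayesian dating, which uses the JC69 likelihood together with a relaxed clock and calibrations. Excluding gapped columns is an assumption of this sketch, and the sequences and function names are hypothetical.

```python
import math


def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    sites = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not sites:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in sites) / len(sites)


def jc69_distance(p):
    """Jukes-Cantor (1969) corrected distance: d = -3/4 * ln(1 - 4p/3)."""
    if not 0 <= p < 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


p = p_distance("ACGTACGTAC", "ACGTACGTTC")
print(p, round(jc69_distance(p), 4))
```

The correction grows quickly as p approaches 0.75, the point at which two random sequences would be expected to differ under JC69.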
Boxes show the median and the lower and upper quartiles of gene family sizes. The STP and PLT family expansions in S. spontaneum are caused by tandem duplication. Mistry J, Chuguransky S, Williams L, Qureshi M, Salazar GA, Sonnhammer EL, et al. RNA-Seq (short for RNA sequencing) is a technique that uses next-generation sequencing (NGS) to reveal the presence and quantity of RNA in a biological sample at a given moment, capturing the continuously changing cellular transcriptome. Both principal-component analysis (Fig. Tandem repeats were identified using GMATA (v2.2) [52] and Tandem Repeats Finder (v4.07b) [53]. c, The distribution of the A-genome-specific repeat As120a along each chromosome. RNA sequencing libraries were generated using the NEBNext Ultra RNA Library Prep Kit for Illumina (NEB, Ipswich, USA). The blue lines inside represent the syntenic regions in Cycas. & Jiang, N. Assessing genome assembly quality using the LTR assembly index (LAI). Consensus sequences were extracted using a custom Perl script. and 2008/52074-8 to M.-A.V.S.). Nucleic Acids Res. 10b). collected and prepared the tissue samples for sequencing. Opportunities and obstacles for deep learning in biology and medicine. & Braun, D. M. Tonoplast sugar transporters (SbTSTs) putatively control sucrose accumulation in sweet sorghum stems. Cell 184, 1156-1170. Nat. The x-axis represents the information content (IC) of each filter, and the y-axis represents the overall influence of each filter; filters with high influence are tagged as up, and filters with low influence are tagged as down. a, Comparison of the number of predicted R genes identified in the genomes of hexaploid oat and its putative progenitors. 2020;182:162. Outer ring: the 11 chromosomes are labeled Chr1 to Chr11. PLoS ONE 8, e61570 (2013). Haas, B. J. et al.
(b) Comparison of the first-layer convolution filters derived from feature-map-based approaches and gradient-based TF-MoDISco on the Drosophila-specific model. For each gene, the TPM value normalized by the maximum TPM value across all stages is shown. We compared the two rounds of MAKER annotation and kept, for each locus, the gene model whose structure was better supported by homologous proteins or RNA-seq-assembled transcripts. Furthermore, the freshly collected samples were used to estimate genome size by flow cytometry (FCM) of propidium iodide-stained nuclei following the standard protocol [107, 108]. Mobile elements: drivers of genome evolution. 1, The basic chromosome number reduction from 10 to 8 in S. spontaneum, as described in the text. Yuan Huang. Extended Data Fig. rotation until the optical density at 600 nm reached 0.5. and Yubo Wang drafted the manuscript. Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum. Traph: a tool for transcript identification and quantification with RNA-Seq. Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, et al. The median depth within a sliding window (window size: 1 Mb, step size: 0.5 Mb) was calculated and plotted along the chromosomes of the reference genome (Fig.
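The windowed median depth described above (1-Mb windows advancing by 0.5 Mb) can be sketched from a per-base depth profile. Such a profile could come from a coverage tool such as mosdepth, which is cited in this section. The following is a minimal sketch, assuming the depths for one chromosome are already loaded into a Python list. Holding a whole chromosome in memory is a simplification, and the function name is hypothetical.

```python
from statistics import median


def windowed_median_depth(depths, window=1_000_000, step=500_000):
    """Median per-base depth in sliding windows along one chromosome.

    depths: per-base depth values, index 0 = position 1.
    Returns (start, end, median) with 1-based inclusive coordinates.
    Bases after the last full window are not reported; a chromosome shorter
    than one window is reported as a single window.
    """
    last_start = max(len(depths) - window, 0)
    out = []
    for start in range(0, last_start + 1, step):
        chunk = depths[start:start + window]
        if chunk:
            out.append((start + 1, start + len(chunk), median(chunk)))
    return out


# Toy example with a 4-bp window stepping by 2 bp.
print(windowed_median_depth([1, 2, 3, 4, 5, 6], window=4, step=2))
```

The median is used rather than the mean because it is less sensitive to collapsed repeats, which produce sharp local spikes in depth.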