Abstract
The application of advanced genomics to improve breeding techniques in grass crops will play a key role in securing affordable and nutritious food for an increasing human population. In particular, bread wheat (Triticum aestivum) is the most widely traded agricultural commodity. Bread wheat, however, has one of the largest and most complex genomes yet to be sequenced. As a result of several recent hybridization events, the bread wheat genome is allohexaploid (2n = 6x = 42, AABBDD), with each subgenome accounting for ~5-6 Gb of a total size estimated at ~15-16 Gb. Although genome size varies among grasses because of the expansion of retroelements, gene order is conserved along large chromosomal segments, which enables comparative methods between related species. Several research groups around the world are focusing their efforts on generating sequences for wheat and related species. In this presentation I will report on progress and achievements in two of these projects, which focus on obtaining high-quality sequences for the reference variety Chinese Spring. These sequences, generated using next-generation sequencing technologies, facilitate the anchoring of the physical contigs, provide useful sequencing data to the breeding community, and will allow most wheat genes to be placed on chromosomes.
The sequences generated are assembled using the latest software tools. One of the main challenges when working with the bread wheat genome is its repeat content and the size of the target (~500 Mb for the largest chromosome arms). Recent advances in sequence assembly algorithms have opened up new opportunities, but generating de novo assemblies of large eukaryotic genomes from short reads remains a challenge. Most current assembly tools struggle to cope with the massive datasets generated by next-generation sequencing technologies. Some recent assembly algorithms have been designed to represent these datasets more efficiently in main memory, but in general the resulting assemblies consist of large numbers of contigs. In the second part of the talk I will review the state of the art in assembly algorithms.
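The abstract does not name the assembly algorithms it reviews. As a purely illustrative sketch, assuming a de Bruijn graph approach (a common strategy for short-read assemblers, not necessarily the one discussed in the seminar), the code below reduces reads to overlapping k-mers held in memory and then walks non-branching paths to produce contigs. It shows why a repeat at least k bases long creates branching nodes that break the assembly into several contigs. The function names `build_de_bruijn` and `unitigs` are hypothetical, and the sketch ignores reverse complements, sequencing errors and isolated cycles.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Hypothetical helper: map each (k-1)-mer to the set of (k-1)-mers
    that follow it inside some k-mer of the reads (duplicates collapse)."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unitigs(graph):
    """Hypothetical helper: walk maximal non-branching paths, one contig each.
    Isolated cycles (every node with one edge in and one out) are skipped."""
    indeg = defaultdict(int)
    for dsts in graph.values():
        for d in dsts:
            indeg[d] += 1
    nodes = set(graph) | set(indeg)

    def is_internal(n):
        # Exactly one incoming and one outgoing edge: the contig can continue.
        return indeg[n] == 1 and len(graph.get(n, ())) == 1

    contigs = []
    for start in nodes:
        if is_internal(start):
            continue
        # Start a contig along every edge leaving a branching or terminal node.
        for succ in graph.get(start, ()):
            path, node = start + succ[-1], succ
            while is_internal(node):
                node = next(iter(graph[node]))
                path += node[-1]
            contigs.append(path)
    return contigs


if __name__ == "__main__":
    # No repeat: two overlapping reads merge into a single contig.
    print(unitigs(build_de_bruijn(["ATGGCGT", "GGCGTGC"], 4)))
    # → ['ATGGCGTGC']
    # The 4-mer GCAT occurs twice, so the same data splits into four contigs.
    print(sorted(unitigs(build_de_bruijn(["AAGCATCCG", "CCGCATTT"], 4))))
    # → ['AAGCA', 'CATCCGCA', 'CATTT', 'GCAT']
```

Scaled up to a ~500 Mb chromosome arm rich in retroelements, the same effect yields the large contig counts mentioned above; real assemblers add error correction, graph simplification and compact k-mer encodings on top of this basic idea.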
The Bioinformatics division at TGAC specialises in the analysis of high-throughput sequencing data including de novo assemblies, re-sequencing projects, expression analysis (RNA-seq) and metagenomics.
Refreshments will be served following the seminar