Home
Introduction
Welcome to the manual of the SMAP-package.
SMAP is a software package that analyzes read mapping distributions and performs haplotype calling to create multi-allelic molecular markers. SMAP haplotyping works on:
all types of samples, including (diploid and polyploid) individuals and Pool-Seq.
reads of various library types, including Genotyping-by-Sequencing (GBS), highly multiplex amplicon sequencing (HiPlex), and Shotgun sequencing (including Whole Genome Sequencing (WGS), targeted resequencing like Probe Capture, CRISPR/Cas-induced or natural variation libraries, and RNA-Seq).
all NGS sequencing technologies like Illumina short reads and PacBio or Oxford Nanopore long reads.
SMAP delineate analyses read mapping distributions for GBS read mapping QC, defines read mapping polymorphisms within loci and across samples, and selects high-quality loci across the sample set for downstream analyses. SMAP sliding-frames defines loci covering SNPs and/or structural variants to run SMAP haplotype-sites. SMAP compare identifies the overlap between two sets of loci (e.g. common loci across two runs of SMAP delineate). SMAP haplotype-sites performs read-backed haplotyping using a priori known polymorphic SNP sites, and creates 'ShortHaps'. As a special case, SMAP haplotype-sites also captures GBS read mapping polymorphisms (here called Stack Mapping Anchor Points or 'SMAPs') as a novel genetic diversity marker type, and integrates those with SNPs for ShortHap haplotyping.
SMAP snp-seq creates HiPlex primer designs based on known SNPs for targeted resequencing of polymorphic loci (currently under development). SMAP target-selection creates input files for SMAP design. SMAP design creates highly multiplex amplicon sequencing (HiPlex) primers and/or gRNA panels for genotyping CRISPR/Cas-induced or natural variation in a gene pool.
SMAP haplotype-window works independently of prior knowledge of polymorphisms: it groups reads by locus, defines a window enclosed between two custom border sequences, and retains the entire corresponding DNA sequence as haplotype. SMAP effect-prediction provides biological interpretation of the haplotype call tables created by SMAP haplotype-window. SMAP grm creates a similarity/distance matrix by converting a SMAP haplotype-sites genotype call table based on GBS or amplicon sequencing data.
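As an illustration of the window concept used by SMAP haplotype-window, the sketch below extracts the sequence enclosed between two border sequences from each read and tallies the resulting haplotypes as relative frequencies. It is a minimal Python sketch and not SMAP's implementation: the function names, the example reads and borders, the choice to exclude the borders from the haplotype, and the choice to ignore reads that lack either border are all assumptions made for this example.

```python
from collections import Counter


def extract_window(read, left_border, right_border):
    """Return the sequence between the two borders, or None if a border is missing."""
    start = read.find(left_border)
    if start == -1:
        return None
    start += len(left_border)  # the window starts right after the left border
    end = read.find(right_border, start)
    if end == -1:
        return None
    return read[start:end]


def haplotype_frequencies(reads, left_border, right_border):
    """Count each distinct window sequence and convert counts to relative frequencies."""
    windows = (extract_window(r, left_border, right_border) for r in reads)
    counts = Counter(w for w in windows if w is not None)
    total = sum(counts.values())
    return {hap: n / total for hap, n in counts.items()} if total else {}


# Hypothetical reads for one locus (illustrative data only).
reads = [
    "AACCGGTACGTTTGCA",  # window "TACG"
    "AACCGGTACGTTTGCA",  # window "TACG"
    "AACCGGTAGTTTGCA",   # 1-bp deletion, window "TAG"
    "AACCGGTACG",        # right border missing, read is ignored
]
freqs = haplotype_frequencies(reads, "CCGG", "TTTG")
for hap, freq in sorted(freqs.items()):
    print(hap, round(freq, 3))
```

Because the entire window sequence is kept as the haplotype, the read with the 1-bp deletion yields its own haplotype without any prior knowledge of that variant.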
Global overview
The scheme below displays a global overview of the functionalities of the SMAP package. White ovals are external operations and grey ovals are components of SMAP. Preprocessing of GBS reads should be performed by GBprocesS. Square boxes show the output of each of the components. Arrows show how the output of one component is the required input for the next component in the workflow for each of the NGS library types: GBS (red), HiPlex (purple), and Shotgun (yellow). File formats are shown in uppercase italics.
Detailed information of components
Check out detailed information on each of the ten components:
SMAP delineate analyses reference-aligned GBS reads by building a catalogue of loci within BAM files, whereby the start and end of 'Stacks' of reads define Stack Mapping Anchor Points (SMAPs). SMAP delineate then merges Stacks within a BAM file to create StackClusters. These StackClusters are then merged across multiple BAM files to build a catalogue of MergedClusters. Thus, SMAP delineate creates an overview of read mapping positions of GBS loci across sample sets and provides for quality control of read preprocessing and mapping procedures, before SNP calling and haplotyping. Please use instructions and software for GBS read preprocessing as described in the manual of GBprocesS.
SMAP sliding-frames defines loci as sliding frames that group adjacent SNPs within a given distance for read-backed haplotyping in HiPlex and Shotgun read data.
SMAP compare identifies the number of common loci across two runs of SMAP delineate and/or SMAP sliding-frames. It is a useful tool to determine the number of common loci targeted by different NGS methods, in different populations, sample sets, or bioinformatics filtering procedures, etc. This, in turn, helps to optimize NGS library preparation parameters and bioinformatics parameters throughout the entire workflow.
SMAP haplotype-sites generates haplotype calls (ShortHaps) using sets of polymorphic 'sites' for read-backed haplotyping on reference-aligned sequencing reads. Polymorphic 'sites' include Stack Mapping Anchor Points (SMAPs, defined in a BED file created with SMAP delineate or SMAP sliding-frames) and SNPs (as VCF obtained from third-party algorithms) for the same set of BAM files. It creates an integrated table (sample x genotype call matrix) with discrete haplotype calls (for diploid or polyploid individuals) or relative haplotype frequencies (for Pool-Seq) for any number of samples and loci.
SMAP snp-seq creates HiPlex primer designs based on known SNPs for targeted resequencing of polymorphic loci (currently under development).
SMAP target-selection prepares reference sequences for SMAP design using predefined lists of gene IDs.
SMAP design takes one or more reference sequences (FASTA and GFF) as input and designs non-overlapping amplicons per reference taking target specificity into account. It can be combined with gRNA sequences for mutation induction of the reference sequences. SMAP design creates a primer file, gRNA file, GFF file with all structural features, and optionally a summary file and plot, and input files required for downstream analysis using SMAP haplotype-window.
SMAP haplotype-window works independently of prior knowledge of polymorphisms, groups reads by locus, defines a window enclosed between two custom border sequences, and retains the entire corresponding DNA sequence as haplotype. Haplotype-window is, among many applications, especially useful for high-throughput CRISPR/Cas mutation screens.
SMAP effect-prediction provides biological interpretation of haplotypes. It takes as input a FASTA reference sequence, a GFF file with border positions that delineate the amplicons in the reference sequence, and the relative haplotype frequencies table created by SMAP haplotype-window.
SMAP grm converts a SMAP haplotype-sites or SMAP haplotype-window genotype call table into pairwise genetic relationship matrices (GRMs). Genetic similarity is expressed in commonly used similarity coefficients and calculated based on the number of shared and unique haplotypes in a pair of samples. The output matrices are provided as customised, high-quality figures or in standard output file formats for downstream data analyses.
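The pairwise comparison described for SMAP grm can be sketched with the Jaccard coefficient, one commonly used similarity coefficient that depends only on shared and unique items. This is a minimal Python sketch under stated assumptions, not the code or input format of SMAP grm: each sample is represented as a set of hypothetical (locus, haplotype) pairs, and the function names and data are invented for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity: shared haplotypes / (shared + unique haplotypes)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(samples):
    """Build a symmetric sample-by-sample similarity matrix as nested dicts."""
    names = sorted(samples)
    return {n: {m: jaccard(samples[n], samples[m]) for m in names} for n in names}


# Hypothetical haplotype calls: each sample is a set of (locus, haplotype) pairs.
samples = {
    "s1": {("L1", "00"), ("L1", "01"), ("L2", "0")},
    "s2": {("L1", "00"), ("L2", "0"), ("L2", "1")},
    "s3": {("L1", "11"), ("L2", "1")},
}

matrix = similarity_matrix(samples)
for name in sorted(matrix):
    print(name, [round(matrix[name][other], 2) for other in sorted(matrix)])
```

In this toy data set, s1 and s2 share two of four distinct haplotypes (similarity 0.5), while s1 and s3 share none. A distance matrix follows by taking 1 minus each similarity value.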
Recommended Reading
These published studies have used the SMAP package for various applications:
SMAP package
Schaumont D, Veeckman E, Van der Jeugt F, Haegeman A, van Glabeke S, Bawin Y, Lukasiewicz J, Blugeon S, Barre P, Leyva-Pérez MO, Byrne S, Dawyndt P, Ruttink T. (2022). SMAP: a versatile suite of tools for read-backed haplotyping. bioRxiv, DOI: 10.1101/2022.03.10.483555
Develtere W, Waegneer E, Debray K, Van Glabeke S, Maere S, Ruttink T, Jacobs TB. (2023). SMAP design: A multiplex PCR amplicon and gRNA design tool to screen for natural and CRISPR-induced genetic variation. Nucleic Acids Research, DOI: 10.1093/nar/gkad036
Natural genetic diversity
Depecker J, Verleysen L, Asimonyio JA, Hatangi Y, Kambale J-L, Mwanga Mwanga I, Ebele T, Dhed’a B, Bawin Y, Staelens A, Stoffelen P, Ruttink T, Vandelook F, Honnay O. (2023). Genetic diversity, genetic structure and pedigree relations in wild Robusta coffee (Coffea canephora) populations in the Yangambi area of the DR Congo and their relation with anthropogenic disturbance. Heredity, DOI: 10.1038/s41437-022-00588-0
Verleysen L, Bollen R, Kambale J-L, Ebele T, Katshela BN, Depecker J, Poncet V, Assumani D-M, Vandelook F, Stoffelen P, Honnay O, Ruttink T. (2023). Characterization of the genetic composition and establishment of a core collection for the INERA Robusta coffee (Coffea canephora) field genebank from the Democratic Republic of the Congo. Frontiers in Sustainable Food Systems, DOI: 10.3389/fsufs.2023.1239442
Molecular markers, quantitative genetics, breeding
de la O Leyva-Pérez M, Vexler L, Byrne S, Clot CR, Meade F, Griffin D, Ruttink T, Kang J and Milbourne D. (2022). PotatoMASH - a low cost, genome-scanning marker system for use in potato genomics and genetics applications. Agronomy, DOI: 10.3390/agronomy12102461
Ergon Å, Milvang ØW, Skøt L, Ruttink T. (2022). Identification of loci controlling timing of stem elongation in red clover using genotyping by sequencing of pooled phenotypic extremes. Molecular Genetics and Genomics, DOI: 10.1007/s00438-022-01942-x
Pégard M, Barre P, Delaunay S, Surault F, Karagić D, Milić D, Zorić M, Ruttink T, and Julier B. (2023). Genome-wide genotyping data renew knowledge on genetic diversity of a worldwide alfalfa collection and give insights on genetic control of phenology traits. Frontiers in Plant Science, DOI: 10.3389/fpls.2023.1196134
Zanotto S, Ruttink T, Pegard M, Skøt L, Grieder C, Kölliker R and Ergon Å. (2023). A genome-wide association study of freezing tolerance in red clover (Trifolium pratense L.). Frontiers in Plant Science, DOI: 10.3389/fpls.2023.1189662
Frey LA, Vleugels T, Ruttink T, Schubiger FX, Pégard M, Skøt L, Grieder C, Studer B, Roldán-Ruiz I, and Kölliker R. (2022). Elucidating the genetic control of Southern Anthracnose and Clover Rot resistance in red clover. Theoretical and Applied Genetics, DOI: 10.1007/s00122-022-04223-8
CRISPR-Cas gene editing
De Bruyn C, Ruttink T, Eeckhaut T, Jacobs T, De Keyser E, Goossens A, and Van Laere K. (2020). Establishment of CRISPR/Cas9 genome editing in Cichorium intybus var. foliosum or witloof. Frontiers in Genome Editing, DOI: 10.3389/fgeed.2020.604876
Van Huffel K, Stock M, Ruttink T and De Baets B. (2022). Covering the Combinatorial Design Space of Multiplex CRISPR/Cas Experiments in Plants. Frontiers in Plant Science, DOI: 10.3389/fpls.2022.907095
Impens L, Lorenzo CD, Vandeputte W, Wytynck P, Debray K, Haeghebaert J, Herwegh D, Jacobs TB, Ruttink T, Nelissen H, Inzé D, and Pauwels L. (2023). Combining multiplex gene editing and doubled haploid technology in maize. New Phytologist, DOI: 10.1111/nph.19021
De Bruyn C, Ruttink T, Lacchini E, Rombauts S, Haegeman A, De Keyser E, Van Poucke C, Jacobs TB, Desmet S, Eeckhaut T, Goossens A and Van Laere K. (2023). Identification and Characterization of CYP71 Subclade Cytochrome P450 Enzymes Involved in the Biosynthesis of Bitterness Compounds in Cichorium intybus. Frontiers in Plant Science, DOI: 10.3389/fpls.2023.1200253
Lorenzo CD, Debray K, Aesaert S, Coussens G, Demuynck K, Develtere W, Herwegh D, Impens L, Jacobs TB, Nelissen H, Pauwels L, Ruttink T, Schaumont D, Vandeputte W, Van Hautegem T, Inzé D. (2023). BREEDIT: A novel breeding strategy using multiplex genome editing in maize. The Plant Cell, DOI: 10.1093/plcell/koac243