Home

_images/SMAP_logo_v3.png

Introduction

Welcome to the manual of the SMAP package.

SMAP is a software package that can design targeted sequencing panels, analyze read mapping distributions, and perform haplotype calling to create multi-allelic molecular markers. SMAP haplotyping works on:

  • all types of samples, including (diploid and polyploid) individuals and Pool-Seq.

  • reads of various library types, including Genotyping-by-Sequencing (GBS), highly multiplex amplicon sequencing (HiPlex), and Shotgun sequencing (including Whole Genome Sequencing (WGS), targeted resequencing like Probe Capture, CRISPR/Cas-induced or natural variation libraries, and RNA-Seq).

  • all NGS technologies, such as Illumina short reads and PacBio or Oxford Nanopore long reads.

SMAP delineate analyses read mapping distributions for GBS read mapping QC, defines read mapping polymorphisms within loci and across samples, and selects high-quality loci across the sample set for downstream analyses. SMAP sliding-frames defines loci covering SNPs and/or structural variants to run SMAP haplotype-sites. SMAP compare identifies the overlap between two sets of loci (e.g. common loci across two runs of SMAP delineate).

SMAP haplotype-sites performs read-backed haplotyping using a priori known polymorphic SNP sites, and creates `ShortHaps´. As a special case, SMAP haplotype-sites also captures GBS read mapping polymorphisms (here called Stack Mapping Anchor Points or `SMAPs´) as a novel genetic diversity marker type, and integrates those with SNPs for ShortHap haplotyping.

SMAP snp-seq creates highly multiplex amplicon sequencing (HiPlex) primer designs based on known SNPs for targeted resequencing of polymorphic loci. SMAP target-selection creates input files for SMAP design. SMAP design creates HiPlex primers and/or gRNA panels for genotyping CRISPR/Cas-induced or natural variation in a genepool.

SMAP haplotype-window works independently of prior knowledge of polymorphisms: it groups mapped reads by locus, defines a window enclosed between two custom border sequences, trims off the border sequences, and thus retains the entire DNA sequence corresponding to the window as haplotype. SMAP haplotype-window is very useful to detect combinations of SNPs and short or long insertions or deletions. SMAP effect-prediction translates alternative haplotype sequences into their corresponding protein sequences for all haplotypes listed in the genotype call table created by SMAP haplotype-window, and thus identifies which alleles strongly affect protein-coding capacity.

SMAP relatedness creates a pairwise similarity/distance matrix or performs UMAP clustering by converting a SMAP haplotype-sites or SMAP haplotype-window genotype call table. SMAP relatedness is useful to reconstruct relationships between samples based on shared or unique haplotypes. SMAP chromplot creates a Circos plot with loci colored according to similarity with haplotypes in reference samples, based on a SMAP haplotype-sites or SMAP haplotype-window genotype call table.
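The window-based haplotyping idea described for SMAP haplotype-window (find two border sequences in a read, trim them off, and keep everything in between as the haplotype) can be illustrated with a minimal Python sketch. This is not the SMAP implementation: the function names `extract_window` and `count_haplotypes`, the exact string matching, and the example sequences are all assumptions made for illustration only.

```python
from collections import Counter


def extract_window(read, left_border, right_border):
    """Return the sequence between the two borders, or None if a border is missing.

    Hypothetical helper: uses exact string matching, which is a simplification.
    """
    start = read.find(left_border)
    if start == -1:
        return None
    start += len(left_border)  # trim off the left border itself
    end = read.find(right_border, start)
    if end == -1:
        return None
    return read[start:end]  # the right border is excluded as well


def count_haplotypes(reads, left_border, right_border):
    """Count how often each window sequence (haplotype) occurs among the reads."""
    windows = (extract_window(r, left_border, right_border) for r in reads)
    return Counter(w for w in windows if w is not None)


# Illustrative reads for one locus (made-up sequences).
reads = [
    "AAGGTTACGTCCAA",  # reference-like window: ACGT
    "AAGGTTACGTCCAA",
    "AAGGTTACTCCAA",   # 1-bp deletion inside the window: ACT
    "AAGGTTACGT",      # right border missing, read is skipped
]
counts = count_haplotypes(reads, "GGTT", "CCAA")
print(counts)
```

Because the whole window sequence is kept, a SNP, an insertion, or a deletion between the borders each simply shows up as a different haplotype string, with no prior list of polymorphic positions needed.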


Global overview

The scheme below displays a global overview of the functionalities of the SMAP package. White ovals are external operations and grey ovals are modules of SMAP. Preprocessing of GBS reads should be performed by GBprocesS. Square boxes show the output of each of the modules. Arrows show how the output of one module serves as required input for the next module in the workflow for each of the NGS library types (GBS (red), HiPlex (purple), Shotgun (yellow)). File formats are shown in uppercase italics. A complete workflow for multiplex CRISPR/Cas gRNA design and mutant screening via HiPlex amplicon sequencing is available (green).

_images/SMAP_global_scheme_home_20261.png

Detailed information of modules

Check out detailed information on each of the eleven modules:

  • SMAP delineate analyses reference-aligned GBS reads by building a catalogue of loci within BAM files, whereby the start and end of `Stacks´ of reads define Stack Mapping Anchor Points (SMAPs). SMAP delineate then merges Stacks within a BAM file to create StackClusters. These StackClusters are then merged across multiple BAM files to build a catalogue of MergedClusters. Thus, SMAP delineate creates an overview of read mapping positions of GBS loci across sample sets and provides for quality control of read preprocessing and mapping procedures, before SNP calling and haplotyping. Please use instructions and software for GBS read preprocessing as described in the manual of GBprocesS.

  • SMAP sliding-frames defines loci as sliding frames that group adjacent SNPs within a given distance for read-backed haplotyping in HiPlex and Shotgun read data.

  • SMAP compare identifies the number of common loci across two runs of SMAP delineate and/or SMAP sliding-frames. It is useful to determine the number of common loci targeted by different NGS methods, in different populations, sample sets, or bioinformatics filtering procedures, etc. This, in turn, helps to optimize NGS library preparation parameters and bioinformatics parameters throughout the entire workflow.

  • SMAP haplotype-sites generates haplotype calls (ShortHaps) using sets of polymorphic `sites´ for read-backed haplotyping on reference-aligned sequencing reads. Polymorphic `sites´ include Stack Mapping Anchor Points (SMAPs, defined in a BED file created with SMAP delineate, or SMAP sliding-frames) and SNPs (as VCF obtained from third-party algorithms) for the same set of BAM files. It creates an integrated table (sample x genotype call matrix) with discrete haplotype calls (for diploid or polyploid individuals) or relative haplotype frequencies (for Pool-Seq) for any number of samples and loci.

  • SMAP snp-seq creates HiPlex primer designs based on known SNPs for targeted resequencing of polymorphic loci.

  • SMAP target-selection prepares reference sequences for SMAP design using predefined lists of gene IDs.

  • SMAP design takes one or more reference sequences (FASTA and GFF) as input and designs non-overlapping amplicons per reference taking target specificity into account. It can be combined with gRNA sequences for mutation induction of the reference sequences. SMAP design creates a primer file, gRNA file, GFF file with all structural features, and optionally a summary file and plot, and input files required for downstream analysis using SMAP haplotype-window.

  • SMAP haplotype-window works independently of prior knowledge of polymorphisms, groups reads by locus, defines a window enclosed between two custom border sequences, and retains the entire corresponding DNA sequence as haplotype. SMAP haplotype-window is, among many applications, especially useful for high-throughput CRISPR/Cas mutation screens.

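  • SMAP effect-prediction provides biological interpretation of haplotypes. It takes a FASTA reference sequence, a GFF file with border positions that delineate the amplicons in the reference sequence, and the relative haplotype frequencies table created by SMAP haplotype-window, and translates each haplotype into its corresponding protein sequence to identify alleles that strongly affect protein-coding capacity.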

  • SMAP relatedness converts a SMAP haplotype-sites or SMAP haplotype-window genotype call table into a pairwise genetic relationship matrix or performs clustering based on Uniform Manifold Approximation and Projection (UMAP). Genetic similarity is expressed in commonly used similarity coefficients and calculated based on the number of shared and unique haplotypes in a pair of samples. The output matrices are rendered as customised, high-quality figures or written in standard output file formats for downstream data analyses.

  • SMAP chromplot creates a Circos plot based on a SMAP haplotype-sites or SMAP haplotype-window genotype call table.
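To make the pairwise similarity described for SMAP relatedness more concrete, the sketch below computes a Jaccard coefficient from the number of shared and unique haplotypes in a pair of samples. It is an illustration only and not SMAP code: the Jaccard coefficient is used as one example of a commonly used similarity coefficient, and the function names, the `(locus, haplotype)` tuple representation, and the sample data are assumptions.

```python
def jaccard_similarity(haps_a, haps_b):
    """Jaccard coefficient: shared haplotypes / (shared + unique to either sample)."""
    a, b = set(haps_a), set(haps_b)
    union = a | b
    if not union:
        return 0.0  # assumption: two samples without any calls get similarity 0
    return len(a & b) / len(union)


def pairwise_matrix(samples):
    """Build a {(sample_x, sample_y): similarity} mapping for all sample pairs."""
    names = sorted(samples)
    return {
        (x, y): jaccard_similarity(samples[x], samples[y])
        for x in names
        for y in names
    }


# Made-up presence calls: each haplotype is identified by (locus, haplotype string),
# so identical strings at different loci are never counted as shared.
samples = {
    "S1": {("locus1", "00"), ("locus1", "01"), ("locus2", "0")},
    "S2": {("locus1", "00"), ("locus2", "1")},
}
matrix = pairwise_matrix(samples)
print(matrix[("S1", "S2")])  # 1 shared haplotype out of 4 distinct ones
```

A distance can be derived as one minus the similarity. Keying haplotypes by locus is a design choice of this sketch, so that sharing is only counted within the same locus.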