Scope & Usage

Scope

Comparisons across data sets, shared and unique loci

SMAP compare analyzes the overlap (shared and unique loci) between two GBS data sets that have both been processed with SMAP delineate. SMAP compare can be used to compare:

  1. parameter settings during read preprocessing.

  2. parameter settings during read mapping (e.g. BWA-MEM).

  3. parameter settings during locus delineation (SMAP delineate).

  4. sets of progeny derived from independent breeding lines to estimate transferability of marker sets across a breeding program.

  5. a set of pools against their constituent individuals to estimate sensitivity of detection across the allele frequency spectrum (example shown below).

  6. GBS experiments performed in different labs, to investigate whether similar protocols lead to similar sets of loci, i.e. how comparable one's own data are to external data.
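The notion of shared and unique loci can be illustrated with a minimal Python sketch. It is not the implementation of SMAP compare: it assumes that two loci count as shared only when reference, start, end and strand are identical, and the names ``Locus`` and ``split_loci`` as well as the example coordinates are hypothetical.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    """Hypothetical minimal locus record (not an SMAP data structure)."""
    reference: str
    start: int
    end: int
    strand: str


def split_loci(set1, set2):
    """Return (shared, unique_to_set1, unique_to_set2) as sorted lists.

    Assumption: loci are 'shared' only if all four fields are identical.
    """
    a, b = set(set1), set(set2)
    return sorted(a & b), sorted(a - b), sorted(b - a)


# Made-up loci for two data sets, e.g. individuals versus pools.
individuals = [
    Locus("scaffold_10030", 15617, 15711, "+"),
    Locus("scaffold_10030", 15712, 15798, "-"),
]
pools = [
    Locus("scaffold_10030", 15617, 15711, "+"),
    Locus("scaffold_10031", 100, 186, "+"),
]

shared, only_individuals, only_pools = split_loci(individuals, pools)
print(len(shared), len(only_individuals), len(only_pools))  # prints: 1 1 1
```

Exact matching is the strictest possible criterion; a comparison based on partial overlap of locus coordinates would classify more loci as shared.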


Integration in the SMAP workflow

.. image:: ../_images/SMAP_global_scheme_home_compare.png

SMAP compare is run on BED files with locus positions, directly after SMAP delineate, SMAP sliding-frames or SMAP design, and before the BED files are used for SMAP haplotype-sites. SMAP compare works on GBS, HiPlex and Shotgun sequencing data.

Required input

.. image:: ../images/sites/coordinates_GBS_manual.png

For GBS data, the user needs to run SMAP delineate on the same set of BAM files as will be used for haplotyping to create a BED file listing the loci with SMAPs. The read mapping profiles determine the locus start and end points and internal SMAPs.

==============  =====  =====  ================================  ===============  ======  =======================  ============  ========  =============
Reference       Start  End    MergedCluster_name                Mean_read_depth  Strand  SMAPs                    Completeness  nr_SMAPs  Name
==============  =====  =====  ================================  ===============  ======  =======================  ============  ========  =============
scaffold_10030  15617  15711  ``scaffold_10030:15618-15711_+``  1899             ``+``   15618,15622,15703,15711  13            4         2n_ind_GBS_SE
scaffold_10030  15712  15798  ``scaffold_10030:15713-15798_-``  1930             ``-``   15713,15793,15798        9             3         2n_ind_GBS_SE
==============  =====  =====  ================================  ===============  ======  =======================  ============  ========  =============

BED file entries listing all relevant features of two neighboring loci. The first entry describes a locus on the + strand of the reference sequence: the start (15617) and end (15711) positions of the locus, the mean locus read depth (1899), the strand (+), the SMAP positions (15618, 15622, 15703, 15711, of which 15622 and 15703 are internal to the locus), the number of samples with data at that locus (completeness, 13), the number of SMAPs (4), and a custom label that denotes the data set (2n_ind_GBS_SE). The second entry lists the locus and SMAP positions on the - strand.
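The columns above can also be read programmatically. The following Python sketch assumes a tab-delimited line with the ten columns in the order shown; the delimiter is an assumption that this section does not state, and ``BedLocus`` and ``parse_bed_line`` are hypothetical names, not part of SMAP. It also checks that nr_SMAPs equals the number of positions in the SMAPs column.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BedLocus:
    """Hypothetical record for one line of the BED file described above."""
    reference: str
    start: int
    end: int
    name: str
    mean_read_depth: float
    strand: str
    smaps: Tuple[int, ...]
    completeness: int
    nr_smaps: int
    label: str


def parse_bed_line(line: str) -> BedLocus:
    """Parse one tab-delimited line (assumed layout: the ten columns above)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, found {len(fields)}")
    smaps = tuple(int(pos) for pos in fields[6].split(","))
    locus = BedLocus(
        reference=fields[0],
        start=int(fields[1]),
        end=int(fields[2]),
        name=fields[3],
        mean_read_depth=float(fields[4]),
        strand=fields[5],
        smaps=smaps,
        completeness=int(fields[7]),
        nr_smaps=int(fields[8]),
        label=fields[9],
    )
    # Consistency check: nr_SMAPs should match the SMAPs column.
    if locus.nr_smaps != len(locus.smaps):
        raise ValueError("nr_SMAPs does not match the SMAPs column")
    return locus


# First entry of the example table, written as a tab-delimited line.
EXAMPLE = "\t".join([
    "scaffold_10030", "15617", "15711", "scaffold_10030:15618-15711_+",
    "1899", "+", "15618,15622,15703,15711", "13", "4", "2n_ind_GBS_SE",
])

locus = parse_bed_line(EXAMPLE)
print(locus.smaps[1:-1])  # internal SMAPs: (15622, 15703)
```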

Commands & options


Example commands

::

    smap compare /path/to/BED1 /path/to/BED2

Output

SMAP compare provides a graphical output that describes the common loci across two sample sets.