How It Works

Several applications of molecular markers iterate between marker discovery (e.g. WGS or GBS) and targeted screening by HiPlex amplicon sequencing (e.g. SNP-seq).
The SMAP snp-seq module fills the gap between those strategies: it takes SNP variants identified in a large screen and automatically designs primers flanking selected SNPs or within selected regions, in both cases avoiding all known SNPs at primer binding sites.
Several input files can optionally be provided to define the SNPs and/or regions to be targeted, and the SNPs to avoid during primer design.
SMAP snp-seq also generates the coordinate files for downstream analysis of HiPlex read data: a BED file with SMAPs for downstream analysis with SMAP haplotype-sites, or a GFF file with border positions for SMAP haplotype-window.
In addition, several parameters can be set to define distances between SNPs and/or loci.
In principle, it is possible to a priori define regions to be targeted (such as 1 kb regions at 1 Mb intervals) to design a HiPlex set that covers the entire genome at a fixed marker distance (for an example in potato, see de la O Leyva-Pérez et al. (2022)).
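
As an illustration of such a priori region definition, the sketch below generates BED records for 1 kb regions at 1 Mb intervals. It is not part of SMAP: the function name, the chromosome name, and the chromosome length are hypothetical, and only the three standard BED columns (0-based, half-open coordinates) are written.

```python
def fixed_interval_regions(chrom_lengths, region_size=1_000, interval=1_000_000):
    """Yield (chrom, start, end) BED records at a fixed spacing.

    chrom_lengths: dict mapping chromosome name to its length in bp.
    Coordinates follow the BED convention (0-based start, exclusive end).
    """
    for chrom, length in chrom_lengths.items():
        for start in range(0, length, interval):
            # Clamp the last region so it never runs past the chromosome end.
            end = min(start + region_size, length)
            yield chrom, start, end


def to_bed(records):
    """Format records as tab-separated BED lines."""
    return "".join(f"{chrom}\t{start}\t{end}\n" for chrom, start, end in records)


# Hypothetical chromosome of 2.5 Mb, for illustration only.
chrom_lengths = {"chr01": 2_500_000}
print(to_bed(fixed_interval_regions(chrom_lengths)), end="")
```

The resulting file could then serve as the BED input that defines the regions to be targeted.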
In addition, SMAP snp-seq can also be used to transfer GBS marker sets to HiPlex marker sets, by providing the ‘CentralRegions’ BED file generated by SMAP delineate, and a VCF file generated with e.g. GATK, while specifying minimum and maximum amplicon size for the designed HiPlex fragments (see scheme below).
../_images/overview_GBS_to_HiPlex.png

Overview of input, workflow, options, and output

This overview shows the relationship between the different features and how input files and parameter settings are used to customize the workflow.

../_images/feature_overview.png

Defining regions according to different scenarios

Schematic overview of design steps

../_images/SMAP_snp-seq_overview_features.png

The different options applied by SMAP snp-seq depend on the type of data with which SNPs were obtained in previous steps. These are illustrated by a simplified drawing of whole genome shotgun (WGS) sequencing data and/or genotyping-by-sequencing (GBS) data (Figure 1), but SNPs from other sequencing library types (e.g. RNA-seq data, probe capture data) can be used as input as well.

../_images/SMAP_snp-seq_overview_feature_SNPs.png

Graphical representation of sequencing reads (grey bars) containing SNPs (yellow squares) from WGS libraries (upper) or GBS libraries (lower) that are mapped onto a reference genome sequence. These representations will be used to demonstrate the different options of SMAP snp-seq.

Extending regions by a (small) number of nucleotides can be advantageous for primer design, because the extended parts give primer3 more possibilities to design primers around targets. Extension is mainly useful for regions with a low SNP density, where unknown SNPs are unlikely to be located directly flanking the region.

Figure 7. Primer design using GBS data and a BED file with region extension (lower) compared to primer design using GBS data and a BED file without region extension (upper). The template sequences are extended at the 5’ and 3’ end of the genomic coordinates in the BED file (orange lines) to the new region ends (green lines).
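
The coordinate arithmetic behind region extension can be sketched as follows. This is a minimal illustration, not SMAP's own code: the function name and its parameters are hypothetical, and clamping to the chromosome boundaries is an assumption about sensible behaviour.

```python
def extend_region(start, end, extension, chrom_length):
    """Extend a BED region (0-based start, exclusive end) on both sides.

    The extended region is clamped so that it stays within the chromosome.
    """
    new_start = max(0, start - extension)
    new_end = min(chrom_length, end + extension)
    return new_start, new_end


# A region extended by 20 nt on each side; values are illustrative only.
print(extend_region(100, 200, 20, 1_000))
```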

Figure 4. Primer design using GBS data with a BED file containing genomic coordinates (lower) compared to primer design using GBS data without a BED file (upper). Template sequences are not allowed to exceed the genomic coordinates in the BED file (shown by the orange lines).

Figure 5. Primer design using WGS data and a BED file with the split_template option (lower) compared to primer design using WGS data and a BED file without the split_template option (upper). The split_template option splits the template sequence delineated in the BED file (orange lines) into multiple template sequences using the same reasoning as illustrated in Figure 2.

SMAP snp-seq can also take a specific VCF file as input to define target SNPs:

If a VCF file with a user-defined selection of target SNPs is provided with the mandatory --target_vcf option, only SNPs in this VCF will be considered as potential targets for primer design. Target regions are defined as described directly above, taking the --maximum_target_size and --minimum_target_distance options into account. Other SNP positions listed in the VCF file provided with the optional --background_vcf option will be incorporated as “N” in the template sequences, but are not set as targets.
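
The masking of background SNPs as “N” can be illustrated with a short sketch. It is not taken from SMAP: the function name, the example sequence, and the coordinates are hypothetical. It only assumes the usual conventions that VCF positions are 1-based and that the template start is given as a 0-based coordinate.

```python
def mask_background_snps(sequence, region_start, background_positions):
    """Return the template with background SNP positions replaced by 'N'.

    sequence: template sequence extracted from the reference.
    region_start: 0-based genomic coordinate of sequence[0].
    background_positions: 1-based SNP positions (as in the VCF POS column).
    """
    bases = list(sequence)
    for pos in background_positions:
        index = pos - 1 - region_start  # convert to an index in the template
        if 0 <= index < len(bases):     # ignore SNPs outside this template
            bases[index] = "N"
    return "".join(bases)


template = mask_background_snps(
    "ACGTACGTAC", region_start=100, background_positions=[103, 108, 250]
)
print(template)  # ACNTACGNAC
```

Because the masked positions appear as “N” in the template, primers cannot be placed over them during primer design, while the positions themselves are never set as targets.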

Figure 6. Primer design using GBS data and a BED file with offsets (lower) compared to primer design using GBS data and a BED file without offsets (upper). SNPs within offset regions (delineated by the green lines) are incorporated in the template sequences, but not included in target regions.
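
The effect of offsets can be sketched as a simple classification of the SNPs in a region. This sketch is illustrative only: the function name is hypothetical, and it assumes that offsets are measured inward from both borders of the BED region, which should be checked against the figure and the option description.

```python
def split_snps_by_offset(snp_positions, start, end, offset):
    """Split SNPs in a region into targets and template-only SNPs.

    snp_positions: 0-based SNP coordinates.
    start, end: BED region (0-based start, exclusive end).
    offset: assumed width of the offset zone at each end of the region.
    """
    targets, template_only = [], []
    for pos in snp_positions:
        if not start <= pos < end:
            continue  # SNP lies outside the region
        if start + offset <= pos < end - offset:
            targets.append(pos)        # eligible for target regions
        else:
            template_only.append(pos)  # kept in the template, never a target
    return targets, template_only


print(split_snps_by_offset([95, 105, 150, 195, 200], 100, 200, 10))
```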