.. raw:: html
.. role:: purple
.. raw:: html
.. role:: white
.. _SMAPsnpseqscopeusage:
#############
Scope & Usage
#############
Scope
-----
SMAP snp-seq designs HiPlex primers encompassing dedicated polymorphic SNP sites, while taking neighboring SNPs into consideration.
It is a simple application to design primer panels for targeted amplicon resequencing taking known polymorphisms into account, and can be directed to pre-selected locations like GBS loci, or candidate genes.
:purple:`Input`
**SMAP snp-seq** only requires a reference sequence FASTA file and one VCF file with the polymorphisms that need to be screened. Optionally, one may provide a BED file with selected regions, or a VCF file with SNPs that specifically need to be targeted. Last, one may create a customized reference for a particular sample set by providing a VCF file with SNPs that need to be adjusted in the reference sequence prior to primer design.
:purple:`Output`
| **SMAP snp-seq** provides custom filters and a list of primers to order.
| **SMAP snp-seq** creates a BED file with SMAPs to delineate HiPlex loci for downstream analyses (*e.g.* SMAP haplotype-sites).
| **SMAP snp-seq** creates a GFF file with borders to delineate HiPlex windows for downstream analyses (*e.g.* SMAP haplotype-window).
| **SMAP snp-seq** plots :ref:`feature distributions ` such as length, :ref:`of amplicons `.
----
Integration in the SMAP workflow
--------------------------------
.. image:: ../images/snp-seq/SMAP_global_scheme_home_snp-seq.png
**SMAP snp-seq** is run on a reference sequence FASTA file and one or two VCF files, after variant calling and before **SMAP haplotype-sites** or **SMAP haplotype-window**.
**SMAP snp-seq** designs primer panels for HiPlex amplicon sequencing.
----
Guidelines for variant calling
------------------------------
See `Veeckman et al. (2019) `_ for a comparison of different SNP calling methods.
----
.. _SMAPsnpseqSummaryCommand:
Commands & options
------------------
:purple:`Mandatory options for SMAP snp-seq`
**SMAP snp-seq** only needs a reference sequence and known SNP positions.
| ``--vcf`` :white:`######` The VCF file with SNPs [no default].
| ``--reference`` :white:`##` The FASTA file with the reference genome sequence or candidate gene sequences [no default].
.. _SMAPdelfilter:
:purple:`Command line options`
See tabs below for command line options and specific filter options.
.. tabs::
.. tab:: input data options
**Input data options:**
| ``-i``, ``--input_directory`` :white:`##` *(str)* :white:`##` Input directory [current directory].
| ``-r``, ``--regions`` :white:`########` *(str)* :white:`##` Name of the BED file in the input directory containing the genomic coordinates of regions wherein primers must be designed [no BED file provided].
| ``--target_vcf`` :white:`###############` Name of the VCF file in the input directory containing target SNPs [no VCF file with target SNPs provided].
| ``--reference_vcf`` :white:`#############` Name of the VCF file in the input directory containing non-polymorphic differences between the reference genome sequence and the samples for primer design [no VCF file with reference genome differences provided].
.. tab:: amplicon design options
**Amplicon design options:**
| ``-d``, ``--variant_distance`` :white:`############` *(int)* :white:`###` Maximum distance (in bp) between two variants to be included in the same template [500].
| ``-t``, ``--target_size`` :white:`###############` *(int)* :white:`###` Maximum size (in bp) of a target region [10].
| ``-u``, ``--target_distance`` :white:`############` *(int)* :white:`###` Minimum distance (in bp) between two target regions in a template [0].
| ``-min``, ``--minimum_amplicon_size`` :white:`#######` *(int)* :white:`###` Minimum size of an amplicon (incl. primers) in bp [100].
| ``-max``, ``--maximum_amplicon_size`` :white:`#######` *(int)* :white:`###` Maximum size of an amplicon (incl. primers) in bp [110].
| ``--offset`` :white:`#######################` *(int)* :white:`###` Size of the offset at the 5' and 3' end of each target region. Variants in the offsets are not tagged as targets for primer design [0, all variants are potential targets].
| ``-minp``, ``--minimum_primer_size`` :white:`########` *(int)* :white:`###` Minimum size (in bp) of a primer [18].
| ``-maxp``, ``--maximum_primer_size`` :white:`########` *(int)* :white:`###` Maximum size (in bp) of a primer [27].
| ``-optp``, ``--optimal_primer_size`` :white:`########` *(int)* :white:`###` Optimal size (in bp) of a primer [20].
| ``-max_misp``, ``--maximum_mispriming`` :white:`######` *(int)* :white:`###` Maximum allowed weighted similarity of a primer with the same template and other templates [12].
| ``-maxn``, ``--maximum_unknown_nucleotides`` :white:`##` *(int)* :white:`###` Maximum number of unknown nucleotides (N) in a primer sequence [0].
| ``-ex``, ``--region_extension`` :white:`###########` *(int)* :white:`###` Extend regions in the BED file provided via the ``--regions`` option at their 5' end 3' end with the provided value [0, no region extension].
| ``--retain_overlap`` :white:`#######################` Retain overlap in template sequences among regions [overlap in template sequences is removed].
| ``--split_template`` :white:`#######################` Split the regions in the BED file provided via the ``--regions`` option in multiple templates based on the variant_distance [regions are not split].
Options may be given in any order.
Command to run **SMAP snp-seq**::
python3 SMAP_snp-seq.py -i /path/to/dir/ --vcf variants.vcf --reference genome.fasta
.. tab:: output data options
| ``-o``, ``--output_directory`` :white:`###` *(str)* :white:`###` Path to the output directory [current directory].
| ``-b``, ``--border_length`` :white:`#####` *(int)* :white:`###` Border size used in the GFF file that defines the windows for SMAP haplotype-window [10].
| ``-s``, ``--suffix`` :white:`##########` *(str)* :white:`###` Suffix added to output files [set_1].
Options may be given in any order.
Command to run **SMAP snp-seq** with adjusted border length and suffix to denote the design settings::
python3 SMAP_snp-seq.py -i /path/to/dir/ --vcf variants.vcf --reference genome.fasta -b 10 -s Lp_120_180bp
----
.. _SMAPsnpseqexcommands:
Example commands
----------------
.. tabs::
.. tab:: simple design
Basic command to run SMAP snp-seq::
python3 SMAP_snp-seq.py -i /path/to/dir/ --vcf variants.vcf --reference genome.fasta
.. tab:: design with target regions
Command to run SMAP snp-seq for a subset of regions::
python3 SMAP_snp-seq.py -i /path/to/dir/ --vcf variants.vcf --reference genome.fasta --target.vcf targets.vcf
.. tab:: design with background SNP file
Command to run SMAP snp-seq with secondary file with background variation::
python3 SMAP_snp-seq.py -i /path/to/dir/ --vcf variants.vcf --reference genome.fasta --reference_vcf reference_variants.vcf
.. _SMAPsnpseqoutput:
Output
------
.. tabs::
.. tab:: Graphical output
| By default, **SMAP snp-seq** does not provide graphical output.
.. tab:: Tabular output
| **SMAP snp-seq** creates a FASTA file with primer sequences, a GFF file with primer positions on the reference sequence, a BED file with SMAPs for **SMAP haplotype-sites**, and a GFF file with borders for **SMAP haplotype-window**.