.. raw:: html

.. role:: purple

.. raw:: html

.. role:: white

.. _SMAPsnpseqscopeusage:

#############
Scope & Usage
#############

Scope
-----

**SMAP snp-seq** designs HiPlex primers encompassing dedicated polymorphic SNP sites, while avoiding neighboring 'background' SNPs.
It is a simple application to design primer panels for targeted multiplex amplicon resequencing that takes known polymorphisms into account, and it can be directed to pre-selected locations such as GBS loci or candidate genes.

:purple:`Input`

**SMAP snp-seq** only requires a reference sequence FASTA file and one VCF file with the SNPs that need to be targeted.
Optionally, one may provide a BED file with selected template regions, or a VCF file with 'background' SNPs that need to be avoided during primer design.
Finally, one may create a customized reference for a particular sample set by providing a VCF file with SNPs; at these positions, the reference nucleotide is substituted by the alternative nucleotide in the reference sequence prior to primer design.

:purple:`Output`

| **SMAP snp-seq** provides a list of primers to order.
| **SMAP snp-seq** creates a BED file with SMAPs to delineate HiPlex loci for downstream analyses with **SMAP haplotype-sites**.
| **SMAP snp-seq** creates a GFF file with borders to delineate HiPlex windows for downstream analyses with **SMAP haplotype-window**.
| **SMAP snp-seq** plots feature distributions such as length of amplicons.

----

Integration in the SMAP workflow
--------------------------------

.. image:: ../images/snp-seq/SMAP_global_scheme_home_snp-seq.png

**SMAP snp-seq** is run on a reference sequence FASTA file and one or two VCF files, after variant calling and before **SMAP haplotype-sites** or **SMAP haplotype-window**.
**SMAP snp-seq** designs primer panels for HiPlex amplicon sequencing.

----

Guidelines for variant calling
------------------------------

See Veeckman et al. (2019) for a comparison of different SNP calling methods.

----

.. _SMAPsnpseqSummaryCommand:

Commands & options
------------------

:purple:`Mandatory options for SMAP snp-seq`

**SMAP snp-seq** only needs a reference sequence and known SNP positions to target.

| ``--reference`` :white:`###` The FASTA file with the reference sequence [no default].
| ``--target_vcf`` :white:`##` The VCF file with SNPs [no default].

.. _SMAPdelfilter:

:purple:`Command line options`

See tabs below for command line options and specific filter options.

.. tabs::

   .. tab:: input data options

      **Input data options:**

      | ``-i``, ``--input_directory`` :white:`#####` *(str)* :white:`##` Input directory [current directory].
      | ``--template_region`` :white:`########` *(str)* :white:`##` Name of the BED file in the input directory containing the genomic coordinates of regions wherein primers must be designed [no BED file provided].
      | ``--background_vcf`` :white:`##############` Name of the VCF file in the input directory containing background SNPs to be avoided during primer design [no VCF file with background SNPs provided].
      | ``--customized_reference`` :white:`##########` Name of the VCF file in the input directory containing non-polymorphic differences between the reference genome sequence and the samples for primer design [no VCF file with reference genome differences provided].

   .. tab:: amplicon design options

      **Amplicon design options:**

      | ``--maximum_variant_distance`` :white:`#######` *(int)* :white:`###` Maximum distance (in bp) between two variants to be included in the same template region [500].
      | ``--flanking_region`` :white:`##########` *(int)* :white:`###` Length of the flanking region (in bp) to be added on both ends of the central template region [half of the maximum variant distance].
      | ``--maximum_target_size`` :white:`###############` *(int)* :white:`###` Maximum size (in bp) of a target region [10].
      | ``--minimum_target_distance`` :white:`############` *(int)* :white:`###` Minimum distance (in bp) between two target regions in a template [0].
      | ``--minimum_amplicon_size`` :white:`#######` *(int)* :white:`###` Minimum size of an amplicon (incl. primers) in bp [100].
      | ``--maximum_amplicon_size`` :white:`#######` *(int)* :white:`###` Maximum size of an amplicon (incl. primers) in bp [110].
      | ``--offset`` :white:`#######################` *(int)* :white:`###` Size of the offset at the 5' and 3' end of each target region. Variants in the region covered by the offset are not tagged as targets for primer design [0, all variants are potential targets].
      | ``--minimum_primer_size`` :white:`########` *(int)* :white:`###` Minimum size (in bp) of a primer [18].
      | ``--maximum_primer_size`` :white:`########` *(int)* :white:`###` Maximum size (in bp) of a primer [27].
      | ``--optimal_primer_size`` :white:`########` *(int)* :white:`###` Optimal size (in bp) of a primer [20].
      | ``--maximum_mispriming`` :white:`######` *(int)* :white:`###` Maximum allowed weighted similarity of a primer with the same template and other templates [12].
      | ``--maximum_number_degenerate_nucleotides`` :white:`##` *(int)* :white:`###` Maximum number of degenerate nucleotides (N) in a primer sequence [0].
      | ``--region_extension`` :white:`###########` *(int)* :white:`###` Extend template regions in the BED file provided via the ``--template_region`` option at their 5' and 3' ends by the provided value [0, no template region extension].
      | ``--retain_overlap`` :white:`#############` Retain overlap among template regions [overlap in template regions is removed].
      | ``--split_template_region`` :white:`#########` Split the regions in the BED file provided via the ``--template_region`` option into multiple templates based on ``--maximum_variant_distance`` [template regions are not split].

      Options may be given in any order.

      Command to run **SMAP snp-seq**::

         python3 SMAP_snp-seq.py -i /path/to/dir/ --target_vcf variants.vcf --reference genome.fasta

   .. tab:: output data options

      | ``-o``, ``--output_directory`` :white:`###` *(str)* :white:`###` Path to the output directory [current directory].
      | ``--border_length`` :white:`#####` *(int)* :white:`###` Border size used in the GFF file that defines the windows for SMAP haplotype-window [10].
      | ``--suffix`` :white:`##########` *(str)* :white:`###` Suffix added to output files [set_1].

      Options may be given in any order.

      Command to run **SMAP snp-seq** with adjusted border length and suffix to denote the design settings::

         python3 SMAP_snp-seq.py -i /path/to/dir/ --target_vcf variants.vcf --reference genome.fasta --border_length 10 --suffix Lp_120_180bp

----

.. _SMAPsnpseqexcommands:

Example commands
----------------

.. tabs::

   .. tab:: simple design

      Basic command to run SMAP snp-seq with target SNPs::

         python3 SMAP_snp-seq.py -i /path/to/dir/ --target_vcf variants.vcf --reference genome.fasta

   .. tab:: design with background SNP file

      Command to run SMAP snp-seq for a set of target SNPs while avoiding background SNPs for primer design::

         python3 SMAP_snp-seq.py -i /path/to/dir/ --target_vcf variants.vcf --reference genome.fasta --background_vcf background_snps.vcf

   .. tab:: design with customized reference sequence

      Command to run SMAP snp-seq with a set of SNPs to substitute in a customized reference sequence::

         python3 SMAP_snp-seq.py -i /path/to/dir/ --target_vcf variants.vcf --reference genome.fasta --customized_reference reference_variants.vcf

   .. tab:: design with predefined template regions

      Command to run SMAP snp-seq for a specific set of loci (template regions)::

         python3 SMAP_snp-seq.py -i /path/to/dir/ --target_vcf variants.vcf --reference genome.fasta --template_region gbs_centralregion.bed

.. _SMAPsnpseqoutput:

Output
------

.. tabs::

   .. tab:: Graphical output

      | By default, **SMAP snp-seq** does not provide graphical output.

   .. tab:: Tabular output

      | **SMAP snp-seq** creates a FASTA file with primer sequences, a FASTA file with amplicon sequences, a GFF file with primer positions on the reference sequence, a BED file with SMAPs for **SMAP haplotype-sites**, and a GFF file with borders for **SMAP haplotype-window**.
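
As a quick sanity check on the primer FASTA file, one could tabulate primer lengths and flag primers outside the size range used for the design. The sketch below is not part of **SMAP snp-seq**; the function names ``read_fasta`` and ``primers_outside_range`` and the example records are hypothetical. It assumes a plain FASTA layout and uses the default size limits listed above (18 and 27 bp).

```python
from collections import Counter
from io import StringIO


def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def primers_outside_range(records, minimum=18, maximum=27):
    """Return names of primers whose length is outside [minimum, maximum]."""
    return [name for name, seq in records if not minimum <= len(seq) <= maximum]


# Hypothetical records; real primer names and sequences will differ.
example = StringIO(
    ">locus1_F\nACGTACGTACGTACGTACGT\n"
    ">locus1_R\nTTGCA\n"
)
records = list(read_fasta(example))
length_counts = Counter(len(seq) for _, seq in records)
flagged = primers_outside_range(records)
print(length_counts)
print(flagged)
```

To apply this to a real design, replace the ``StringIO`` example with an opened primer FASTA file from the output directory; the exact file name depends on the run and the chosen ``--suffix``.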