Scope & Usage
Scope
SMAP snp-seq designs HiPlex primers that encompass dedicated polymorphic SNP sites while taking neighboring SNPs into account. It is a simple application for designing primer panels for targeted amplicon resequencing with known polymorphisms taken into account, and it can be directed to pre-selected locations such as GBS loci or candidate genes.
Input
SMAP snp-seq only requires a reference sequence FASTA file and one VCF file with the polymorphisms that need to be screened. Optionally, one may provide a BED file with selected regions, or a VCF file with SNPs that specifically need to be targeted. Finally, one may create a customized reference for a particular sample set by providing a VCF file with SNPs that need to be adjusted in the reference sequence prior to primer design.
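The customized reference can be pictured as substituting alleles into the reference sequence before primers are designed. The sketch below only illustrates that idea and is not the code SMAP snp-seq runs: the function name `apply_snps` and the tuple layout are hypothetical, only single-nucleotide substitutions are handled, and positions are assumed to be 1-based as in the POS column of a VCF file.

```python
def apply_snps(sequence, snps):
    """Return a copy of `sequence` with each SNP's ALT allele substituted.

    `snps` is an iterable of (position, ref, alt) tuples with 1-based
    positions, mirroring the POS, REF and ALT columns of a VCF record.
    Hypothetical helper for illustration; not part of SMAP snp-seq.
    """
    bases = list(sequence)
    for position, ref, alt in snps:
        if not 1 <= position <= len(bases):
            raise ValueError(f"position {position} lies outside the sequence")
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError(f"only single-nucleotide substitutions are handled: {ref}>{alt}")
        index = position - 1  # convert the 1-based VCF position to a 0-based index
        if bases[index].upper() != ref.upper():
            raise ValueError(f"REF allele {ref} does not match base {bases[index]} at position {position}")
        bases[index] = alt
    return "".join(bases)


# Toy sequence and SNPs, made up for this example.
reference = "ACGTACGTAC"
adjusted = apply_snps(reference, [(2, "C", "T"), (9, "A", "G")])
print(adjusted)
```

Checking the REF allele against the sequence before substituting catches a VCF file that was called against a different reference.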
Output
Integration in the SMAP workflow
SMAP snp-seq is run on a reference sequence FASTA file and one or two VCF files, after variant calling and before SMAP haplotype-sites or SMAP haplotype-window. SMAP snp-seq designs primer panels for HiPlex amplicon sequencing.
Guidelines for variant calling
See Veeckman et al. (2019) for a comparison of different SNP calling methods.
Commands & options
Mandatory options for SMAP snp-seq
SMAP snp-seq only needs a reference sequence and known SNP positions.
--vcf  The VCF file with SNPs [no default].
--reference  The FASTA file with the reference genome sequence or candidate gene sequences [no default].
Command line options
See the sections below for command line options and specific filter options.
Input data options:
-i, --input_directory  (str)  Input directory [current directory].
-r, --regions  (str)  Name of the BED file in the input directory containing the genomic coordinates of regions wherein primers must be designed [no BED file provided].
--target_vcf  Name of the VCF file in the input directory containing target SNPs [no VCF file with target SNPs provided].
--reference_vcf  Name of the VCF file in the input directory containing non-polymorphic differences between the reference genome sequence and the samples for primer design [no VCF file with reference genome differences provided].
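When a BED file of regions is combined with VCF files, the two formats count coordinates differently: standard BED intervals are 0-based and half-open, while VCF positions are 1-based. The sketch below shows how a SNP position could be tested against a region under that convention. The helper names `parse_bed_line` and `in_region`, the sequence name `chr1`, and the region name `locus_A` are all made up for illustration, and it is an assumption that the BED file follows the standard convention.

```python
def parse_bed_line(line):
    """Parse the first three columns of a BED line into (chrom, start, end)."""
    chrom, start, end = line.rstrip("\n").split("\t")[:3]
    return chrom, int(start), int(end)


def in_region(chrom, position, region):
    """True if a 1-based VCF position lies inside a 0-based, half-open BED region."""
    region_chrom, start, end = region
    # BED start is 0-based inclusive, so the first covered 1-based position is start + 1.
    # BED end is 0-based exclusive, so the last covered 1-based position is end.
    return chrom == region_chrom and start < position <= end


# Hypothetical region covering 1-based positions 101 to 200 on chr1.
region = parse_bed_line("chr1\t100\t200\tlocus_A\n")
print(in_region("chr1", 100, region))  # base just before the region
print(in_region("chr1", 101, region))  # first base of the region
print(in_region("chr1", 200, region))  # last base of the region
```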
Amplicon design options:
-d, --variant_distance  (int)  Maximum distance (in bp) between two variants to be included in the same template [500].
-t, --target_size  (int)  Maximum size (in bp) of a target region [10].
-u, --target_distance  (int)  Minimum distance (in bp) between two target regions in a template [0].
-min, --minimum_amplicon_size  (int)  Minimum size of an amplicon (incl. primers) in bp [100].
-max, --maximum_amplicon_size  (int)  Maximum size of an amplicon (incl. primers) in bp [110].
--offset  (int)  Size of the offset at the 5’ and 3’ end of each target region. Variants in the offsets are not tagged as targets for primer design [0, all variants are potential targets].
-minp, --minimum_primer_size  (int)  Minimum size (in bp) of a primer [18].
-maxp, --maximum_primer_size  (int)  Maximum size (in bp) of a primer [27].
-optp, --optimal_primer_size  (int)  Optimal size (in bp) of a primer [20].
-max_misp, --maximum_mispriming  (int)  Maximum allowed weighted similarity of a primer with the same template and other templates [12].
-maxn, --maximum_unknown_nucleotides  (int)  Maximum number of unknown nucleotides (N) in a primer sequence [0].
-ex, --region_extension  (int)  Extend regions in the BED file provided via the --regions option at their 5’ and 3’ ends by the provided value [0, no region extension].
--retain_overlap  Retain overlap in template sequences among regions [overlap in template sequences is removed].
--split_template  Split the regions in the BED file provided via the --regions option into multiple templates based on the variant_distance [regions are not split].
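To give a feel for what --variant_distance controls, the sketch below groups variant positions on a single sequence so that neighboring variants no more than the given distance apart end up in the same group. It is an illustration under stated assumptions, not the SMAP snp-seq implementation: the helper `group_variants` is hypothetical, the distance is assumed to be measured between consecutive variants, and the positions are made up.

```python
def group_variants(positions, variant_distance=500):
    """Group variant positions on one sequence into candidate templates.

    Consecutive variants that are at most `variant_distance` bp apart share
    a group. Hypothetical helper; SMAP snp-seq may measure the distance
    differently.
    """
    groups = []
    for position in sorted(positions):
        if groups and position - groups[-1][-1] <= variant_distance:
            groups[-1].append(position)  # close enough to the previous variant
        else:
            groups.append([position])    # start a new group
    return groups


# Made-up variant positions on a single sequence.
positions = [120, 480, 900, 1650, 1700, 4000]
print(group_variants(positions))
print(group_variants(positions, variant_distance=400))
```

Lowering the distance splits the first group because the gap between positions 480 and 900 (420 bp) then exceeds the threshold.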
Options may be given in any order.
Command to run SMAP snp-seq:
python3 SMAP_snp-seq.py -i /path/to/dir/ --vcf variants.vcf --reference genome.fasta
Output options:
-o, --output_directory  (str)  Path to the output directory [current directory].
-b, --border_length  (int)  Border size used in the GFF file that defines the windows for SMAP haplotype-window [10].
-s, --suffix  (str)  Suffix added to output files [set_1].
Options may be given in any order.
Command to run SMAP snp-seq with adjusted border length and suffix to denote the design settings:
python3 SMAP_snp-seq.py -i /path/to/dir/ --vcf variants.vcf --reference genome.fasta -b 10 -s Lp_120_180bp
Example commands
Basic command to run SMAP snp-seq:
python3 SMAP_snp-seq.py -i /path/to/dir/ --vcf variants.vcf --reference genome.fasta
Command to run SMAP snp-seq with a VCF file of target SNPs:
python3 SMAP_snp-seq.py -i /path/to/dir/ --vcf variants.vcf --reference genome.fasta --target_vcf targets.vcf
Command to run SMAP snp-seq with a secondary file with background variation:
python3 SMAP_snp-seq.py -i /path/to/dir/ --vcf variants.vcf --reference genome.fasta --reference_vcf reference_variants.vcf
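When several designs are run from a script, the commands above can be assembled programmatically. The sketch below builds the argument list from Python using only options documented on this page. The helper `build_command` is hypothetical, and it assumes that `SMAP_snp-seq.py` sits in the working directory, as in the example commands.

```python
import shlex


def build_command(input_directory, vcf, reference, target_vcf=None,
                  reference_vcf=None, regions=None, suffix=None):
    """Assemble an argument list for SMAP_snp-seq.py from documented options.

    Hypothetical helper for illustration; optional files are added only
    when a value is given.
    """
    command = ["python3", "SMAP_snp-seq.py", "-i", input_directory,
               "--vcf", vcf, "--reference", reference]
    optional = [("--target_vcf", target_vcf),
                ("--reference_vcf", reference_vcf),
                ("--regions", regions),
                ("--suffix", suffix)]
    for flag, value in optional:
        if value is not None:
            command += [flag, value]
    return command


command = build_command("/path/to/dir/", "variants.vcf", "genome.fasta",
                        reference_vcf="reference_variants.vcf")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` to launch the design. Keeping the arguments as a list avoids shell quoting problems with paths that contain spaces.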