Install the SMAP package following the instructions below. To check that the installation and its dependencies work as expected on your system, you can download a set of test (input) data from Zenodo and run a few basic commands that exercise the various components of the SMAP package. More information, with detailed examples of typical SMAP workflows, is provided in the Tutorial section.

Installation

The latest release of the SMAP package can be found in the GitLab repository. Running SMAP on GBS data requires special preprocessing of the GBS reads before read mapping. Please follow the instructions and use the software for GBS read preprocessing described in the GBprocesS manual.

The fastest way to install the SMAP package is with pip:

pip install ngs-smap
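To confirm that the installation succeeded, you can check that the smap command is available on your PATH. The following is a minimal sketch, not part of the SMAP package: the helper name check_cmd is hypothetical and only wraps the standard shell built-in `command -v`.

```shell
#!/bin/sh
# check_cmd NAME: report whether NAME resolves to a command on PATH.
# (Hypothetical helper for illustration; not part of the SMAP package.)
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# After 'pip install ngs-smap', the 'smap' command should be found.
check_cmd smap || echo "smap is not on PATH; is the right environment active?"
```

If the command is reported as missing, make sure the Python environment in which you ran pip is the one that is currently active.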

Use the following commands to download the utility Python scripts from the repository:

curl https://gitlab.ilvo.be/genomics/smap-package/smap/-/raw/master/utilities/SMAP_sliding-frames.py -o SMAP_sliding-frames.py
curl https://gitlab.ilvo.be/genomics/smap-package/smap/-/raw/master/utilities/SMAP_snp-seq.py -o SMAP_snp-seq.py
curl https://gitlab.ilvo.be/genomics/smap-package/smap/-/raw/master/utilities/SMAP_target-selection.py -o SMAP_target-selection.py
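The three downloads above differ only in the script name, so they can also be written as a loop. This is a sketch under the assumption that the URLs above stay valid; the function names utility_url and download_utilities are hypothetical, and the curl options `--fail`, `--silent` and `--show-error` are added here so that an HTTP error is reported instead of being saved to the output file.

```shell
#!/bin/sh
# Base URL copied from the curl commands above.
base_url="https://gitlab.ilvo.be/genomics/smap-package/smap/-/raw/master/utilities"

# utility_url NAME: print the raw-file URL for one utility script.
utility_url() {
    printf '%s/%s\n' "$base_url" "$1"
}

# download_utilities: fetch all three scripts into the current directory.
# Stops at the first failed download and returns a non-zero status.
download_utilities() {
    for script in SMAP_sliding-frames.py SMAP_snp-seq.py SMAP_target-selection.py; do
        curl --fail --silent --show-error "$(utility_url "$script")" -o "$script" || return 1
    done
}

# Print one URL without downloading anything.
utility_url SMAP_snp-seq.py
```

Call download_utilities from the directory in which the scripts should be saved.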

Install from GitLab in a virtual environment:

git clone https://gitlab.ilvo.be/genomics/smap-package/smap.git
cd smap
git checkout master
python -m venv .smap_venv
source .smap_venv/bin/activate
pip install --upgrade pip
pip install .
# already includes the utilities scripts

Install the development version:

git clone https://gitlab.ilvo.be/genomics/smap-package/smap.git
cd smap
git checkout dev
python -m venv .smap_dev_venv
source .smap_dev_venv/bin/activate
pip install --upgrade pip
pip install .
# already includes the utilities scripts

SMAP is only available for Linux operating systems.

A basic guide to running software on the Linux command line can be found on Ubuntu’s site.

Quick start

Analysis of simulated GBS data

Activate your virtual environment:

source .venv/bin/activate

Define the path to the download directory:

download_dir=/PATH/TO/DOWNLOAD_DIR
or (from the download directory itself):
download_dir=$(pwd)
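Before running the commands below, it can help to check that download_dir points at the unpacked test data. The sketch below assumes that the Zenodo archive unpacks into the two top-level folders that appear in the paths used in this section (Simulated_data and Real_data); the function name check_download_dir is hypothetical.

```shell
#!/bin/sh
# check_download_dir DIR: verify that DIR contains the top-level folders
# referenced by the commands in this section.
# (Assumed layout: DIR/Simulated_data and DIR/Real_data.)
check_download_dir() {
    for sub in Simulated_data Real_data; do
        if [ ! -d "$1/$sub" ]; then
            echo "missing directory: $1/$sub" >&2
            return 1
        fi
    done
    echo "ok: $1"
}

# Example: check_download_dir "$download_dir"
```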

SMAP delineate

Create and navigate to a new output directory to run SMAP delineate on the BAM files with mapped GBS reads:

mkdir -p $download_dir/Simulated_data/SMAP_delineate/output/
cd $download_dir/Simulated_data/SMAP_delineate/output/
smap delineate $download_dir/Simulated_data/SMAP_delineate/input/ -mapping_orientation ignore --processes 8 --plot all --plot_type pdf --min_stack_depth 2 --max_stack_depth 1500 --min_cluster_length 50 --max_cluster_length 300 --max_stack_number 20 --min_stack_depth_fraction 10 --min_cluster_depth 10 --max_cluster_depth 1500 --max_smap_number 20 --name GBS
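The smap delineate commands in this section share most of their options and differ only in the input directory, the name, the minimum stack depth fraction and the minimum cluster depth. A possible way to keep them readable is a small wrapper, sketched below. The function run_delineate and the variable SMAP_DRY_RUN are hypothetical; every option name and fixed value is copied from the command above.

```shell
#!/bin/sh
# run_delineate INPUT_DIR NAME DEPTH_FRACTION CLUSTER_DEPTH
# Hypothetical wrapper: only the four arguments vary between runs,
# all other options are taken verbatim from the command above.
# With SMAP_DRY_RUN=1 the command is printed instead of executed.
run_delineate() {
    set -- smap delineate "$1" -mapping_orientation ignore --processes 8 \
        --plot all --plot_type pdf \
        --min_stack_depth 2 --max_stack_depth 1500 \
        --min_cluster_length 50 --max_cluster_length 300 \
        --max_stack_number 20 --min_stack_depth_fraction "$3" \
        --min_cluster_depth "$4" --max_cluster_depth 1500 \
        --max_smap_number 20 --name "$2"
    if [ "${SMAP_DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Dry run with the values used for the simulated data set.
SMAP_DRY_RUN=1
run_delineate Simulated_data/SMAP_delineate/input/ GBS 10 10
```

Set SMAP_DRY_RUN=0 (or leave it unset) to execute the command; the dry run is useful for checking the assembled option list first.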

SMAP haplotype-sites

Create and navigate to a new output directory to run SMAP haplotype-sites on the BAM files with mapped GBS reads, the BED file generated by SMAP delineate, and a VCF file with SNP calls (produced with third-party SNP calling software, e.g. SAMtools, BEDtools, Freebayes, or GATK):

mkdir -p $download_dir/Simulated_data/SMAP_haplotype_sites/output
cd $download_dir/Simulated_data/SMAP_haplotype_sites/output
smap haplotype-sites $download_dir/Simulated_data/SMAP_delineate/input/ $download_dir/Simulated_data/SMAP_delineate/output/final_stack_positions_GBS_C0_SMAP20_CL50_300.bed $download_dir/Simulated_data/SMAP_haplotype_sites/input/snps.vcf --out prefix -mapping_orientation ignore --discrete_calls dosage --frequency_interval_bounds diploid --dosage_filter 2 --plot all --plot_type pdf -partial include --min_distinct_haplotypes 2 --min_read_count 10 --min_haplotype_frequency 5 --processes 8

Deactivate your virtual environment:

deactivate

Analysis of real GBS data

Activate your virtual environment:

source .venv/bin/activate

Define the path to the download directory:

download_dir=/PATH/TO/DOWNLOAD_DIR
or (from the download directory itself):
download_dir=$(pwd)

SMAP delineate

Create and navigate to a new output directory to run SMAP delineate on the BAM files with mapped GBS reads of a set of individuals:

mkdir -p $download_dir/Real_data/SMAP_delineate/output/ind
cd $download_dir/Real_data/SMAP_delineate/output/ind/
smap delineate $download_dir/Real_data/SMAP_delineate/input/ind/ -mapping_orientation ignore --processes 8 --plot all --plot_type pdf --min_stack_depth 2 --max_stack_depth 1500 --min_cluster_length 50 --max_cluster_length 300 --max_stack_number 20 --min_stack_depth_fraction 10 --min_cluster_depth 10 --max_cluster_depth 1500 --max_smap_number 20 --name 48_ind_GBS-PE

Create and navigate to a new output directory to run SMAP delineate on the BAM files with mapped GBS reads of a set of pool samples:

mkdir -p $download_dir/Real_data/SMAP_delineate/output/pools
cd $download_dir/Real_data/SMAP_delineate/output/pools/
smap delineate $download_dir/Real_data/SMAP_delineate/input/pools/ -mapping_orientation ignore --processes 8 --plot all --plot_type pdf --min_stack_depth 2 --max_stack_depth 1500 --min_cluster_length 50 --max_cluster_length 300 --max_stack_number 20 --min_stack_depth_fraction 5 --min_cluster_depth 30 --max_cluster_depth 1500 --max_smap_number 20 --name 16_pools_GBS-PE

SMAP compare

Create and navigate to a new output directory to run SMAP compare on the two BED files with MergedClusters generated by SMAP delineate:

mkdir -p $download_dir/Real_data/SMAP_compare/output
cd $download_dir/Real_data/SMAP_compare/output
smap compare $download_dir/Real_data/SMAP_delineate/output/ind/final_stack_positions_48_ind_GBS-PE_C0_SMAP20_CL50_300.bed $download_dir/Real_data/SMAP_delineate/output/pools/final_stack_positions_16_pools_GBS-PE_C0_SMAP20_CL50_300.bed
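SMAP compare depends on the BED files written by the two preceding SMAP delineate runs, so it can be useful to confirm that both files exist and are non-empty before starting. The following is a minimal sketch; the function require_files and the variables ind_bed and pools_bed are hypothetical names standing in for the two BED paths used above.

```shell
#!/bin/sh
# require_files FILE...: succeed only if every FILE exists and is non-empty.
# (Hypothetical helper for illustration; not part of the SMAP package.)
require_files() {
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            return 1
        fi
    done
}

# Example, with ind_bed and pools_bed set to the two BED paths used above:
# require_files "$ind_bed" "$pools_bed" && smap compare "$ind_bed" "$pools_bed"
```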

SMAP haplotype-sites

Create and navigate to a new output directory to run SMAP haplotype-sites on the BAM files with mapped GBS reads of a set of individuals, the corresponding BED file from SMAP delineate, and a VCF file with SNP calls (produced with third-party SNP calling software for individuals, e.g. SAMtools, BEDtools, Freebayes, or GATK):

mkdir -p $download_dir/Real_data/SMAP_haplotype_sites/output/ind
cd $download_dir/Real_data/SMAP_haplotype_sites/output/ind
smap haplotype-sites $download_dir/Real_data/SMAP_delineate/input/ind/ $download_dir/Real_data/SMAP_delineate/output/ind/final_stack_positions_48_ind_GBS-PE_C0_SMAP20_CL50_300.bed $download_dir/Real_data/SMAP_haplotype_sites/input/48_ind_GBS-PE.vcf --out haplotypes_48_ind_GBS-PE -mapping_orientation ignore --discrete_calls dosage --frequency_interval_bounds diploid --dosage_filter 2 --plot all --plot_type pdf -partial include --min_distinct_haplotypes 2 --min_read_count 10 --min_haplotype_frequency 5 --processes 8

Create and navigate to a new output directory to run SMAP haplotype-sites on the BAM files with mapped GBS reads of a set of pool samples, the corresponding BED file from SMAP delineate, and a VCF file with SNP calls (produced with third-party SNP calling software for Pool-Seq data, e.g. SNAPE-pooled):

mkdir -p $download_dir/Real_data/SMAP_haplotype_sites/output/pools
cd $download_dir/Real_data/SMAP_haplotype_sites/output/pools
smap haplotype-sites $download_dir/Real_data/SMAP_delineate/input/pools/ $download_dir/Real_data/SMAP_delineate/output/pools/final_stack_positions_16_pools_GBS-PE_C0_SMAP20_CL50_300.bed $download_dir/Real_data/SMAP_haplotype_sites/input/48_ind_GBS-PE.vcf --out haplotypes_16_pools_GBS-PE -mapping_orientation ignore --plot all --plot_type pdf --mask_frequency 1 --undefined_representation "" -partial include --min_distinct_haplotypes 2 --min_read_count 30 --min_haplotype_frequency 5 --processes 8

Deactivate your virtual environment:

deactivate

Analysis of real CRISPR/Cas genome editing data

Activate your virtual environment:

source .venv/bin/activate

Define the path to the download directory:

download_dir=/PATH/TO/DOWNLOAD_DIR
or (from the download directory itself):
download_dir=$(pwd)

SMAP haplotype-window

Create and navigate to a new output directory to run SMAP haplotype-window on a set of FASTQ files with HiPlex reads, the corresponding BAM files with mapped reads, a GFF file with border positions, and a gRNA FASTA file from SMAP design:

mkdir -p $download_dir/Real_data/SMAP_haplotype_window/output
cd $download_dir/Real_data/SMAP_haplotype_window/output
smap haplotype-window $download_dir/Real_data/SMAP_haplotype_window/input/reference.fasta $download_dir/Real_data/SMAP_haplotype_window/input/borders.gff $download_dir/Real_data/SMAP_haplotype_window/input/ $download_dir/Real_data/SMAP_haplotype_window/input/ --mask_frequency 2 --undefined_representation "" --min_read_count 30 --min_haplotype_frequency 5 --processes 8

SMAP effect-prediction

Create and navigate to a new output directory to run SMAP effect-prediction on: the haplotype frequency table obtained with SMAP haplotype-window, a FASTA file with reference gene sequences, a GFF with associated gene feature positions, a GFF file with border positions, and the gRNA GFF file from SMAP design:

mkdir -p $download_dir/Real_data/SMAP_effect_prediction/output
cd $download_dir/Real_data/SMAP_effect_prediction/output
smap effect-prediction $download_dir/Real_data/SMAP_effect_prediction/input/haplotype_frequency.tsv $download_dir/Real_data/SMAP_effect_prediction/input/genome.fasta $download_dir/Real_data/SMAP_effect_prediction/input/borders.gff -a $download_dir/Real_data/SMAP_effect_prediction/input/gene_features.gff -u $download_dir/Real_data/SMAP_effect_prediction/input/guides.gff -p CAS9 -s 15 -r 20 -e dosage -i diploid -t 70

Deactivate your virtual environment:

deactivate