.. role:: white
.. role:: purple
.. role:: green
.. role:: blue
.. role:: red
Install the SMAP package following the instructions below.
To check that the installation and dependencies work as expected on your system, you can download a set of test (input) data from Zenodo, and run a few basic commands to test the various components of the SMAP package.
More information, with detailed examples of typical SMAP workflows, is provided in the Tutorial section.
.. _SMAPinstallationquickstart:
############
Installation
############
The latest release of the SMAP package can be found on the `GitLab repository `_.
Running SMAP on GBS data requires special preprocessing of GBS reads before read mapping. Please use instructions and software for GBS read preprocessing as described in the manual of `GBprocesS `_.
The fastest way to install the SMAP package is with pip::
pip install ngs-smap
To download the utility Python scripts from the repository, run::
curl https://gitlab.ilvo.be/genomics/smap-package/smap/-/raw/master/utilities/SMAP_sliding-frames.py -o SMAP_sliding-frames.py
curl https://gitlab.ilvo.be/genomics/smap-package/smap/-/raw/master/utilities/SMAP_snp-seq.py -o SMAP_snp-seq.py
curl https://gitlab.ilvo.be/genomics/smap-package/smap/-/raw/master/utilities/SMAP_target-selection.py -o SMAP_target-selection.py
Install from GitLab in a virtual environment::
git clone https://gitlab.ilvo.be/genomics/smap-package/smap.git
cd smap
git checkout master
python -m venv .smap_venv
source .smap_venv/bin/activate
pip install --upgrade pip
pip install .
# already includes the utilities scripts
Install the development version::
git clone https://gitlab.ilvo.be/genomics/smap-package/smap.git
cd smap
git checkout dev
python -m venv .smap_dev_venv
source .smap_dev_venv/bin/activate
pip install --upgrade pip
pip install .
# already includes the utilities scripts
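Whichever installation route you take, it can help to confirm that the ``smap`` executable is on your ``PATH`` before moving on. The snippet below is a minimal sketch, not part of SMAP: ``have_command`` is a hypothetical helper built on the shell built-in ``command -v``.

```shell
# Hypothetical helper (not shipped with SMAP): succeed if a command is on PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command smap; then
    echo "smap found at: $(command -v smap)"
else
    echo "smap not found; is the virtual environment activated?" >&2
fi
```

Run it in the same shell session in which you activated the virtual environment.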
| SMAP is only available for Linux operating systems.
| A basic guide for running software on the Linux command line can be found on `Ubuntu `_'s site.
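Because SMAP is Linux-only, a setup script can check the kernel name before attempting an installation. This is a sketch under the assumption that ``uname -s`` reports ``Linux`` on supported systems; ``is_supported_os`` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of SMAP): succeed only for a Linux kernel name.
is_supported_os() {
    [ "$1" = "Linux" ]
}

if is_supported_os "$(uname -s)"; then
    echo "Linux detected"
else
    echo "Warning: SMAP is only available for Linux" >&2
fi
```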
----
###########
Quick start
###########
To check that the installation and dependencies work as expected on your system, you can download a set of test (input) data from Zenodo, and run a few basic commands to test the various components of the SMAP package.
----
##############################
Analysis of simulated GBS data
##############################
Activate your virtual environment ::
source .venv/bin/activate
Define the path to the download directory::
download_dir=/PATH/TO/DOWNLOAD_DIR/
Or, from within the download directory itself::
download_dir=$(pwd)
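All later commands build their paths from ``download_dir``, so a wrong value only shows up when a command fails. The sketch below checks the variable first. It assumes the test data unpacks into a ``Simulated_data`` directory directly under ``download_dir``, as the paths in this section suggest; ``has_subdir`` is a hypothetical helper.

```shell
# Hypothetical helper: succeed if directory $1 contains a subdirectory named $2.
has_subdir() {
    [ -d "$1/$2" ]
}

# Fall back to the current directory when download_dir is not set.
dir="${download_dir:-$(pwd)}"
if has_subdir "$dir" Simulated_data; then
    echo "test data found under $dir"
else
    echo "no Simulated_data directory under $dir; check download_dir" >&2
fi
```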
SMAP delineate
--------------
Create and navigate to a new output directory to run **SMAP delineate** on the BAM files with mapped GBS reads::
mkdir -p $download_dir/Simulated_data/SMAP_delineate/output/
cd $download_dir/Simulated_data/SMAP_delineate/output/
smap delineate $download_dir/Simulated_data/SMAP_delineate/input/ -mapping_orientation ignore --processes 8 --plot all --plot_type pdf --min_stack_depth 2 --max_stack_depth 1500 --min_cluster_length 50 --max_cluster_length 300 --max_stack_number 20 --min_stack_depth_fraction 10 --min_cluster_depth 10 --max_cluster_depth 1500 --max_smap_number 20 --name GBS
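The BED file names used later in this guide appear to encode the options passed to **SMAP delineate**. The sketch below rebuilds such a name from ``--name``, ``--max_smap_number``, ``--min_cluster_length`` and ``--max_cluster_length``. The pattern is inferred from the file names on this page, not taken from the SMAP source, and the ``C0`` part is copied verbatim from those examples; ``expected_bed_name`` is a hypothetical helper.

```shell
# Hypothetical helper: rebuild the BED file name seen in this guide.
# $1 = --name, $2 = --max_smap_number,
# $3 = --min_cluster_length, $4 = --max_cluster_length
# Assumption: the pattern (including the fixed "C0") is inferred from examples.
expected_bed_name() {
    printf 'final_stack_positions_%s_C0_SMAP%s_CL%s_%s.bed\n' "$1" "$2" "$3" "$4"
}

bed=$(expected_bed_name GBS 20 50 300)
if [ -f "$bed" ]; then
    echo "found $bed"
else
    echo "expected $bed in $(pwd), but it is not there" >&2
fi
```

If the file is missing, list the output directory to see which name was actually written.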
SMAP haplotype-sites
--------------------
Create and navigate to a new output directory to run **SMAP haplotype-sites** on the BAM files with mapped GBS reads, the BED file from **SMAP delineate**, and a VCF file with SNP calls (third-party SNP calling software includes, e.g., `SAMtools `_, `BEDtools `_, `Freebayes `_, or `GATK `_)::
mkdir -p $download_dir/Simulated_data/SMAP_haplotype_sites/output
cd $download_dir/Simulated_data/SMAP_haplotype_sites/output
smap haplotype-sites $download_dir/Simulated_data/SMAP_delineate/input/ $download_dir/Simulated_data/SMAP_delineate/output/final_stack_positions_GBS_C0_SMAP20_CL50_300.bed $download_dir/Simulated_data/SMAP_haplotype_sites/input/snps.vcf --out prefix -mapping_orientation ignore --discrete_calls dosage --frequency_interval_bounds diploid --dosage_filter 2 --plot all --plot_type pdf -partial include --min_distinct_haplotypes 2 --min_read_count 10 --min_haplotype_frequency 5 --processes 8
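This command takes three inputs from different directories, so a quick existence check can save a failed run. The sketch below uses the paths from the command above; ``require_paths`` is a hypothetical helper, not an SMAP feature.

```shell
# Hypothetical helper: report every missing path on stderr,
# and return non-zero if at least one is absent.
require_paths() {
    missing=0
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "missing input: $path" >&2
            missing=1
        fi
    done
    return "$missing"
}

base="${download_dir:-.}/Simulated_data"
if require_paths \
    "$base/SMAP_delineate/input" \
    "$base/SMAP_delineate/output/final_stack_positions_GBS_C0_SMAP20_CL50_300.bed" \
    "$base/SMAP_haplotype_sites/input/snps.vcf"; then
    echo "all inputs present"
else
    echo "fix the paths listed above before running smap haplotype-sites" >&2
fi
```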
Deactivate your virtual environment::
deactivate
----
#########################
Analysis of real GBS data
#########################
Activate your virtual environment ::
source .venv/bin/activate
Define the path to the download directory::
download_dir=/PATH/TO/DOWNLOAD_DIR/
Or, from within the download directory itself::
download_dir=$(pwd)
SMAP delineate
--------------
Create and navigate to a new output directory to run **SMAP delineate** on the BAM files with mapped GBS reads of a set of **individuals**::
mkdir -p $download_dir/Real_data/SMAP_delineate/output/ind
cd $download_dir/Real_data/SMAP_delineate/output/ind/
smap delineate $download_dir/Real_data/SMAP_delineate/input/ind/ -mapping_orientation ignore --processes 8 --plot all --plot_type pdf --min_stack_depth 2 --max_stack_depth 1500 --min_cluster_length 50 --max_cluster_length 300 --max_stack_number 20 --min_stack_depth_fraction 10 --min_cluster_depth 10 --max_cluster_depth 1500 --max_smap_number 20 --name 48_ind_GBS-PE
Create and navigate to a new output directory to run **SMAP delineate** on the BAM files with mapped GBS reads of a set of **pool samples**::
mkdir -p $download_dir/Real_data/SMAP_delineate/output/pools
cd $download_dir/Real_data/SMAP_delineate/output/pools/
smap delineate $download_dir/Real_data/SMAP_delineate/input/pools/ -mapping_orientation ignore --processes 8 --plot all --plot_type pdf --min_stack_depth 2 --max_stack_depth 1500 --min_cluster_length 50 --max_cluster_length 300 --max_stack_number 20 --min_stack_depth_fraction 5 --min_cluster_depth 30 --max_cluster_depth 1500 --max_smap_number 20 --name 16_pools_GBS-PE
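The run names ``48_ind_GBS-PE`` and ``16_pools_GBS-PE`` suggest 48 individuals and 16 pools, although this guide does not state the sample counts explicitly. Under that assumption, counting the BAM files in each input directory is a quick way to confirm that the download is complete. ``count_bam`` is a hypothetical helper.

```shell
# Hypothetical helper: count *.bam files directly inside directory $1.
# Prints 0 when the directory does not exist.
count_bam() {
    if [ ! -d "$1" ]; then
        echo 0
        return 0
    fi
    find "$1" -maxdepth 1 -type f -name '*.bam' | wc -l | tr -d ' '
}

for set_name in ind pools; do
    dir="${download_dir:-.}/Real_data/SMAP_delineate/input/$set_name"
    echo "$set_name: $(count_bam "$dir") BAM file(s)"
done
```

Index files such as ``*.bam.bai`` are not counted, because the pattern only matches names ending in ``.bam``.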
SMAP compare
------------
Create and navigate to a new output directory to run **SMAP compare** on the two BED files with MergedClusters generated by **SMAP delineate**::
mkdir -p $download_dir/Real_data/SMAP_compare/output
cd $download_dir/Real_data/SMAP_compare/output
smap compare $download_dir/Real_data/SMAP_delineate/output/ind/final_stack_positions_48_ind_GBS-PE_C0_SMAP20_CL50_300.bed $download_dir/Real_data/SMAP_delineate/output/pools/final_stack_positions_16_pools_GBS-PE_C0_SMAP20_CL50_300.bed
SMAP haplotype-sites
--------------------
Create and navigate to a new output directory to run **SMAP haplotype-sites** on the BAM files with mapped GBS reads of a set of **individuals**, their BED file from **SMAP delineate**, and a VCF file with SNP calls (third-party SNP calling software for individuals includes, e.g., `SAMtools `_, `BEDtools `_, `Freebayes `_, or `GATK `_)::
mkdir -p $download_dir/Real_data/SMAP_haplotype_sites/output/ind
cd $download_dir/Real_data/SMAP_haplotype_sites/output/ind
smap haplotype-sites $download_dir/Real_data/SMAP_delineate/input/ind/ $download_dir/Real_data/SMAP_delineate/output/ind/final_stack_positions_48_ind_GBS-PE_C0_SMAP20_CL50_300.bed $download_dir/Real_data/SMAP_haplotype_sites/input/48_ind_GBS-PE.vcf --out haplotypes_48_ind_GBS-PE -mapping_orientation ignore --discrete_calls dosage --frequency_interval_bounds diploid --dosage_filter 2 --plot all --plot_type pdf -partial include --min_distinct_haplotypes 2 --min_read_count 10 --min_haplotype_frequency 5 --processes 8
Create and navigate to a new output directory to run **SMAP haplotype-sites** on the BAM files with mapped GBS reads of a set of **pool samples**, their BED file from **SMAP delineate**, and a VCF file with SNP calls (third-party SNP calling software for Pool-Seq data includes, e.g., `SNAPE-pooled `_)::
mkdir -p $download_dir/Real_data/SMAP_haplotype_sites/output/pools
cd $download_dir/Real_data/SMAP_haplotype_sites/output/pools
smap haplotype-sites $download_dir/Real_data/SMAP_delineate/input/pools/ $download_dir/Real_data/SMAP_delineate/output/pools/final_stack_positions_16_pools_GBS-PE_C0_SMAP20_CL50_300.bed $download_dir/Real_data/SMAP_haplotype_sites/input/48_ind_GBS-PE.vcf --out haplotypes_16_pools_GBS-PE -mapping_orientation ignore --plot all --plot_type pdf --mask_frequency 1 --undefined_representation "" -partial include --min_distinct_haplotypes 2 --min_read_count 30 --min_haplotype_frequency 5 --processes 8
Deactivate your virtual environment::
deactivate
----
###############################################
Analysis of real CRISPR/Cas genome editing data
###############################################
Activate your virtual environment::
source .venv/bin/activate
Define the path to the download directory::
download_dir=/PATH/TO/DOWNLOAD_DIR/
Or, from within the download directory itself::
download_dir=$(pwd)
SMAP haplotype-window
---------------------
Create and navigate to a new output directory to run **SMAP haplotype-window** on a set of FASTQ files with HiPlex reads, their mapped BAM files, a GFF file with border positions, and a gRNA FASTA file from **SMAP design**::
mkdir -p $download_dir/Real_data/SMAP_haplotype_window/output
cd $download_dir/Real_data/SMAP_haplotype_window/output
smap haplotype-window $download_dir/Real_data/SMAP_haplotype_window/input/reference.fasta $download_dir/Real_data/SMAP_haplotype_window/input/borders.gff $download_dir/Real_data/SMAP_haplotype_window/input/ $download_dir/Real_data/SMAP_haplotype_window/input/ --mask_frequency 2 --undefined_representation "" --min_read_count 30 --min_haplotype_frequency 5 --processes 8
SMAP effect-prediction
----------------------
Create and navigate to a new output directory to run **SMAP effect-prediction** on the haplotype frequency table obtained with **SMAP haplotype-window**, a FASTA file with reference gene sequences, a GFF file with the associated gene feature positions, a GFF file with border positions, and the gRNA GFF file from **SMAP design**::
mkdir -p $download_dir/Real_data/SMAP_effect_prediction/output
cd $download_dir/Real_data/SMAP_effect_prediction/output
smap effect-prediction $download_dir/Real_data/SMAP_effect_prediction/input/haplotype_frequency.tsv $download_dir/Real_data/SMAP_effect_prediction/input/genome.fasta $download_dir/Real_data/SMAP_effect_prediction/input/borders.gff -a $download_dir/Real_data/SMAP_effect_prediction/input/gene_features.gff -u $download_dir/Real_data/SMAP_effect_prediction/input/guides.gff -p CAS9 -s 15 -r 20 -e dosage -i diploid -t 70
Deactivate your virtual environment::
deactivate