.. role:: white
.. role:: purple
.. role:: green
.. role:: blue
.. role:: red

Install the SMAP package following the instructions below. To check that the installation and dependencies work as expected on your system, you can download a set of test (input) data from Zenodo and run a few basic commands to test the various components of the SMAP package. More information, with detailed examples of typical SMAP workflows, is provided in the Tutorial section.

.. _SMAPinstallationquickstart:

############
Installation
############

The latest release of the SMAP package can be found on the `Gitlab repository `_.

Running SMAP on GBS data requires special preprocessing of GBS reads before read mapping. Please follow the instructions and use the software for GBS read preprocessing described in the manual of `GBprocesS `_.

The fastest way to install the SMAP package is with pip::

    pip install ngs-smap

Commands to download the utility Python scripts from the repository::

    curl https://gitlab.ilvo.be/genomics/smap-package/smap/-/raw/master/utilities/SMAP_sliding-frames.py -o SMAP_sliding-frames.py
    curl https://gitlab.ilvo.be/genomics/smap-package/smap/-/raw/master/utilities/SMAP_snp-seq.py -o SMAP_snp-seq.py
    curl https://gitlab.ilvo.be/genomics/smap-package/smap/-/raw/master/utilities/SMAP_target-selection.py -o SMAP_target-selection.py

Install from GitLab in a virtual environment::

    git clone https://gitlab.ilvo.be/genomics/smap-package/smap.git
    cd smap
    git checkout master
    python -m venv .smap_venv
    source .smap_venv/bin/activate
    pip install --upgrade pip
    pip install .  # also installs the utility scripts

Install the development version::

    git clone https://gitlab.ilvo.be/genomics/smap-package/smap.git
    cd smap
    git checkout dev
    python -m venv .smap_dev_venv
    source .smap_dev_venv/bin/activate
    pip install --upgrade pip
    pip install .  # also installs the utility scripts

| SMAP is only available for Linux operating systems.
| A basic guide for running software on the Linux command line can be found on `Ubuntu `_'s site.

----

###########
Quick start
###########

To check that the installation and dependencies work as expected on your system, download the set of test (input) data from Zenodo and run the basic commands below, which exercise the various components of the SMAP package.

----

##############################
Analysis of simulated GBS data
##############################

Activate the virtual environment in which SMAP is installed (here the ``.smap_venv`` created in the Installation section; adjust the path to your setup)::

    source /PATH/TO/smap/.smap_venv/bin/activate

Define the path to the download directory::

    download_dir=/PATH/TO/DOWNLOAD_DIR

or (from the download directory itself)::

    download_dir=$(pwd)

SMAP delineate
--------------

Create and navigate to a new output directory to run **SMAP delineate** on the BAM files with mapped GBS reads::

    mkdir -p $download_dir/Simulated_data/SMAP_delineate/output/
    cd $download_dir/Simulated_data/SMAP_delineate/output/
    smap delineate $download_dir/Simulated_data/SMAP_delineate/input/ \
        -mapping_orientation ignore --processes 8 --plot all --plot_type pdf \
        --min_stack_depth 2 --max_stack_depth 1500 \
        --min_cluster_length 50 --max_cluster_length 300 \
        --max_stack_number 20 --min_stack_depth_fraction 10 \
        --min_cluster_depth 10 --max_cluster_depth 1500 \
        --max_smap_number 20 --name GBS

SMAP haplotype-sites
--------------------

Create and navigate to a new output directory to run **SMAP haplotype-sites** on the BAM files with mapped GBS reads, the BED file from **SMAP delineate**, and a VCF file with SNP calls (obtained with third-party SNP calling software, e.g.
`SAMtools `_, `BEDtools `_, `Freebayes `_, or `GATK `_)::

    mkdir -p $download_dir/Simulated_data/SMAP_haplotype_sites/output
    cd $download_dir/Simulated_data/SMAP_haplotype_sites/output
    smap haplotype-sites $download_dir/Simulated_data/SMAP_delineate/input/ \
        $download_dir/Simulated_data/SMAP_delineate/output/final_stack_positions_GBS_C0_SMAP20_CL50_300.bed \
        $download_dir/Simulated_data/SMAP_haplotype_sites/input/snps.vcf \
        --out prefix -mapping_orientation ignore \
        --discrete_calls dosage --frequency_interval_bounds diploid --dosage_filter 2 \
        --plot all --plot_type pdf -partial include \
        --min_distinct_haplotypes 2 --min_read_count 10 --min_haplotype_frequency 5 \
        --processes 8

Deactivate your virtual environment::

    deactivate

----

#########################
Analysis of real GBS data
#########################

Activate the virtual environment in which SMAP is installed (here the ``.smap_venv`` created in the Installation section; adjust the path to your setup)::

    source /PATH/TO/smap/.smap_venv/bin/activate

Define the path to the download directory::

    download_dir=/PATH/TO/DOWNLOAD_DIR

or (from the download directory itself)::

    download_dir=$(pwd)

SMAP delineate
--------------

Create and navigate to a new output directory to run **SMAP delineate** on the BAM files with mapped GBS reads of a set of **individuals**::

    mkdir -p $download_dir/Real_data/SMAP_delineate/output/ind
    cd $download_dir/Real_data/SMAP_delineate/output/ind/
    smap delineate $download_dir/Real_data/SMAP_delineate/input/ind/ \
        -mapping_orientation ignore --processes 8 --plot all --plot_type pdf \
        --min_stack_depth 2 --max_stack_depth 1500 \
        --min_cluster_length 50 --max_cluster_length 300 \
        --max_stack_number 20 --min_stack_depth_fraction 10 \
        --min_cluster_depth 10 --max_cluster_depth 1500 \
        --max_smap_number 20 --name 48_ind_GBS-PE

Create and navigate to a new output directory to run **SMAP delineate** on the BAM files with mapped GBS reads of a set of **pool samples**::

    mkdir -p $download_dir/Real_data/SMAP_delineate/output/pools
    cd $download_dir/Real_data/SMAP_delineate/output/pools/
    smap delineate $download_dir/Real_data/SMAP_delineate/input/pools/ \
        -mapping_orientation ignore --processes 8 --plot all --plot_type pdf \
        --min_stack_depth 2 --max_stack_depth 1500 \
        --min_cluster_length 50 --max_cluster_length 300 \
        --max_stack_number 20 --min_stack_depth_fraction 5 \
        --min_cluster_depth 30 --max_cluster_depth 1500 \
        --max_smap_number 20 --name 16_pools_GBS-PE

SMAP compare
------------

Create and navigate to a new output directory to run **SMAP compare** on the two BED files with MergedClusters generated by **SMAP delineate**::

    mkdir -p $download_dir/Real_data/SMAP_compare/output
    cd $download_dir/Real_data/SMAP_compare/output
    smap compare \
        $download_dir/Real_data/SMAP_delineate/output/ind/final_stack_positions_48_ind_GBS-PE_C0_SMAP20_CL50_300.bed \
        $download_dir/Real_data/SMAP_delineate/output/pools/final_stack_positions_16_pools_GBS-PE_C0_SMAP20_CL50_300.bed

SMAP haplotype-sites
--------------------

Create and navigate to a new output directory to run **SMAP haplotype-sites** on the BAM files with mapped GBS reads of a set of **individuals**, the BED file from **SMAP delineate**, and a VCF file with SNP calls (obtained with third-party SNP calling software, e.g.
`SAMtools `_, `BEDtools `_, `Freebayes `_, or `GATK `_ for individuals)::

    mkdir -p $download_dir/Real_data/SMAP_haplotype_sites/output/ind
    cd $download_dir/Real_data/SMAP_haplotype_sites/output/ind
    smap haplotype-sites $download_dir/Real_data/SMAP_delineate/input/ind/ \
        $download_dir/Real_data/SMAP_delineate/output/ind/final_stack_positions_48_ind_GBS-PE_C0_SMAP20_CL50_300.bed \
        $download_dir/Real_data/SMAP_haplotype_sites/input/48_ind_GBS-PE.vcf \
        --out haplotypes_48_ind_GBS-PE -mapping_orientation ignore \
        --discrete_calls dosage --frequency_interval_bounds diploid --dosage_filter 2 \
        --plot all --plot_type pdf -partial include \
        --min_distinct_haplotypes 2 --min_read_count 10 --min_haplotype_frequency 5 \
        --processes 8

Create and navigate to a new output directory to run **SMAP haplotype-sites** on the BAM files with mapped GBS reads of a set of **pool samples**, the BED file from **SMAP delineate**, and a VCF file with SNP calls (obtained with third-party SNP calling software for Pool-Seq data, e.g. `SNAPE-pooled `_)::

    mkdir -p $download_dir/Real_data/SMAP_haplotype_sites/output/pools
    cd $download_dir/Real_data/SMAP_haplotype_sites/output/pools
    smap haplotype-sites $download_dir/Real_data/SMAP_delineate/input/pools/ \
        $download_dir/Real_data/SMAP_delineate/output/pools/final_stack_positions_16_pools_GBS-PE_C0_SMAP20_CL50_300.bed \
        $download_dir/Real_data/SMAP_haplotype_sites/input/48_ind_GBS-PE.vcf \
        --out haplotypes_16_pools_GBS-PE -mapping_orientation ignore \
        --plot all --plot_type pdf --mask_frequency 1 --undefined_representation "" \
        -partial include --min_distinct_haplotypes 2 --min_read_count 30 \
        --min_haplotype_frequency 5 --processes 8

Deactivate your virtual environment::

    deactivate

----

###############################################
Analysis of real CRISPR/Cas genome editing data
###############################################

Activate the virtual environment in which SMAP is installed (here the ``.smap_venv`` created in the Installation section; adjust the path to your setup)::

    source /PATH/TO/smap/.smap_venv/bin/activate

Define the path to the download directory::

    download_dir=/PATH/TO/DOWNLOAD_DIR

or (from the
download directory itself)::

    download_dir=$(pwd)

SMAP haplotype-window
---------------------

Create and navigate to a new output directory to run **SMAP haplotype-window** on a set of FASTQ files with HiPlex reads, their mapped BAM files, a GFF file with border positions, and a gRNA FASTA file from **SMAP design**::

    mkdir -p $download_dir/Real_data/SMAP_haplotype_window/output
    cd $download_dir/Real_data/SMAP_haplotype_window/output
    smap haplotype-window $download_dir/Real_data/SMAP_haplotype_window/input/reference.fasta \
        $download_dir/Real_data/SMAP_haplotype_window/input/borders.gff \
        $download_dir/Real_data/SMAP_haplotype_window/input/ \
        $download_dir/Real_data/SMAP_haplotype_window/input/ \
        --mask_frequency 2 --undefined_representation "" \
        --min_read_count 30 --min_haplotype_frequency 5 --processes 8

SMAP effect-prediction
----------------------

Create and navigate to a new output directory to run **SMAP effect-prediction** on the following inputs:

- the haplotype frequency table obtained with **SMAP haplotype-window**
- a FASTA file with reference gene sequences
- a GFF file with the associated gene feature positions
- a GFF file with border positions
- the gRNA GFF file from **SMAP design**

::

    mkdir -p $download_dir/Real_data/SMAP_effect_prediction/output
    cd $download_dir/Real_data/SMAP_effect_prediction/output
    smap effect-prediction $download_dir/Real_data/SMAP_effect_prediction/input/haplotype_frequency.tsv \
        $download_dir/Real_data/SMAP_effect_prediction/input/genome.fasta \
        $download_dir/Real_data/SMAP_effect_prediction/input/borders.gff \
        -a $download_dir/Real_data/SMAP_effect_prediction/input/gene_features.gff \
        -u $download_dir/Real_data/SMAP_effect_prediction/input/guides.gff \
        -p CAS9 -s 15 -r 20 -e dosage -i diploid -t 70

Deactivate your virtual environment::

    deactivate
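The analyses above assume that the Zenodo test data are unpacked so that ``$download_dir`` contains the ``Simulated_data`` and ``Real_data`` folders used in the commands. The sketch below may help catch path problems before any SMAP command runs. It is a Bash helper that reports which of the input files and directories referenced above are missing. ``check_quickstart_inputs`` is a hypothetical name and is not part of the SMAP package. The path list is copied from the commands in this section and is an assumption about the archive layout, so adjust it if the Zenodo data are organised differently.

```shell
# Hypothetical helper (not part of SMAP): report which Quick start inputs
# are missing under a given download directory.
check_quickstart_inputs() {
    local dir="${1:?usage: check_quickstart_inputs DOWNLOAD_DIR}"
    local missing=0
    local path
    # Assumed layout: paths copied from the Quick start commands.
    for path in \
        Simulated_data/SMAP_delineate/input \
        Simulated_data/SMAP_haplotype_sites/input/snps.vcf \
        Real_data/SMAP_delineate/input/ind \
        Real_data/SMAP_delineate/input/pools \
        Real_data/SMAP_haplotype_sites/input/48_ind_GBS-PE.vcf \
        Real_data/SMAP_haplotype_window/input/reference.fasta \
        Real_data/SMAP_haplotype_window/input/borders.gff \
        Real_data/SMAP_effect_prediction/input/haplotype_frequency.tsv \
        Real_data/SMAP_effect_prediction/input/genome.fasta \
        Real_data/SMAP_effect_prediction/input/borders.gff \
        Real_data/SMAP_effect_prediction/input/gene_features.gff \
        Real_data/SMAP_effect_prediction/input/guides.gff
    do
        # -e accepts both files and directories.
        if [ ! -e "$dir/$path" ]; then
            echo "missing: $dir/$path" >&2
            missing=$((missing + 1))
        fi
    done
    # Return status 0 only if every input was found.
    [ "$missing" -eq 0 ]
}
```

For example, after defining ``download_dir`` as shown above, run ``check_quickstart_inputs "$download_dir" && echo "All Quick start inputs found"``. The helper only reports problems and never creates or moves files, so it is safe to run repeatedly.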