.. raw:: html
.. role:: purple
.. raw:: html
.. role:: white
.. raw:: html
.. role:: green
.. role:: blue
.. role:: red
#############
Scope & Usage
#############
Scope
-----
:purple:`Comparisons across data sets, shared and unique loci`
**SMAP chromplot** analyzes the overlap between haplotypes :ref:`SMAP delineate `.
----
Integration in the SMAP workflow
--------------------------------
.. image:: ../images/chromplot/SMAP_global_scheme_home_2026.png
IMAGE NEEDS TO BE UPDATED
Required input
--------------
.. tabs::
.. tab:: haplotype call table
The haplotypes table created by **SMAP haplotype-sites** or **SMAP haplotype-window**, stored in the input directory.
.. tab:: BED with locus coordinates
| The GFF file with the coordinates of pairs of borders that enclose a window to define the locus positions, created with **SMAP sliding-frames** for Shotgun data or **SMAP design** for HiPlex data.
| The GFF file describes the position of the border regions on the reference sequence in 9 columns. **SMAP haplotype-window** expects two borders that together enclose a window; the two borders are paired based on the \'NAME=\' field in the 9th column. The file does not need to contain a header. These fields need to be specified:
| 1. Name of the sequence in the reference that contains the Window.
| 2. Source of the feature. [SMAP haplotype-window].
| 3. Feature type. Because in SMAP haplotype-window pairs of borders define windows, two feature types are used: border_upstream and border_downstream. Each line in the GFF is one of those borders. Borders always come in pairs.
| 4. The start coordinate of the border region [in the 1-based GFF coordinate system].
| 5. The end coordinate of the border region [in the 1-based GFF coordinate system, value must always be higher than column 4].
| 6. Score. Irrelevant for SMAP haplotype-window [.].
| 7. Orientation of the border [always +].
| 8. Phase. Irrelevant for SMAP haplotype-window [.].
| 9. Attributes of the border; the field \'NAME=\' is required. This field is used to pair borders (by exact \'NAME=\' matching) and to define the corresponding window regions. The \'NAME=\' value must be unique for each window and will be used to name loci in the haplotype frequency tables.
| Depending on the type of data (HiPlex or Shotgun Seq), a specific GFF file must be created to define pairs of borders enclosing windows.
.. tabs::
.. tab:: HiPlex / primer binding sites
| For HiPlex data it is advised to use the 8-10 nucleotides at the 3' end of each primer binding site, where they flank the window (to extract the sequence read region *in between* the primers).
.. csv-table::
:file: ../tables/window/example_HiPlex_gff.csv
:header-rows: 0
.. tab:: Shotgun Sequencing / sliding windows
| Shotgun Sequencing data may be analysed with a set of sliding windows, with a customisable window size (here 50), step size (here 20), and border length (here 10).
.. csv-table::
:file: ../tables/window/example_Shotgun_gff.csv
:header-rows: 0
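
The sliding-window layout described above can be sketched with a few lines of ``awk``. This is only an illustration under stated assumptions, not the way **SMAP sliding-frames** itself works: the region (``chr1``, 120 nt), the source label ``SMAP``, the file name and the ``NAME=`` scheme are invented, and each 50-nt frame is assumed to start with its upstream border and end with its downstream border. The second ``awk`` call checks the pairing rule, i.e. that every ``NAME=`` value has exactly one upstream and one downstream border.

.. code-block:: bash

    workdir=$(mktemp -d)
    gff="$workdir/sliding_windows.gff"

    # Hypothetical 120-nt region on chr1; window, step and border sizes as in the text.
    awk -v len=120 -v win=50 -v step=20 -v border=10 'BEGIN {
        OFS = "\t"
        for (start = 1; start + win - 1 <= len; start += step) {
            end  = start + win - 1
            name = "NAME=chr1_" start "_" end
            print "chr1", "SMAP", "border_upstream",   start,            start + border - 1, ".", "+", ".", name
            print "chr1", "SMAP", "border_downstream", end - border + 1, end,                ".", "+", ".", name
        }
    }' > "$gff"

    # Pairing rule: each NAME needs exactly one upstream and one downstream border.
    unpaired=$(awk -F'\t' '
        { names[$9] = 1 }
        $3 == "border_upstream"   { up[$9]++ }
        $3 == "border_downstream" { down[$9]++ }
        END { for (n in names) if (up[n] != 1 || down[n] != 1) print n }' "$gff")

    head -n 2 "$gff"
    [ -z "$unpaired" ] && echo "all borders are paired"

With these settings the frames start at positions 1, 21, 41 and 61, so the file holds four border pairs.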
.. tab:: reference samples
A tab-delimited text file in the input directory listing the (new) IDs of the samples used as references in the plot: first column = sample name, second column (optional) = colour ID.
By default, no list of reference samples is provided.
.. tab:: optional: selected sample names
Name of a tab-delimited text file in the input directory defining the order of the (new) sample names in the bar plot: first column = old names, second column (optional) = new names.
By default no sample list is used, and the order of samples in the bar plot equals their order in the haplotype table.
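
As an illustration of this format, a sample list could be written as shown here. This is a minimal sketch: the file name ``samples.tsv`` and all sample names are hypothetical, and the old names must be replaced by the sample names that actually occur in the haplotype table.

.. code-block:: bash

    workdir=$(mktemp -d)
    samples="$workdir/samples.tsv"

    # Column 1 = sample name as it appears in the haplotype table (old name),
    # column 2 (optional) = new name to show in the plot.
    # The rows are listed in the order in which the samples should be plotted.
    printf 'sample3.bam\tParent_A\n'  > "$samples"
    printf 'sample1.bam\tParent_B\n' >> "$samples"
    printf 'sample2.bam\tF1_01\n'    >> "$samples"

    # Sanity check: every row has one or two tab-separated fields.
    awk -F'\t' 'NF < 1 || NF > 2 { bad = 1 } END { exit bad }' "$samples" \
        && echo "sample list looks well-formed"
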
.. tab:: optional: selected locus names
Optional: a tab-delimited text file in the input directory containing a one-column list of locus IDs, formatted as in the haplotypes table. By default, no list is provided.
.. _SMAPchromplotcommands:
Commands & options
------------------
::
-h, --help show this help message and exit
-v, --version show program's version number and exit
-t TABLE, --table TABLE
Name of the haplotypes table retrieved from SMAP haplotype-sites or SMAP haplotype-window in the input directory.
-b BED, --bed BED BED file containing the coordinates of each contig in the reference genome sequence. The BED file must be stored in the input directory.
-r REFERENCE_SAMPLES, --reference_samples REFERENCE_SAMPLES
Name of a tab-delimited text file in the input directory listing the (new) IDs of samples used as references in the plot: first column = sample name, second column (optional) =
colour ID (default = no list of reference sample IDs provided).
-o OUTPUT, --output OUTPUT
Output file name (default = chromplot).
-n SAMPLES, --samples SAMPLES
Name of a tab-delimited text file in the input directory defining the order of the (new) sample names in the barplot: first column = old names, second column (optional) = new names
(default = no sample list, the order of samples in the bar plot equals their order in the haplotype table).
-l LOCI, --loci LOCI Name of a tab-delimited text file in the input directory containing a one-column list of locus IDs formatted as in the haplotypes table (default = no list provided).
--ploidy PLOIDY Integer defining the (highest) ploidy level of the samples in the haplotypes table (default = 2, diploid).
--plot_format {pdf,png,svg,jpg,jpeg,tif,tiff}
File format of plots (default = pdf).
----
.. _SMAPchromplotexcommands:
Example commands
----------------
basic usage:
.. code-block:: bash
smap chromplot -t TABLE -b BED -r REFERENCE_SAMPLES -o OUTPUT
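
A fuller run might prepare the list files first and then assemble the command from the options documented above. The sketch below rests on several assumptions: all file names, sample names, colour IDs and locus IDs are invented, the three-column BED layout (contig, 0, contig length) is one common way to describe whole contigs and may differ from what **SMAP chromplot** expects, and any positional arguments the tool may require (for example the input directory) are not shown. The command is only printed, not executed.

.. code-block:: bash

    workdir=$(mktemp -d)

    # Contig coordinates in BED format (0-based start, end = contig length).
    printf 'chr1\t0\t1000000\n'  > "$workdir/contigs.bed"
    printf 'chr2\t0\t750000\n'  >> "$workdir/contigs.bed"

    # Reference samples: column 1 = sample name, column 2 (optional) = colour ID.
    printf 'Parent_A\tblue\n'  > "$workdir/references.tsv"
    printf 'Parent_B\tred\n'  >> "$workdir/references.tsv"

    # One-column list of locus IDs, formatted as in the haplotypes table.
    printf 'locus_1\nlocus_2\n' > "$workdir/loci.tsv"

    # Assemble the call from the documented options and print it for inspection.
    cmd="smap chromplot -t haplotypes.tsv -b contigs.bed -r references.tsv -l loci.tsv -o chromplot --ploidy 2 --plot_format png"
    echo "$cmd"
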
.. _SMAPcompareoutput:
Output
------
.. tabs::
.. tab:: Graphical output
| **SMAP chromplot** produces a graphical output in which loci are colored according to the alleles they share with the given reference samples.
.. tab:: Tabular output
| **SMAP chromplot** can create a tabular output.