How It Works
get_aligned_pairs function, in which lower case nucleotides denote "different from the reference".

| CALL TYPE | CLASSES |
|---|---|
| absence of read mapping | . |
| presence of the reference nucleotide | 0 |
| presence of an alternative nucleotide (any nucleotide different from the reference) | 1 |
| presence of a gap in the alignment | - |
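The four call classes above can be sketched as a small per-site function. This is an illustrative sketch only, not SMAP's own code: the function name `call_site`, the use of `None` for "read does not cover the site" and the `"-"` gap placeholder are all assumptions made for this example.

```python
from typing import Optional

GAP = "-"  # assumption: an alignment gap at the site is passed in as "-"


def call_site(read_base: Optional[str], ref_base: str) -> str:
    """Return the call class ('.', '0', '1' or '-') for one site in one read.

    read_base is None when the read does not cover the site (assumption).
    """
    if read_base is None:
        return "."  # absence of read mapping
    if read_base == GAP:
        return "-"  # gap in the alignment
    if read_base.upper() == ref_base.upper():
        return "0"  # reference nucleotide
    return "1"      # any nucleotide different from the reference


# One read scored at four sites: not covered, reference, alternative, gap.
calls = [call_site(b, r) for b, r in [(None, "A"), ("C", "C"), ("T", "G"), ("-", "A")]]
```

The comparison is case-insensitive so that a lower case read base is not mistaken for a mismatch by itself.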
These calls are concatenated into a haplotype string composed of the characters '.', '0', '1' and '-'. For each haplotype discovered in the data, the total number of corresponding reads is counted per sample. Next, the haplotype counts of all samples are integrated into one master table and expressed as relative haplotype frequency per locus per sample. Haplotypes with low frequency across all samples are removed to control for noise. The final table with haplotype frequencies per locus per sample is the end point for analysis of Pool-Seq data. Using the option --discrete_calls, SMAP haplotype-sites transforms the haplotype frequency table into discrete haplotype calls for individuals.
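The counting, normalisation and filtering steps described above can be illustrated with a minimal sketch for a single locus. It is not SMAP's implementation: the function name `haplotype_frequencies`, the input layout, the use of fractions between 0 and 1, and the `min_frequency` cutoff value are assumptions chosen for the example, and frequencies are not recomputed after filtering.

```python
from collections import Counter
from typing import Dict, List


def haplotype_frequencies(
    reads_per_sample: Dict[str, List[str]],
    min_frequency: float = 0.3,  # hypothetical cutoff, not a SMAP default
) -> Dict[str, Dict[str, float]]:
    """Build a haplotype -> sample -> relative frequency table for one locus.

    reads_per_sample maps each sample name to the haplotype strings
    (one per read) observed at the locus.
    """
    # Count reads per haplotype in every sample.
    counts = {sample: Counter(haps) for sample, haps in reads_per_sample.items()}
    # Master list of all haplotypes seen in any sample.
    all_haplotypes = set().union(*counts.values())

    table: Dict[str, Dict[str, float]] = {}
    for hap in sorted(all_haplotypes):
        row = {}
        for sample, counter in counts.items():
            total = sum(counter.values())
            row[sample] = counter[hap] / total if total else 0.0
        # Keep the haplotype only if it reaches the cutoff in at least one sample.
        if max(row.values()) >= min_frequency:
            table[hap] = row
    return table


samples = {
    "S1": ["00", "00", "01", "00"],
    "S2": ["00", "01", "01", "1-"],
}
table = haplotype_frequencies(samples)
```

In this toy input, the haplotype `1-` reaches a frequency of only 0.25 in its best sample and is therefore dropped, while `00` and `01` are retained.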
Three modes may be chosen for discrete haplotype calling in individuals:
| CALL TYPE | CLASSES |
|---|---|
| dosage calls in diploids | 0, 1, 2 |
| dosage calls in triploids | 0, 1, 2, 3 |
| dosage calls in tetraploids | 0, 1, 2, 3, 4 |
| dosage calls in pentaploids | 0, 1, 2, 3, 4, 5 |
| dosage calls in hexaploids | 0, 1, 2, 3, 4, 5, 6 |
| dominant calls | 0, 1 |
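To make the call classes above concrete, the sketch below converts a relative haplotype frequency into a dosage or dominant call. It assumes simple nearest-dosage rounding and a single presence cutoff; the actual frequency thresholds SMAP applies are not described in this section, so the functions `dosage_call` and `dominant_call` and the `min_frequency` value are hypothetical.

```python
def dosage_call(frequency: float, ploidy: int) -> int:
    """Return the nearest haplotype copy number (0..ploidy).

    frequency is a relative haplotype frequency between 0 and 1.
    Nearest-dosage rounding is an assumption made for illustration.
    """
    if not 0.0 <= frequency <= 1.0:
        raise ValueError("frequency must lie between 0 and 1")
    # Round half up, then cap at the ploidy level.
    return min(ploidy, int(frequency * ploidy + 0.5))


def dominant_call(frequency: float, min_frequency: float = 0.1) -> int:
    """Return 1 if the haplotype is present, 0 if absent.

    min_frequency is a hypothetical presence cutoff.
    """
    return 1 if frequency >= min_frequency else 0


diploid = dosage_call(0.5, ploidy=2)        # heterozygous: one copy
tetraploid = dosage_call(0.75, ploidy=4)    # three of four copies
present = dominant_call(0.5)                # haplotype detected
```

For a ploidy of n, `dosage_call` can only return values in 0..n, which matches the class ranges listed in the table.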
In the following sections, identification and quantification of haplotypes is illustrated on single-end GBS read data of a set of 8 diploid individuals at two partially overlapping loci. The content of the three example input files (BED, VCF, BAM) at these loci will be used to demonstrate the subsequent steps of SMAP haplotype-sites.