pangolin outputs a CSV file with the taxon name and the lineage assigned, one line for each sequence in the FASTA file provided. The following descriptions relate to pangolin 4.0 onwards.
taxon: The name of an input query sequence.
lineage: The most likely lineage assigned to a given sequence based on the inference engine used and the SARS-CoV-2 diversity designated. This assignment may be sensitive to missing data at key sites.
conflict: In the pangoLEARN model, a given sequence is assigned to the most likely category based on known diversity. If a sequence can fit into more than one category, the conflict score will be greater than 0 and will reflect the number of categories the sequence could fit into. A conflict score of 0 means that within the current decision tree there is only one category that the sequence could be assigned to.
ambiguity_score: This score is a function of the quantity of missing data in a sequence. It represents the proportion of relevant sites in a sequence which were imputed to the reference values. A score of 1 indicates that no sites were imputed, while a score of 0 indicates that more sites were imputed than were not imputed. This score only includes sites which are used by the decision tree to classify a sequence.
scorpio_call: If a query is assigned a constellation by scorpio, this call is output in this column. The full set of constellations searched by default can be found at the constellations repository.
scorpio_support: The support score is the proportion of defining variants which have the alternative allele in the sequence.
scorpio_conflict: The conflict score is the proportion of defining variants which have the reference allele in the sequence. Ambiguous/other non-ref/alt bases at each of the variant positions contribute only to the denominators of these scores.
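The two scorpio scores can be reproduced from the allele counts that appear in the scorpio notes string. The following is a minimal sketch, assuming the notes string always has the form seen in the sample output ("Alt alleles N; Ref alleles N; Amb alleles N; Oth alleles N"); `scorpio_scores` is a hypothetical helper written for illustration and is not part of pangolin or scorpio.

```python
import re

def scorpio_scores(notes):
    """Return (support, conflict) computed from a scorpio notes string.

    Assumes the string lists Alt, Ref, Amb and Oth allele counts in the
    format shown in the sample output. Returns None if no counts are found.
    """
    counts = {
        kind: int(value)
        for kind, value in re.findall(r"(Alt|Ref|Amb|Oth) alleles (\d+)", notes)
    }
    # Ambiguous and other bases count towards the denominator only.
    total = sum(counts.values())
    if total == 0:
        return None
    support = counts.get("Alt", 0) / total
    conflict = counts.get("Ref", 0) / total
    return support, conflict

notes = "scorpio call: Alt alleles 21; Ref alleles 1; Amb alleles 1; Oth alleles 0"
support, conflict = scorpio_scores(notes)
print(round(support, 2), round(conflict, 2))  # 0.91 0.04
```

With 21 alternative alleles out of 23 defining sites, the support is 21/23 and the conflict is 1/23, which matches the 0.91 and 0.04 reported for Virus2 in the sample output.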
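scorpio_notes: A notes column specific to the scorpio output.
version: A version string that combines a prefix for the assignment method with the pangolin-data version number, which as of pangolin 4.0 corresponds to the pango-designation version used to prepare the inference files. For example, the sample output includes the values PUSHER-1.2.101 and PANGO-1.2.101; the PANGO rows are those assigned from the designation hash.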
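Values in this column, such as PUSHER-1.2.101 in the sample output, can be split into a prefix and a version number for downstream filtering. This is a minimal sketch, assuming every value has the form PREFIX-NUMBER with the prefix ending at the first hyphen; `split_version` is a hypothetical helper and not a pangolin function.

```python
def split_version(version):
    """Split a version string such as 'PUSHER-1.2.101' at the first hyphen.

    Returns a (prefix, number) tuple. If there is no hyphen, the number
    part is an empty string.
    """
    prefix, _, number = version.partition("-")
    return prefix, number

print(split_version("PUSHER-1.2.101"))  # ('PUSHER', '1.2.101')
print(split_version("PANGO-1.2.101"))   # ('PANGO', '1.2.101')
```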
pangolin_version: The version of pangolin software running.
scorpio_version: The version of the scorpio software installed.
constellation_version: The version of constellations that scorpio has used to curate the lineage assignment.
is_designated: A boolean (True/False) column indicating whether that particular sequence has been officially designated a lineage.
qc_status: Indicates whether the sequence passed the QC thresholds for minimum length and maximum N content.
qc_notes: Notes specific to the QC checks run on the sequences.
note: If there are any conflicts from the decision tree, this field will output the alternative assignments. If the sequence failed QC, this field will describe why. If the sequence met the SNP thresholds for scorpio to call a constellation, it will describe the exact SNP counts of Alt, Ref and Amb (alternative, reference and ambiguous) alleles for that call.
taxon | lineage | conflict | ambiguity_score | scorpio_call | scorpio_support | scorpio_conflict | scorpio_notes | version | pangolin_version | scorpio_version | constellation_version | is_designated | qc_status | qc_notes | note
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Virus1 | B.1.617.1 | 0.0 | | B.1.617.1-like | 1.0 | 0.0 | scorpio call: Alt alleles 11; Ref alleles 0; Amb alleles 0; Oth alleles 0 | PUSHER-1.2.101 | 4.0 | 0.3.16 | v0.1.3 | False | pass | Ambiguous_content:0.02 | |
Virus2 | B.1.1.7 | 0.0 | | Alpha (B.1.1.7-like) | 0.91 | 0.04 | scorpio call: Alt alleles 21; Ref alleles 1; Amb alleles 1; Oth alleles 0 | PUSHER-1.2.101 | 4.0 | 0.3.16 | v0.1.3 | False | pass | Ambiguous_content:0.02 | |
Virus3 | A | | | | | | | PUSHER-1.2.101 | 4.0 | 0.3.16 | v0.1.3 | False | pass | Ambiguous_content:0.02 | |
Virus4 | B | 0.5 | | | | | | PANGO-1.2.101 | 4.0 | 0.3.16 | v0.1.3 | True | pass | Ambiguous_content:0.02 | Assigned from designation hash. |
Virus5 | B.1.314 | 0.0 | | | | | | PANGO-1.2.101 | 4.0 | 0.3.16 | v0.1.3 | True | pass | Ambiguous_content:0.02 | Assigned from designation hash. |
Virus6 | Unassigned | | | | | | | PUSHER-1.2.101 | 4.0 | 0.3.16 | v0.1.3 | False | fail | Ambiguous_content:0.9 | |
Virus7 | Unassigned | | | | | | | PUSHER-1.2.101 | 4.0 | 0.3.16 | v0.1.3 | False | fail | Ambiguous_content:0.98 | |
Virus8 | Unassigned | | | | | | | PUSHER-1.2.101 | 4.0 | 0.3.16 | v0.1.3 | False | fail | Failed to map | |
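
Because the report is a plain CSV file with one row per sequence, it can be summarised with the Python standard library. The sketch below counts lineages among sequences that passed QC. It assumes the column names shown in the table above; the inline sample is an abbreviated stand-in with only a few of the columns, and a real report would be read from the file pangolin wrote instead.

```python
import csv
import io
from collections import Counter

# Abbreviated, illustrative stand-in for a pangolin report. A real report
# has all of the columns described above and would be opened from disk.
SAMPLE_REPORT = """\
taxon,lineage,conflict,qc_status,qc_notes
Virus1,B.1.617.1,0.0,pass,Ambiguous_content:0.02
Virus2,B.1.1.7,0.0,pass,Ambiguous_content:0.02
Virus6,Unassigned,,fail,Ambiguous_content:0.9
"""

def lineage_counts(handle):
    """Count lineages among rows whose qc_status is 'pass'."""
    reader = csv.DictReader(handle)
    return Counter(
        row["lineage"] for row in reader if row["qc_status"] == "pass"
    )

counts = lineage_counts(io.StringIO(SAMPLE_REPORT))
for lineage, n in counts.items():
    print(lineage, n)
```

Filtering on `qc_status` first keeps the `Unassigned` rows from failed sequences out of the summary.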