Output

pangolin outputs a CSV file with the taxon name and the lineage assigned, with one line for each sequence in the FASTA file provided. The following descriptions relate to pangolin 4.0 onwards.
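
As a minimal sketch of how such a report could be read, the snippet below uses Python's standard csv module. The in-memory text is a stand-in for a real report and is abbreviated to three columns for brevity, and the helper name `read_report` is hypothetical rather than part of pangolin.

```python
import csv
import io


def read_report(handle):
    """Return one dict per sequence, keyed by the CSV column headers."""
    return list(csv.DictReader(handle))


# Abbreviated stand-in for a pangolin report (real reports have more columns).
example = io.StringIO(
    "taxon,lineage,qc_status\n"
    "Virus1,B.1.617.1,pass\n"
    "Virus6,Unassigned,fail\n"
)

rows = read_report(example)
for row in rows:
    print(row["taxon"], row["lineage"])
```

With a file on disk, the same helper could be given a handle opened with `open(path, newline="")`.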

CSV column headers

taxon

The name of an input query sequence.

lineage

The most likely lineage assigned to a given sequence based on the inference engine used and the SARS-CoV-2 diversity designated. This assignment may be sensitive to missing data at key sites.

conflict

In the pangoLEARN model, a given sequence gets assigned to the most likely category based on known diversity. If a sequence can fit into more than one category, the conflict score will be greater than 0 and reflect the number of categories the sequence could fit into. If the conflict score is 0, this means that within the current decision tree there is only one category that the sequence could be assigned to.
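
One way to act on this column is to flag every row whose conflict score is greater than 0. The sketch below assumes rows have already been read into dicts keyed by column header and that an empty cell means no score was reported. The taxon names and the helper `has_conflict` are hypothetical.

```python
def has_conflict(row):
    """True when the conflict column holds a number greater than 0."""
    value = row.get("conflict", "").strip()
    return bool(value) and float(value) > 0


# Hypothetical rows, reduced to the two columns used here.
rows = [
    {"taxon": "seqA", "conflict": "0.5"},
    {"taxon": "seqB", "conflict": "0.0"},
    {"taxon": "seqC", "conflict": ""},  # empty cell: no score reported
]

flagged = [row["taxon"] for row in rows if has_conflict(row)]
print(flagged)
```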

ambiguity_score

This score is a function of the quantity of missing data in a sequence. It represents the proportion of relevant sites in a sequence which were imputed to the reference values. A score of 1 indicates that no sites were imputed, while a score of 0 indicates that more sites were imputed than were not imputed. This score only includes sites which are used by the decision tree to classify a sequence.

scorpio_call

If a query is assigned a constellation by scorpio, the call is output in this column. The full set of constellations searched by default can be found at the constellations repository.

scorpio_support

The support score is the proportion of defining variants which have the alternative allele in the sequence.

scorpio_conflict

The conflict score is the proportion of defining variants which have the reference allele in the sequence. Ambiguous/other non-ref/alt bases at each of the variant positions contribute only to the denominators of these scores.
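
The arithmetic behind the support and conflict scores can be sketched as follows, using the allele counts reported for Virus2 in the example output (Alt 21, Ref 1, Amb 1, Oth 0). The function name and signature are illustrative and are not scorpio's API.

```python
def scorpio_scores(alt, ref, amb=0, oth=0):
    """Return (support, conflict) as proportions of all defining variants.

    Ambiguous and other bases count towards the denominator only.
    """
    total = alt + ref + amb + oth
    if total == 0:
        return None, None
    return alt / total, ref / total


support, conflict = scorpio_scores(alt=21, ref=1, amb=1, oth=0)
print(round(support, 2), round(conflict, 2))  # 0.91 0.04
```

Rounded to two decimal places, these agree with the 0.91 and 0.04 shown for that row.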

scorpio_notes

A notes column specific to the scorpio output.

version

A version string that reports the pangolin-data version number, which as of pangolin 4.0 corresponds to the pango-designation version used to prepare the inference files. For example, the example output below reports values such as PUSHER-1.2.101 and PANGO-1.2.101.
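
The values in the example output, such as PUSHER-1.2.101, appear to follow a prefix-then-number layout. Assuming that layout holds, a small hypothetical helper could separate the two parts:

```python
def split_version(value):
    """Split 'PREFIX-x.y.z' into (prefix, number); prefix is None if absent."""
    prefix, sep, number = value.partition("-")
    if not sep:
        return None, value
    return prefix, number


print(split_version("PUSHER-1.2.101"))  # ('PUSHER', '1.2.101')
```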

pangolin_version

The version of pangolin software running.

scorpio_version

The version of the scorpio software installed.

constellation_version

The version of constellations that scorpio has used to curate the lineage assignment.

is_designated

A boolean (True/False) column indicating whether that particular sequence has been officially designated a lineage.

qc_status

Indicates whether the sequence passed the QC thresholds for minimum length and maximum N content.

qc_notes

Notes specific to the QC checks run on the sequences.
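
A common follow-up is to keep only the sequences that passed QC and to pull the numeric value out of a note such as `Ambiguous_content:0.02`. The sketch below assumes the `Key:value` shape seen in the example output. The helper `parse_qc_notes` is hypothetical, and the rows are reduced to the columns used here.

```python
def parse_qc_notes(text):
    """Split 'Key:value' notes; return (text, None) when no number is present."""
    key, sep, value = text.partition(":")
    if not sep:
        return text, None
    try:
        return key, float(value)
    except ValueError:
        return text, None


rows = [
    {"taxon": "Virus1", "qc_status": "pass", "qc_notes": "Ambiguous_content:0.02"},
    {"taxon": "Virus6", "qc_status": "fail", "qc_notes": "Ambiguous_content:0.9"},
]

passed = [row["taxon"] for row in rows if row["qc_status"] == "pass"]
print(passed)
print(parse_qc_notes("Ambiguous_content:0.02"))
```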

note

If there are any conflicts from the decision tree, this field will output the alternative assignments. If the sequence failed QC, this field will describe why. If the sequence met the SNP thresholds for scorpio to call a constellation, it will describe the exact SNP counts of Alt, Ref and Amb (alternative, reference and ambiguous) alleles for that call.
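
The allele counts can be recovered from the text with a regular expression. This sketch assumes the wording shown in the example output ("scorpio call: Alt alleles 21; Ref alleles 1; ..."). The pattern and the helper name `parse_allele_counts` are illustrative and may need adjusting if the wording differs.

```python
import re

# Matches fragments such as "Alt alleles 21" or "Oth alleles 0".
ALLELE_COUNT = re.compile(r"(Alt|Ref|Amb|Oth) alleles (\d+)")


def parse_allele_counts(note):
    """Return a dict mapping each allele label found in the note to its count."""
    return {label: int(count) for label, count in ALLELE_COUNT.findall(note)}


note = "scorpio call: Alt alleles 21; Ref alleles 1; Amb alleles 1; Oth alleles 0"
counts = parse_allele_counts(note)
print(counts)
```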


Example output


| taxon | lineage | conflict | ambiguity_score | scorpio_call | scorpio_support | scorpio_conflict | scorpio_notes | version | pangolin_version | scorpio_version | constellation_version | is_designated | qc_status | qc_notes | note |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Virus1 | B.1.617.1 | 0.0 | | B.1.617.1-like | 1.0 | 0.0 | scorpio call: Alt alleles 11; Ref alleles 0; Amb alleles 0; Oth alleles 0 | PUSHER-1.2.101 | 4.0 | 0.3.16 | v0.1.3 | False | pass | Ambiguous_content:0.02 | |
| Virus2 | B.1.1.7 | 0.0 | | Alpha (B.1.1.7-like) | 0.91 | 0.04 | scorpio call: Alt alleles 21; Ref alleles 1; Amb alleles 1; Oth alleles 0 | PUSHER-1.2.101 | 4.0 | 0.3.16 | v0.1.3 | False | pass | Ambiguous_content:0.02 | |
| Virus3 | A | | | | | | | PUSHER-1.2.101 | 4.0 | 0.3.16 | v0.1.3 | False | pass | Ambiguous_content:0.02 | |
| Virus4 | B | 0.5 | | | | | | PANGO-1.2.101 | 4.0 | 0.3.16 | v0.1.3 | True | pass | Ambiguous_content:0.02 | Assigned from designation hash. |
| Virus5 | B.1.314 | 0.0 | | | | | | PANGO-1.2.101 | 4.0 | 0.3.16 | v0.1.3 | True | pass | Ambiguous_content:0.02 | Assigned from designation hash. |
| Virus6 | Unassigned | | | | | | | PUSHER-1.2.101 | 4.0 | 0.3.16 | v0.1.3 | False | fail | Ambiguous_content:0.9 | |
| Virus7 | Unassigned | | | | | | | PUSHER-1.2.101 | 4.0 | 0.3.16 | v0.1.3 | False | fail | Ambiguous_content:0.98 | |
| Virus8 | Unassigned | | | | | | | PUSHER-1.2.101 | 4.0 | 0.3.16 | v0.1.3 | False | fail | Failed to map | |