pangolin outputs a CSV file containing the taxon name and the assigned lineage, with one line for each sequence in the FASTA file provided. The following descriptions relate to pangolin 3.0 onwards.

CSV column headers


taxon

The name of an input query sequence. Spaces and commas in sequence names are replaced by underscores; it is best to avoid these characters in sequence names in general.
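The renaming rule can be sketched in a few lines of Python. This is an illustration only, not pangolin's own code: the function name `sanitize_taxon` is hypothetical, and the sketch assumes that each space or comma is replaced by exactly one underscore.

```python
def sanitize_taxon(name: str) -> str:
    """Replace spaces and commas with underscores (illustrative only)."""
    return name.replace(" ", "_").replace(",", "_")


print(sanitize_taxon("sample 1,UK"))  # sample_1_UK
```

Applying the same rule to sequence names before running pangolin makes it easier to match rows in the report back to the input records.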


lineage

The most likely lineage assigned to a given sequence, based on the inference engine used and the SARS-CoV-2 diversity designated. This assignment may be sensitive to missing data at key sites.


conflict

In the pangoLEARN decision tree model, a given sequence is assigned to the most likely category based on known diversity. If a sequence can fit into more than one category, the conflict score will be greater than 0 and will reflect the number of categories the sequence could fit into. A conflict score of 0 means that, within the current decision tree, there is only one category to which the sequence could be assigned.


ambiguity_score

This score is a function of the quantity of missing data in a sequence. It represents the proportion of relevant sites in a sequence which were imputed to the reference values. A score of 1 indicates that no sites were imputed, while a score of 0 indicates that more sites were imputed than were not imputed. This score only includes sites which are used by the decision tree to classify a sequence.


scorpio_call

If a query is assigned a constellation by scorpio, the call is output in this column. The full set of constellations searched by default can be found at the constellations repository.


scorpio_support

The support score is the proportion of defining variants which have the alternative allele in the sequence.


scorpio_conflict

The conflict score is the proportion of defining variants which have the reference allele in the sequence. Ambiguous bases and other non-reference, non-alternative bases at each of the variant positions contribute only to the denominators of the support and conflict scores.
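The arithmetic behind the two scorpio scores can be worked through with the allele counts from the example output (16 Alt, 3 Ref, 4 Amb). The sketch below is not scorpio's implementation: the function name `scorpio_scores` is hypothetical, and it assumes the shared denominator is simply the sum of the alternative, reference and ambiguous/other counts.

```python
from typing import Tuple


def scorpio_scores(alt: int, ref: int, amb: int) -> Tuple[float, float]:
    """Return (support, conflict) as proportions of all defining variants.

    Assumption: ambiguous/other bases are counted in the denominator only.
    """
    total = alt + ref + amb
    if total == 0:
        raise ValueError("no defining variants were counted")
    return alt / total, ref / total


support, conflict = scorpio_scores(alt=16, ref=3, amb=4)
print(f"{support:.4f} {conflict:.4f}")  # 0.6957 0.1304
```

These values agree with the 0.695700 and 0.130400 reported for Virus2 in the example output. Because ambiguous sites are counted only in the denominator, the two scores do not have to sum to 1.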


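version

A version number that represents both the pango-designation number and the inference engine used to assign the lineage. For example, in the example output, PANGO-1.2 accompanies assignments made by designation hash and PLEARN-1.2 accompanies assignments made by pangoLEARN, both based on pango-designation 1.2.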


pangolin_version

The version of the pangolin software that was run.


pangoLEARN_version

The dated version of the pangoLEARN model installed.


pango_version

The version of pango-designation lineages that this assignment is based on. This corresponds to the pango-designation version used to train the pangoLEARN/UShER models (and to provide hashed lineage assignments), not the pango-designation version installed as a dependency (which is used only for lineage aliases).


status

Indicates whether the sequence passed the QC thresholds for minimum length and maximum N content.
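The two QC checks can be sketched as below. This is a hedged illustration rather than pangolin's code: the function name `qc_status`, the order of the checks, and the threshold values used in the demonstration are all assumptions. Only the `passed_qc` and `fail` values and the `seq_len:` and `N_content:` strings are taken from the example output.

```python
from typing import Tuple


def qc_status(seq: str, min_length: int, max_n_fraction: float) -> Tuple[str, str]:
    """Return a (status, note) pair for one sequence (illustrative only)."""
    length = len(seq)
    # Length is checked first; an empty sequence always fails here.
    if length == 0 or length < min_length:
        return "fail", f"seq_len:{length}"
    # Fraction of positions that are N (case-insensitive).
    n_fraction = seq.upper().count("N") / length
    if n_fraction > max_n_fraction:
        return "fail", f"N_content:{n_fraction:.2f}"
    return "passed_qc", ""


# Placeholder thresholds chosen for this demonstration only.
print(qc_status("ACGT" * 10, min_length=100, max_n_fraction=0.5))
print(qc_status("N" * 98 + "AC", min_length=50, max_n_fraction=0.5))
```

The first call fails on length and the second on N content, mirroring the kinds of failure shown for Virus6 and Virus7 in the example output.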


note

If there are any conflicts from the decision tree, this field will list the alternative assignments. If the sequence failed QC, this field will describe why. If the sequence met the SNP thresholds for scorpio to call a constellation, this field will give the exact SNP counts of Alt, Ref and Amb (alternative, reference and ambiguous) alleles for that call.
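When the note holds a scorpio call, the allele counts can be pulled out with a regular expression. The sketch below assumes the note follows the wording seen in the example output ("Alt alleles 16; Ref alleles 3; Amb alleles 4"); the function name `parse_scorpio_note` is hypothetical, and other note formats simply yield an empty result.

```python
import re
from typing import Dict


def parse_scorpio_note(note: str) -> Dict[str, int]:
    """Extract Alt/Ref/Amb allele counts from a scorpio note string."""
    counts = {}
    for kind, value in re.findall(r"\b(Alt|Ref|Amb) alleles (\d+)", note):
        counts[kind] = int(value)
    return counts


note = "scorpio call: Alt alleles 16; Ref alleles 3; Amb alleles 4"
print(parse_scorpio_note(note))  # {'Alt': 16, 'Ref': 3, 'Amb': 4}
```

Notes that describe a QC failure or a designation hash assignment contain no allele counts, so the function returns an empty dictionary for them.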

Example output

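| taxon | lineage | conflict | ambiguity_score | scorpio_call | scorpio_support | scorpio_conflict | version | pangolin_version | pangoLEARN_version | pango_version | status | note |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Virus1 | B.1.617.1 | | | | | | PANGO-1.2 | 2.4.2 | 2021-05-10 | 1.2 | passed_qc | Assigned using designation hash. |
| Virus2 | B.1.1.7 | 0.0 | 1.0 | B.1.1.7 | 0.695700 | 0.130400 | PLEARN-1.2 | 2.4.2 | 2021-05-10 | 1.2 | passed_qc | scorpio call: Alt alleles 16; Ref alleles 3; Amb alleles 4 |
| Virus3 | A | | | | | | PANGO-1.2 | 2.4.2 | 2021-05-10 | 1.2 | passed_qc | Assigned using designation hash. |
| Virus4 | B | 0.0 | 1.0 | | | | PLEARN-1.2 | 2.4.2 | 2021-05-10 | 1.2 | passed_qc | |
| Virus5 | B.1.314 | 0.0 | 1.0 | | | | PLEARN-1.2 | 2.4.2 | 2021-05-10 | 1.2 | passed_qc | |
| Virus6 | None | | | | | | PLEARN-1.2 | 2.4.2 | 2021-05-10 | 1.2 | fail | seq_len:2997 |
| Virus7 | None | | | | | | PLEARN-1.2 | 2.4.2 | 2021-05-10 | 1.2 | fail | N_content:0.98 |
| Virus8 | None | | | | | | PLEARN-1.2 | 2.4.2 | 2021-05-10 | 1.2 | fail | failed_to_map |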
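A report like the one above can be summarised with Python's standard `csv` module. The sketch below is a minimal illustration that reads an inline sample abbreviated to four of the columns (taxon, lineage, status, note); a real report would instead be opened from whatever path pangolin wrote it to, and the variable names are arbitrary.

```python
import csv
import io
from collections import Counter

# Abbreviated sample: only four columns of the report are reproduced here.
sample = """taxon,lineage,status,note
Virus1,B.1.617.1,passed_qc,Assigned using designation hash.
Virus6,None,fail,seq_len:2997
Virus7,None,fail,N_content:0.98
"""

rows = list(csv.DictReader(io.StringIO(sample)))

# Tally sequences by QC status.
status_counts = Counter(row["status"] for row in rows)

# Map each sequence that did not pass QC to the reason given in its note.
failed = {row["taxon"]: row["note"] for row in rows if row["status"] != "passed_qc"}

print(status_counts)
print(failed)
```

Reading by column name with `csv.DictReader` rather than by position keeps the summary working if further columns are present in the file.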