Usage


Simple usage

Activate the environment with conda activate pangolin
Run pangolin <query> where <query> is the name of your input (fasta) file
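
The two steps above could be wrapped in a small script along these lines. This is only a sketch: the file name my_sequences.fasta is a hypothetical placeholder, and the guard simply skips the run when pangolin is not on the PATH (for example, when the conda environment has not been activated).

```shell
# Hypothetical input file; replace with the path to your own fasta file.
QUERY="my_sequences.fasta"

# Step 1 (in an interactive shell): conda activate pangolin
# Step 2: run pangolin, but only if it is on PATH and the input exists.
if command -v pangolin >/dev/null 2>&1 && [ -f "$QUERY" ]; then
    # With no other options, the report is written to lineage_report.csv
    # in the current working directory.
    pangolin "$QUERY"
    STATUS="ran"
else
    echo "pangolin or $QUERY not found; activate the environment and check the path" >&2
    STATUS="skipped"
fi
```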

Default behaviour

As of pangolin 4.0, pangolin will run lineage assignment by default in accurate (UShER) mode.

Note: UShER mode is significantly slower on larger datasets; for these we still recommend fast mode, or running with a large number of threads (-t/--threads).

To run in fast mode (e.g. for larger datasets), specify --analysis-mode fast on the command line; this runs pangoLEARN model inference instead of UShER.
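
One way to apply this advice in a script is to pick the mode from the number of sequences in the input. The sketch below is illustrative only: the helper names count_records and choose_mode, the file name, the thread count and the 10000-sequence cutoff are all assumptions, not pangolin recommendations.

```shell
# Count fasta records (header lines start with '>').
count_records() {
    grep -c '^>' "$1" || true   # grep exits non-zero when the count is 0
}

# Pick an analysis mode from a record count.
# The 10000 cutoff is an arbitrary, illustrative threshold.
choose_mode() {
    if [ "$1" -gt 10000 ]; then
        echo fast
    else
        echo accurate
    fi
}

QUERY="my_sequences.fasta"   # hypothetical input file
if [ -f "$QUERY" ] && command -v pangolin >/dev/null 2>&1; then
    N=$(count_records "$QUERY")
    # -t 4 is an example thread count; adjust to your machine.
    pangolin --analysis-mode "$(choose_mode "$N")" -t 4 "$QUERY"
fi
```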

Analysis mode options

Pangolin includes two analysis engines: UShER and pangoLEARN.
Scorpio is used in conjunction with UShER/pangoLEARN to curate variant of concern (VOC)-related lineage calls.

UShER mode

In pangolin 4.0, UShER is the default engine; it can also be selected explicitly by passing "usher" or "accurate" to --analysis-mode. In this mode pangolin runs a parsimony-based lineage assignment using UShER as the inference engine.

Run pangolin <query> where <query> is the name of your input (fasta) file

pangoLEARN mode

pangoLEARN can alternatively be selected by passing "pangolearn" or "fast" to --analysis-mode. As of v4.0, pangoLEARN mode uses a random forest machine learning approach as the inference engine.

Run pangolin --analysis-mode fast <query> where <query> is the name of your input (fasta) file

Scorpio-only mode

Finally, it is possible to skip the UShER/pangoLEARN step by selecting "scorpio" mode, but in this case only VOC-related lineages will be assigned. The output version number (e.g. `SCORPIO_v0.1.4`) corresponds to the version of constellations used in the scorpio assignment.
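
The mode names described in this section can be summarised as a lookup from option value to engine. The function below is a hypothetical helper for wrapper scripts, not part of pangolin. Note that "scorpio" is taken from the text above; the --analysis-mode help output further down lists only four values, so it is worth checking pangolin -h for your installed version before relying on it.

```shell
# Map an analysis-mode value to the engine it selects (hypothetical helper).
engine_for_mode() {
    case "$1" in
        usher|accurate)  echo "UShER" ;;
        pangolearn|fast) echo "pangoLEARN" ;;
        # Described in the text but not listed in the help output; verify locally.
        scorpio)         echo "scorpio only" ;;
        *)               echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

engine_for_mode accurate   # prints: UShER
engine_for_mode fast       # prints: pangoLEARN
```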

Output multiple sequence alignment

In the process of lineage assignment, pangolin creates an alignment in two steps: minimap2 maps the query sequences against an early, anonymised SARS-CoV-2 reference sequence, and gofasta then generates a fasta file from that mapping with the non-coding regions masked out with Ns.

For convenience (I know I certainly find it very useful for quickly generating a SARS-CoV-2 alignment), pangolin has a flag, --alignment, that outputs this alignment alongside the lineage report instead of writing it only to the temporary directory. The exact parameters can be found in the pangolin source code.

By default the output alignment file is called alignment.fasta; as of pangolin 4.0 you can specify a different name for this file with the --alignment-file flag.

Activate the environment with conda activate pangolin
Run pangolin --alignment <query> where <query> is the name of your input file
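
As a rough sketch of how --alignment and --alignment-file might be combined, the script below requests a named alignment and then checks that every row has the same length, which is what one would expect of an aligned fasta file. The file and directory names and the distinct_lengths helper are hypothetical, and the script assumes the alignment is written inside the output directory.

```shell
# Print the number of distinct sequence lengths in a fasta file
# (1 means every record has the same length).
distinct_lengths() {
    awk '/^>/ { if (seq != "") print length(seq); seq = ""; next }
         { seq = seq $0 }
         END { if (seq != "") print length(seq) }' "$1" | sort -u | wc -l | tr -d ' '
}

QUERY="my_sequences.fasta"   # hypothetical input file
OUTDIR="pangolin_out"        # hypothetical output directory
ALN="my_alignment.fasta"     # hypothetical alignment file name

if command -v pangolin >/dev/null 2>&1 && [ -f "$QUERY" ]; then
    mkdir -p "$OUTDIR"
    pangolin --alignment --alignment-file "$ALN" -o "$OUTDIR" "$QUERY"
    # Assumption: the alignment lands in the output directory.
    if [ -f "$OUTDIR/$ALN" ]; then
        echo "distinct row lengths: $(distinct_lengths "$OUTDIR/$ALN")"
    fi
fi
```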

Full usage options

usage: pangolin [options]

pangolin: Phylogenetic Assignment of Named Global Outbreak LINeages

optional arguments:
  -h, --help            show this help message and exit

Input-Output options:
  query                 Query fasta file of sequences to analyse.
  -o OUTDIR, --outdir OUTDIR
                        Output directory. Default: current working directory
  --outfile OUTFILE     Optional output file name. Default: lineage_report.csv
  --tempdir TEMPDIR     Specify where you want the temp stuff to go. Default: $TMPDIR
  --no-temp             Output all intermediate files, for dev purposes.
  --alignment           Output multiple sequence alignment.
  --alignment-file ALIGNMENT_FILE
                        Multiple sequence alignment file name.

Analysis options:
  --analysis-mode ANALYSIS_MODE
                        Specify which inference engine to use.
                        Options: accurate (UShER), fast (pangoLEARN), pangolearn, usher.
                        Default: UShER inference.
  --skip-designation-cache
                        Developer option - do not use designation cache to assign lineages.
  --max-ambig MAXAMBIG  Maximum proportion of Ns allowed for pangolin to attempt assignment.
                        Default: 0.3
  --min-length MINLEN   Minimum query length allowed for pangolin to attempt assignment.
                        Default: 25000

Data options:
  --update              Automatically updates to latest release of pangolin, pangolin-data, scorpio
                        and constellations, then exits.
  --update-data         Automatically updates to latest release of constellations and pangolin-data,
                        including the pangoLEARN model, UShER tree file and alias file then
                        exits.
  -d DATADIR, --datadir DATADIR
                        Data directory minimally containing the pangoLEARN model, header files,
                        UShER tree and alias file. Default: Installed pangolin-data package.
  --usher-tree USHER_PROTOBUF
                        UShER Mutation Annotated Tree protobuf file to use instead of --usher
                        default from pangolin-data repository or --datadir.

Misc options:
  --aliases             Print Pango alias_key.json and exit.
  -v, --version         show program's version number and exit
  -pv, --pangolin-data-version
                        show version number of pangolin data files (UShER tree and pangoLEARN model files)
                        and exit.
  --all-versions        Print all tool, dependency, and data versions then exit.
  --verbose             Print lots of stuff to screen
  -t THREADS, --threads THREADS
                        Number of threads
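
To get a feel for the --min-length and --max-ambig thresholds before a run, a pre-check along the following lines may help. It is only an approximation written for illustration: qc_precheck is a hypothetical helper, and pangolin's own quality control may compute length and N content differently (for example, after alignment or with rounding).

```shell
# List each record with its length, proportion of Ns and a rough verdict.
# Usage: qc_precheck <fasta> <min_length> <max_ambig>
qc_precheck() {
    awk -v minlen="$2" -v maxambig="$3" '
        function report() {
            len = length(seq)
            n = gsub(/[Nn]/, "", seq)     # gsub returns how many Ns it removed
            prop = (len > 0) ? n / len : 1
            if (len < minlen)         verdict = "too_short"
            else if (prop > maxambig) verdict = "too_many_N"
            else                      verdict = "ok"
            printf "%s\t%d\t%.3f\t%s\n", name, len, prop, verdict
        }
        /^>/ { if (have) report(); name = substr($1, 2); seq = ""; have = 1; next }
        { seq = seq $0 }
        END { if (have) report() }
    ' "$1"
}

QUERY="my_sequences.fasta"   # hypothetical input file
if [ -f "$QUERY" ]; then
    # Use the documented defaults: --min-length 25000, --max-ambig 0.3
    qc_precheck "$QUERY" 25000 0.3
fi
```

The same thresholds can then be passed to pangolin explicitly, e.g. pangolin --min-length 25000 --max-ambig 0.3 -t 4 my_sequences.fasta, where the file name and thread count are again placeholders.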