Usage

Simple usage

  1. Activate the environment with conda activate pangolin
  2. Run pangolin <query> where <query> is the name of your input (fasta) file

Default behaviour

As of pangolin 4.0, lineage assignment runs in accurate (UShER) mode by default.

Note: Accurate mode is significantly slower than fast mode on larger datasets, which we still recommend running in fast mode or with many threads (-t/--threads).

To run in fast mode (e.g. for larger datasets), specify --analysis-mode fast on the command line; pangolin will then run pangoLEARN model inference instead of UShER.
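When pangolin is driven from a script, the default and fast-mode invocations described above can be assembled as argument lists. The following is a minimal sketch, not part of pangolin itself: the helper names pangolin_command and run and the file name sequences.fasta are illustrative assumptions, and only the --analysis-mode and --threads flags are taken from pangolin's usage text.

```python
import shutil
import subprocess


def pangolin_command(query, mode=None, threads=None):
    """Return the argument list for one pangolin run (hypothetical helper)."""
    cmd = ["pangolin", query]
    if mode is not None:
        # e.g. "fast" selects pangoLEARN; omit to keep the UShER default
        cmd += ["--analysis-mode", mode]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    return cmd


def run(cmd):
    """Execute the command, but only if pangolin is on PATH."""
    if shutil.which(cmd[0]) is None:
        raise RuntimeError("pangolin not found; activate the conda environment first")
    return subprocess.run(cmd, check=True)


# Default (accurate, UShER) run versus a fast (pangoLEARN) run on 8 threads.
default_cmd = pangolin_command("sequences.fasta")
fast_cmd = pangolin_command("sequences.fasta", mode="fast", threads=8)
print(" ".join(default_cmd))
print(" ".join(fast_cmd))
```

Calling run(fast_cmd) inside the activated pangolin environment would then start the analysis; nothing is executed by the sketch as written.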

Analysis mode options

Pangolin includes multiple analysis engines: UShER and pangoLEARN.
Scorpio is used in conjunction with UShER or pangoLEARN to curate lineage calls related to variants of concern (VOCs).

UShER mode

In pangolin 4.0, UShER is the default analysis mode; it can also be selected explicitly with --analysis-mode usher or --analysis-mode accurate. In this mode, pangolin runs a parsimony-based lineage assignment using UShER as the inference engine.

Run pangolin <query> where <query> is the name of your input (fasta) file

pangoLEARN mode

Alternatively, pangoLEARN can be selected with --analysis-mode pangolearn or --analysis-mode fast. As of v4.0, pangoLEARN mode uses a random forest machine learning approach as the inference engine.

Run pangolin --analysis-mode fast <query> where <query> is the name of your input (fasta) file

scorpio-only mode

Finally, the UShER or pangoLEARN step can be skipped by selecting "scorpio" mode, in which case only VOC-related lineages are assigned. The version reported in the output (e.g. `SCORPIO_v0.1.4`) is the version of constellations used in the scorpio assignment.
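The mode names described in this section can be summarised as a small lookup table, which is handy for validating a mode string before launching a long run. This is an illustrative sketch only: MODE_TO_ENGINE and engine_for are hypothetical names, the mapping simply restates the text above, and how pangolin itself parses the value (for instance, case sensitivity) is not covered here.

```python
# Mode names as described in the text above, mapped to the engine each one selects.
MODE_TO_ENGINE = {
    "usher": "UShER",
    "accurate": "UShER",
    "pangolearn": "pangoLEARN",
    "fast": "pangoLEARN",
    "scorpio": "scorpio",  # scorpio-only: VOC-related lineages only
}


def engine_for(mode="usher"):
    """Return the inference engine a mode name refers to (hypothetical helper)."""
    if mode not in MODE_TO_ENGINE:
        known = ", ".join(sorted(MODE_TO_ENGINE))
        raise ValueError(f"unknown analysis mode {mode!r}; expected one of: {known}")
    return MODE_TO_ENGINE[mode]


for name in ("accurate", "fast", "scorpio"):
    print(name, "->", engine_for(name))
```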

Output multiple sequence alignment

In the process of lineage assignment, pangolin creates an alignment: it uses minimap2 to map the query sequences against an early, anonymised SARS-CoV-2 reference sequence, then uses gofasta to generate a fasta file from that mapping, with the non-coding regions masked out with Ns.

For convenience (I know I certainly find it very useful for quickly generating a SARS-CoV-2 alignment), pangolin has a flag, --alignment, that outputs this alignment in addition to the lineage report instead of writing it to the temporary directory. The exact minimap2 and gofasta parameters can be found in the pangolin source code.

By default, the output alignment file is called alignment.fasta, but as of pangolin 4.0 you can specify a different name with the --alignment-file flag.

  1. Activate the environment with conda activate pangolin
  2. Run pangolin --alignment <query> where <query> is the name of your input file
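The alignment options above can be combined with an output directory in a single invocation. The sketch below only builds the argument list; alignment_command, sequences.fasta, my_alignment.fasta and results are illustrative names, and passing --alignment together with --alignment-file is an assumption about how the two flags are meant to be combined.

```python
def alignment_command(query, alignment_file=None, outdir=None):
    """Build a pangolin command that also outputs the alignment (hypothetical helper)."""
    cmd = ["pangolin", query, "--alignment"]
    if alignment_file is not None:
        # Overrides the default name, alignment.fasta (pangolin 4.0 and later).
        cmd += ["--alignment-file", alignment_file]
    if outdir is not None:
        cmd += ["--outdir", outdir]
    return cmd


basic = alignment_command("sequences.fasta")
named = alignment_command(
    "sequences.fasta", alignment_file="my_alignment.fasta", outdir="results"
)
print(" ".join(basic))
print(" ".join(named))
```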

Full usage options

  usage: pangolin <query> [options]

    pangolin: Phylogenetic Assignment of Named Global Outbreak LINeages
    
    optional arguments:
      -h, --help            show this help message and exit
    
    Input-Output options:
      query                 Query fasta file of sequences to analyse.
      -o OUTDIR, 
      --outdir OUTDIR
                            Output directory. Default: current working directory
      --outfile OUTFILE     Optional output file name. Default: lineage_report.csv
      --tempdir TEMPDIR     Specify where you want the temp stuff to go. Default: $TMPDIR
      --no-temp             Output all intermediate files, for dev purposes.
      --alignment           Output multiple sequence alignment.
      --alignment-file ALIGNMENT_FILE
                            Multiple sequence alignment file name.
    
    Analysis options:
      --analysis-mode ANALYSIS_MODE
                            Specify which inference engine to use. 
                            Options: accurate (UShER), fast (pangoLEARN), pangolearn, usher. 
                            Default: UShER inference.
      --skip-designation-cache
                            Developer option - do not use designation cache to assign lineages.
      --max-ambig MAXAMBIG  Maximum proportion of Ns allowed for pangolin to attempt assignment. 
                            Default: 0.3
      --min-length MINLEN   Minimum query length allowed for pangolin to attempt assignment. 
                            Default: 25000
    
    Data options:
      --update              Automatically updates to latest release of pangolin, pangolin-data, scorpio 
                            and constellations, then exits.
      --update-data         Automatically updates to latest release of constellations and pangolin-data, 
                            including the pangoLEARN model, UShER tree file and alias file then
                            exits.
      -d DATADIR, 
      --datadir DATADIR
                            Data directory minimally containing the pangoLEARN model, header files, 
                            UShER tree and alias file. Default: Installed pangolin-data package.
      --usher-tree USHER_PROTOBUF
                            UShER Mutation Annotated Tree protobuf file to use instead of --usher 
                            default from pangolin-data repository or --datadir.
    
    Misc options:
      --aliases             Print Pango alias_key.json and exit.
      -v, --version         show program's version number and exit
      -pv, --pangolin-data-version
                            show version number of pangolin data files (UShER tree and pangoLEARN model files) 
                            and exit.
      --all-versions        Print all tool, dependency, and data versions then exit.
      --verbose             Print lots of stuff to screen
      -t THREADS, --threads THREADS
                            Number of threads
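The --max-ambig and --min-length defaults listed above (0.3 and 25000) can be approximated ahead of a run to see which sequences are likely to be skipped. This is a rough sketch, not pangolin's own check: the function name would_attempt_assignment is hypothetical, and counting only N characters on the raw sequence and treating both thresholds as inclusive are assumptions.

```python
def would_attempt_assignment(sequence, max_ambig=0.3, min_length=25000):
    """Estimate whether a sequence meets the length and N-content thresholds.

    Assumption: ambiguity is measured as the proportion of N characters
    in the sequence as given; pangolin's actual check may differ.
    """
    length = len(sequence)
    if length == 0 or length < min_length:
        return False
    n_proportion = sequence.upper().count("N") / length
    return n_proportion <= max_ambig


# Toy sequences, not real genomes.
good = "ACGT" * 7500                   # 30000 bases, no Ns
short = "ACGT" * 5000                  # 20000 bases, below --min-length
ambiguous = "A" * 15000 + "N" * 15000  # half Ns, above --max-ambig

for name, seq in [("good", good), ("short", short), ("ambiguous", ambiguous)]:
    print(name, would_attempt_assignment(seq))
```

Passing different max_ambig or min_length values mirrors what the corresponding command-line options would change.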