Catchment options

These options control which sequences in the background dataset are pulled into the catchment areas around the query sequences.

SNP distances

These options set the SNP distance around the query, which defines the size of the catchment area.


Arguments:


-snpd/--snp-distance:
The radius around the query, in number of SNPs (i.e. 2 would give two SNPs upstream, two downstream and two to either side of the query).


Default: 2

--snp-distance-up
Number of SNPs upstream (i.e. closer to the reference sequence) to include

Default: 2

--snp-distance-down
Number of SNPs downstream (i.e. further away from the reference sequence) to include

Default: 2

--snp-distance-side
Number of SNPs to either side (i.e. the same level of difference to the reference sequence) to include

Default: 2
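
The three directions can be illustrated with a small Python sketch. This is one possible reading of the descriptions above, not civet's implementation: it assumes each sequence is represented as a set of SNPs called against the same reference, that direction is decided by comparing SNP counts, and that distance is the number of SNPs not shared by both sequences. The function names `classify` and `in_catchment` are hypothetical.

```python
def classify(query_snps, background_snps):
    """Return (direction, distance) of a background sequence relative to a query.

    Both arguments are sets of SNPs called against the same reference.
    Assumption: direction is judged purely by SNP count.
    """
    distance = len(query_snps ^ background_snps)  # SNPs not shared by both
    if len(background_snps) < len(query_snps):
        direction = "up"      # fewer SNPs: closer to the reference
    elif len(background_snps) > len(query_snps):
        direction = "down"    # more SNPs: further from the reference
    else:
        direction = "side"    # same level of difference to the reference
    return direction, distance


def in_catchment(query_snps, background_snps, up=2, down=2, side=2):
    """Apply a separate SNP limit to each direction (all default to 2)."""
    limits = {"up": up, "down": down, "side": side}
    direction, distance = classify(query_snps, background_snps)
    return distance <= limits[direction]


query = {"C241T", "A23403G", "G28881A"}
print(classify(query, {"C241T", "A23403G"}))             # ('up', 1)
print(classify(query, {"C241T", "A23403G", "C14408T"}))  # ('side', 2)
```

Under these assumptions, lowering only the `side` limit would drop the second background sequence while keeping the first.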

Catchment background size

Defines the maximum number of background sequences in a single catchment. Catchments with more background sequences than this value will be downsampled before tree building. Downsampling strategy can be configured with `--downsample`.


Default: 100


Usage:
-cs/--catchment-background-size {$NUMBER}

Downsampling

There is a set of modes for downsampling the catchments. This can be useful for focussing in on the queries you are interested in, either by enriching important local context or by reducing detail in unnecessary global context.


General usage:
--downsample mode={$MODE} {$MODE_SPECIFIC_OPTIONS}

Default: random

Modes:


random


This will randomly downsample non-query sequences in the catchment.
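
As a rough illustration of what random downsampling means here, the Python sketch below keeps every query and thins the remaining sequences down to a cap. It is a minimal sketch, not civet's code: the function name `downsample_random` and its parameters are hypothetical, and the cap of 100 simply mirrors the default catchment background size.

```python
import random


def downsample_random(catchment, queries, max_background=100, seed=None):
    """Keep every query; randomly thin the background to max_background."""
    rng = random.Random(seed)
    query_set = set(queries)
    kept_queries = [s for s in catchment if s in query_set]
    background = [s for s in catchment if s not in query_set]
    if len(background) > max_background:
        # Sample without replacement so no sequence is kept twice.
        background = rng.sample(background, max_background)
    return kept_queries + background


catchment = ["query_1"] + [f"background_{i}" for i in range(250)]
kept = downsample_random(catchment, ["query_1"], max_background=100, seed=1)
print(len(kept))  # 101: the query plus 100 background sequences
```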

enrich


Indicate a metadata field to enrich by, i.e. preferentially keep sequences with the given values of that field in the downsample. These could be, for example, samples within a date range or from locations of interest. To enrich for a date range, separate the start and end dates with a colon.

Default: 10 (for `factor`)

Usage: --downsample mode=enrich factor={$NUMBER} {$COLUMN_NAME}={$VALUES}
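
The idea of enrichment can be sketched in Python as weighted sampling without replacement. This is an illustration under stated assumptions, not a description of how civet applies `factor`: it assumes a matching sequence is `factor` times as likely to be kept as a non-matching one, that dates are ISO formatted (YYYY-MM-DD), and that a value containing a colon is a date range. The names `in_date_range` and `downsample_enrich` are hypothetical.

```python
import random
from datetime import date


def in_date_range(value, date_range):
    """True if an ISO date string falls inside 'START:END' (inclusive)."""
    start, end = (date.fromisoformat(d) for d in date_range.split(":"))
    return start <= date.fromisoformat(value) <= end


def downsample_enrich(records, column, wanted, k, factor=10, seed=None):
    """Keep k records, favouring those whose column matches `wanted`."""
    rng = random.Random(seed)

    def weight(record):
        value = record[column]
        hit = in_date_range(value, wanted) if ":" in wanted else value == wanted
        return factor if hit else 1

    # Weighted sampling without replacement: give each record a random key
    # of u ** (1 / weight) and keep the k largest keys.
    ranked = sorted(records,
                    key=lambda r: rng.random() ** (1.0 / weight(r)),
                    reverse=True)
    return ranked[:k]


records = [{"name": f"seq_{i}", "sample_date": f"2021-01-{i:02d}"}
           for i in range(1, 31)]
kept = downsample_enrich(records, "sample_date", "2021-01-10:2021-01-15", k=8)
print(len(kept))  # 8
```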

normalise


Choose a metadata field to normalise across, so that sequences with different values of the field are represented evenly.

Usage: --downsample mode=normalise {$METADATA_FIELD}
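
One way to picture normalising is a round-robin draw across the values of the chosen field, so that each value contributes as evenly as its numbers allow. The Python sketch below is a minimal illustration under that assumption and is not taken from civet; `downsample_normalise` and the `country` field are hypothetical names.

```python
import random
from collections import Counter, defaultdict


def downsample_normalise(records, field, k, seed=None):
    """Keep k records, drawing from each value of `field` in turn."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[record[field]].append(record)
    for members in groups.values():
        rng.shuffle(members)  # randomise which members of a group are drawn

    kept = []
    # Take one record per group per round until k are kept or all are used.
    while len(kept) < k and any(groups.values()):
        for members in groups.values():
            if members and len(kept) < k:
                kept.append(members.pop())
    return kept


records = ([{"country": "England"}] * 90
           + [{"country": "Wales"}] * 8
           + [{"country": "Scotland"}] * 2)
kept = downsample_normalise(records, "country", k=6, seed=1)
print(Counter(r["country"] for r in kept))  # two sequences from each country
```

With a plain random downsample, the 90 England sequences in this toy example would tend to dominate; the round-robin draw keeps the smaller groups represented.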
