These options control which sequences in the background dataset are pulled into the catchment areas around the query sequences.
These options set the SNP distance around the query, which defines the size of the catchment area.
Arguments:
-snpd/--snp-distance
The radius around the query in number of SNPs (i.e. 2 would give two SNPs upstream, downstream and to either side of the query).
Default: 2
--snp-distance-up
Number of SNPs upstream (i.e. closer to the reference sequence) to include.
Default: 2
--snp-distance-down
Number of SNPs downstream (i.e. further away from the reference sequence) to include.
Default: 2
--snp-distance-side
Number of SNPs to either side (i.e. the same level of difference to the reference sequence) to include.
Default: 2
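One way to picture the three directions is as set operations on the SNPs each sequence carries relative to the reference. The sketch below is an illustration under assumptions, not the tool's implementation: the function names `classify_relative_to_query` and `in_catchment` are hypothetical, sequences are represented as sets of SNP labels, and the treatment of "side" (a background sequence that both lacks some query SNPs and carries others) is one plausible reading of the descriptions above.

```python
def classify_relative_to_query(query_snps, background_snps):
    """Return (direction, distance) of a background sequence relative to a query.

    Assumption: both arguments are collections of SNP labels called against
    the same reference sequence.
    """
    query_snps, background_snps = set(query_snps), set(background_snps)
    missing = query_snps - background_snps  # query SNPs the background lacks
    extra = background_snps - query_snps    # background SNPs the query lacks
    distance = len(missing) + len(extra)
    if distance == 0:
        direction = "identical"
    elif not extra:
        direction = "up"      # fewer SNPs: closer to the reference
    elif not missing:
        direction = "down"    # additional SNPs: further from the reference
    else:
        direction = "side"    # diverges from the query in both respects
    return direction, distance


def in_catchment(query_snps, background_snps, up=2, down=2, side=2):
    """Check a background sequence against per-direction SNP limits."""
    direction, distance = classify_relative_to_query(query_snps, background_snps)
    limits = {"identical": 0, "up": up, "down": down, "side": side}
    return distance <= limits[direction]
```

Under these assumptions, a background sequence carrying one of a query's three SNPs and nothing else sits two SNPs upstream, so it falls inside the default radius of 2.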
Defines the maximum number of background sequences in a single catchment. Catchments with more background sequences than this value will be downsampled before tree building. Downsampling strategy can be configured with `--downsample`.
Default: 100
Usage:
-cs/--catchment-background-size {$NUMBER}
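The effect of the size limit can be sketched as follows. This is a minimal illustration, not the tool's code: the function name `cap_catchment` is hypothetical, and uniform random sampling is used here as a stand-in for whichever downsampling strategy is configured.

```python
import random


def cap_catchment(background_ids, max_size=100, seed=None):
    """Return at most max_size background sequence IDs.

    Catchments at or below the limit are returned unchanged; larger ones
    are reduced by uniform random sampling (an assumed strategy).
    """
    background_ids = list(background_ids)
    if len(background_ids) <= max_size:
        return background_ids
    rng = random.Random(seed)  # seed only for reproducible illustration
    return rng.sample(background_ids, max_size)
```

Query sequences are not part of `background_ids` in this sketch, so they are never dropped.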
There is a set of modes for downsampling the catchments. This can be useful for focussing in on the queries you are interested in, either by enriching important local context or by reducing detail in unnecessary global contexts.
General usage:
--downsample mode={$MODE} {$MODE_SPECIFIC_OPTIONS}
Default: random
Modes:
random
This will randomly downsample non-query sequences in the catchment.
enrich
Indicate a metadata field to enrich by, i.e. preferentially keep sequences with the given metadata values in the downsample. For example, this could be samples within a date range or from locations of interest. When enriching by date range, the dates must be separated by a colon.
Default: 10
Usage: --downsample mode=enrich factor={$NUMBER} {$COLUMN_NAME}={$VALUES}
normalise
Choose a metadata field to normalise across, so that sequences with different values of the field are represented evenly.
Usage: --downsample mode=normalise {$METADATA_FIELD}
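The enrich and normalise modes can be pictured with the sketch below. It rests on several assumptions that are not confirmed by the text above: the function names are hypothetical, metadata records are plain dictionaries, the enrichment factor is interpreted as a relative sampling weight for matching records, and normalisation is done by drawing from each field value in turn. The actual implementation may differ on every one of these points.

```python
import random
from collections import defaultdict


def enrich_downsample(records, field, values, size, factor=10, seed=None):
    """Weighted sampling without replacement (assumed meaning of 'factor').

    Records whose `field` is in `values` get weight `factor`, others weight 1.
    Each record draws a key u ** (1 / weight); the `size` largest keys are kept.
    """
    rng = random.Random(seed)
    values = set(values)
    keyed = []
    for index, record in enumerate(records):
        weight = factor if record.get(field) in values else 1
        keyed.append((rng.random() ** (1.0 / weight), index))
    kept = sorted(keyed, reverse=True)[:size]
    # Restore the original record order for readability.
    return [records[index] for _, index in sorted(kept, key=lambda pair: pair[1])]


def normalise_downsample(records, field, size, seed=None):
    """Take records from each value of `field` in turn until `size` is reached."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[record.get(field)].append(record)
    for members in groups.values():
        rng.shuffle(members)
    kept = []
    while len(kept) < size and any(groups.values()):
        for members in groups.values():
            if members and len(kept) < size:
                kept.append(members.pop())
    return kept
```

With 90 records from one location and 10 from another, `normalise_downsample(records, "country", 20)` keeps 10 from each in this sketch, whereas plain random sampling would be expected to keep mostly records from the larger group.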