nf-core/createtaxdb

Parallelised and automated construction of metagenomic classifier databases for different tools

database, database-builder, metagenomic-profiling, metagenomics, profiling, taxonomic-profiling

This is the development version of the pipeline.

Launch development version https://github.com/nf-core/createtaxdb

Define where the pipeline should find input data and save output data.

Path to comma-separated file containing information about the samples in the experiment.

required

type: string

pattern: ^\S+\.csv$
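The pattern above can be checked before a run is launched. A minimal Python sketch, assuming a hypothetical helper name `is_valid_samplesheet_path` (it is not part of the pipeline):

```python
import re

# Pattern copied from the parameter description above.
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.csv$")


def is_valid_samplesheet_path(path: str) -> bool:
    """Return True if the path contains no whitespace and ends in .csv."""
    return SAMPLESHEET_PATTERN.match(path) is not None


print(is_valid_samplesheet_path("inputs/samplesheet.csv"))
print(is_valid_samplesheet_path("my samplesheet.csv"))
```

The second path is rejected because `\S+` does not allow whitespace anywhere in the path.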

The output directory where the results will be saved. Use absolute paths when writing to storage on cloud infrastructure.

required

type: string

Email address for completion summary.

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$
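The email pattern is stricter than it may look. A minimal Python sketch, assuming a hypothetical helper name `matches_email_pattern`, shows which addresses it accepts:

```python
import re

# Pattern copied from the parameter description above.
EMAIL_PATTERN = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$"
)


def matches_email_pattern(address: str) -> bool:
    """Return True if the address satisfies the schema pattern."""
    return EMAIL_PATTERN.match(address) is not None


for candidate in (
    "user@example.com",
    "first.last@uni-example.ac.uk",
    "user+tag@example.com",      # '+' is not in the allowed character set
    "user@example.technology",   # final label is longer than five letters
):
    print(candidate, matches_email_pattern(candidate))
```

Addresses containing `+` or ending in a top-level domain longer than five letters do not match, so an alternative address may be needed in those cases.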

MultiQC report title. Printed as page header, used for filename if not otherwise specified.

type: string

Specify name that resulting databases will be prefixed with.

required

type: string

Path to NCBI-style four-column accession to taxonomy ID map file.

type: string

Path to two-column protein sequence accession ID to taxonomy map file.

type: string

Path to two-column nucleotide sequence accession ID to taxonomy map file.

type: string

Path to NCBI-style taxonomy node dmp file.

type: string

Path to NCBI-style taxonomy names dmp file.

type: string

Path to NCBI or GTDB genome sizes file

type: string

Path to MEGAN6/MALT mapping db file

type: string

Specify the type of MALT mapdb provided, based on the corresponding MALT flag.

type: string

Save concatenated input FASTAs

type: boolean

Turn on extension of the Kraken2 database to include Bracken files. Requires nucleotide FASTA file input.

type: boolean

Parameters to pass to bracken build. Must be wrapped in single and double quotes: --bracken_build_options "'--your_param'"

type: string
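The double-then-single quoting used for this and the other build option parameters can be illustrated with Python's `shlex`, which follows POSIX shell splitting rules. This is only a sketch of how a shell tokenises the argument, not a description of how the pipeline itself processes it:

```python
import shlex

# The command-line fragment as it would be typed, using the placeholder
# value from the parameter description above.
typed = "--bracken_build_options \"'--your_param'\""

tokens = shlex.split(typed)
print(tokens)

# With single quotes only, the shell strips all quoting and the value
# becomes indistinguishable from a separate flag.
single_only = shlex.split("--bracken_build_options '--your_param'")
print(single_only)
```

The shell consumes the outer double quotes, so the value arrives still wrapped in single quotes. With single quotes alone the value would arrive as a bare `--your_param`, which could plausibly be mistaken for a separate pipeline parameter.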

Turn on building of Centrifuge database. Requires nucleotide FASTA file input.

type: boolean

Parameters to pass to centrifuge-build. Must be wrapped in single and double quotes: --centrifuge_build_options "'--your_param'"

type: string

Turn on building of DIAMOND database. Requires amino-acid FASTA file input.

type: boolean

Parameters to pass to diamond makedb. Must be wrapped in single and double quotes: --diamond_build_options "'--your_param'"

type: string

Turn on building of ganon database. Requires nucleotide FASTA file input.

type: boolean

Parameters to pass to ganon buildcustom. Must be wrapped in single and double quotes: --ganon_build_options "'--your_param'"

type: string

Turn on building of Kaiju database. Requires amino-acid FASTA file input.

type: boolean

Parameters to pass to kaiju-mkbwt. Must be wrapped in single and double quotes: --kaiju_build_options "'--your_param'"

type: string

Save intermediate Kaiju files that are not otherwise required for downstream classification.

type: boolean

Turn on building of KMCP database. Requires nucleotide FASTA file input.

type: boolean

Parameters to pass to kmcp compute. Must be wrapped in single and double quotes: --kmcp_compute_options "'--your_param'"

type: string

Parameters to pass to kmcp index. Must be wrapped in single and double quotes: --kmcp_index_options "'--your_param'"

type: string

Turn on building of Kraken2 database. Requires nucleotide FASTA file input.

type: boolean

Parameters to pass to kraken2 build. Must be wrapped in single and double quotes: --kraken2_build_options "'--your_param'"

type: string

Retain intermediate Kraken2 build files for inspection.

type: boolean

Turn on building of KrakenUniq database. Requires nucleotide FASTA file input.

type: boolean

Parameters to pass to krakenuniq build. Must be wrapped in single and double quotes: --krakenuniq_build_options "'--your_param'"

type: string

Save intermediate KrakenUniq files that are not otherwise required for downstream classification.

type: boolean

Turn on building of MALT database. Requires nucleotide FASTA file input.

type: boolean

Parameters to pass to malt-build. Must include --sequenceType DNA or --sequenceType Protein and be wrapped in double and single quotes: --malt_build_options "'--sequenceType DNA --your_param'"

type: string

default: --sequenceType DNA

Whether to build a sourmash reference from the provided nucleotide sequences.

type: boolean

Parameters to pass to sourmash sketch dna. Must start with the --param-string option of sourmash sketch dna.

type: string

default: --param-string \'scaled=1000,k=31,noabund\'

Whether to build a sourmash reference from the provided amino acid sequences.

type: boolean

Parameters to pass to sourmash sketch protein. Must start with the --param-string option of sourmash sketch protein.

type: string

default: --param-string \'scaled=200,k=10,noabund\'

Sourmash can perform the main build step in parallel batches. Set the size of the batches.

type: integer

default: 100
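To make the effect of the batch size concrete, the following Python sketch splits a list of inputs into batches of the default size. The function name `make_batches` and the file names are hypothetical, and this is not the pipeline's actual implementation:

```python
from typing import Iterator, List, Sequence


def make_batches(items: Sequence[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical input: 250 sketch files split with the default batch size.
sketches = [f"genome_{i}.sig" for i in range(250)]
batch_sizes = [len(batch) for batch in make_batches(sketches)]
print(batch_sizes)
```

With 250 inputs and a batch size of 100, this gives two full batches and one partial batch. A smaller batch size yields more, smaller jobs that can run in parallel.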

Options for generating input samplesheets for complementary downstream pipelines.

Generate .tar.gz archived versions of all databases

type: boolean

Turn on generation of samplesheets for downstream pipelines.

type: boolean

Comma-separated string, in quotes, listing the pipelines to generate a samplesheet for.

type: string

pattern: ^(taxprofiler)(?:,(taxprofiler)){0,1}
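The behaviour of this pattern can be checked with a short Python sketch. The helper name `matches_pipeline_pattern` is hypothetical, and the sketch assumes the value is checked with an unanchored regular expression search:

```python
import re

# Pattern copied from the parameter description above. Note that it has a
# start anchor (^) but no end anchor ($).
PIPELINE_PATTERN = re.compile(r"^(taxprofiler)(?:,(taxprofiler)){0,1}")


def matches_pipeline_pattern(value: str) -> bool:
    """Return True if the start of the value satisfies the pattern."""
    return PIPELINE_PATTERN.search(value) is not None


for candidate in ("taxprofiler", "taxprofiler,taxprofiler", "othertool"):
    print(candidate, matches_pipeline_pattern(candidate))
```

Because the pattern has no end anchor, only the start of the string is constrained: any value beginning with `taxprofiler` satisfies the expression as written.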

Specify which type of database to list paths of in the generated pipeline samplesheet.

type: string

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden

type: string

default: master

Base directory for Institutional configs.

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/configs/master

Institutional config name.

hidden

type: string

Institutional config description.

hidden

type: string

Institutional config contact information.

hidden

type: string

Institutional config URL link.

hidden

type: string

Less common options for the pipeline, typically set in a config file.

Display version and exit.

hidden

type: boolean

Method used to save pipeline results to output directory.

hidden

type: string

How many files to unzip in parallel in a single job.

required

type: integer

default: 10000

Email address for completion summary, only when pipeline fails.

hidden

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Send plain-text email instead of HTML.

hidden

type: boolean

File size limit when attaching MultiQC reports to summary emails.

hidden

type: string

default: 25.MB

pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$
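The accepted size formats follow from the pattern above. A minimal Python sketch, assuming a hypothetical helper name `is_valid_size`:

```python
import re

# Pattern copied from the parameter description above.
SIZE_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")


def is_valid_size(value: str) -> bool:
    """Return True if the value looks like '25.MB', '1.5 GB' or '100B'."""
    return SIZE_PATTERN.match(value) is not None


for candidate in ("25.MB", "1.5 GB", "100B", "25", "25mb"):
    print(candidate, is_valid_size(candidate))
```

The unit is case-sensitive and the trailing `B` is mandatory, so `25` and `25mb` are rejected while the default `25.MB` is accepted.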

Do not use coloured log outputs.

hidden

type: boolean

Incoming hook URL for messaging service

hidden

type: string

Custom config file to supply to MultiQC.

hidden

type: string

Custom logo file to supply to MultiQC. The file name must also be set in the MultiQC config file.

hidden

type: string

Custom MultiQC yaml file containing HTML including a methods description.

type: string

Whether to validate parameters against the schema at runtime.

hidden

type: boolean

default: true

Base URL or local path to location of pipeline test dataset files

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/test-datasets/

Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.

hidden

type: string

Display the help message.

type: boolean,string

Display the full detailed help message.

type: boolean

Display hidden parameters in the help message (only works when --help or --help_full is provided).

type: boolean
