16S Custom Database
Instead of using the pre-built 16S database, a set of user-defined reference sequences can be provided to the pipeline. It will use those sequences to build a temporary database and run the analysis.
The reference sequences should be provided as a FASTA file containing at most 500 million base pairs of reference sequence. Every header in the file must follow the exact format described here. Each header starts with a SequenceID, which must not contain any spaces. The SequenceID is followed by a colon and the taxonomy associated with the reference sequence. The taxonomy must include all seven canonical taxonomic rank prefixes: k__;p__;c__;o__;f__;g__;s__. The value after any prefix may be left blank, except for the (k)ingdom and (s)pecies designations, which are required.
Here is an example of a well-formed FASTA header:
>SequenceID_001:k__Fungi;p__Glomeromycota;c__Glomeromycetes;o__Glomerales;f__Glomeraceae;g__;s__uncultured_Glomus
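Before running the pipeline, it can be useful to check a custom reference file against the rules above. The following Python sketch is not part of the pipeline; the function names check_header and check_fasta are hypothetical, and the check makes a few assumptions the text does not state explicitly: the SequenceID ends at the first colon, the seven ranks appear in the order shown, are separated by semicolons, and have no trailing semicolon.

```python
# Illustrative validator for custom 16S reference headers (not part of the pipeline).
RANK_PREFIXES = ["k__", "p__", "c__", "o__", "f__", "g__", "s__"]
REQUIRED = {"k__", "s__"}          # kingdom and species must not be blank
MAX_BASEPAIRS = 500_000_000        # limit stated in the documentation


def check_header(line):
    """Return a list of problems found in one FASTA header line (empty if none)."""
    if not line.startswith(">"):
        return ["header does not start with '>'"]
    body = line[1:].rstrip("\r\n")
    # Assumption: the SequenceID ends at the first colon.
    seq_id, sep, taxonomy = body.partition(":")
    if not sep:
        return ["missing ':' between SequenceID and taxonomy"]
    problems = []
    if not seq_id:
        problems.append("empty SequenceID")
    if any(ch.isspace() for ch in seq_id):
        problems.append("SequenceID contains whitespace")
    fields = taxonomy.split(";")
    if len(fields) != len(RANK_PREFIXES):
        problems.append(f"expected 7 ranks, found {len(fields)}")
        return problems
    for prefix, field in zip(RANK_PREFIXES, fields):
        if not field.startswith(prefix):
            problems.append(f"expected prefix {prefix!r}, found {field!r}")
        elif prefix in REQUIRED and not field[len(prefix):].strip():
            problems.append(f"required rank {prefix!r} is blank")
    return problems


def check_fasta(lines):
    """Check every header in an iterable of FASTA lines and the total sequence length."""
    problems = []
    total = 0
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            problems.extend(f"line {number}: {p}" for p in check_header(line))
        else:
            total += len(line)     # count sequence characters toward the limit
    if total > MAX_BASEPAIRS:
        problems.append(f"{total} base pairs exceeds the limit of {MAX_BASEPAIRS}")
    return problems
```

To check a file, open it and pass the file object to check_fasta, for example `with open("references.fasta") as handle: print(check_fasta(handle))`, where references.fasta is a placeholder file name. An empty list means no problems were found under the assumptions listed above.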
The FASTA file is used with the command line argument --16s-custom-references.