16S Pipeline Pre-Built Database
The 16S pipeline depends on a k-mer database of 16S reference sequences. The user can either provide a FASTA file of reference sequences, from which a database is built on the fly during the analysis, or use the pre-built Refseq-RDP-v1 database.
Refseq-RDP-v1 Description
The Refseq-RDP-v1 database contains 14,676 bacterial and 660 archaeal full-length 16S rRNA gene sequences. It was compiled on 14 May 2018, predominantly from the NCBI RefSeq 16S rRNA database, and supplemented with additional sequences from the RDP database. The same 16S sequence content was used to build the default 16S database provided with the 16S Metagenomics v1.1.3 app (RefSeq RDP 16S v3 May 2018 DADA2 32 bp).
Citation: Ali Alishum. (2019). DADA2 formatted 16S rRNA gene sequences for both bacteria & archaea (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.2541239
Downloading the Database
The database is free to download via a shell script using the instructions below.
(1) Create a directory that will be dedicated to storing the database files
It is recommended that the directory be on a disk with at least 1 GB of free space. The path to this directory will be used for the -d parameter when the download script is run in subsequent steps. In the example commands below, the directory is named "databases".
mkdir databases
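The 1 GB recommendation can be checked before downloading. The following is a minimal sketch, not part of the official instructions; the variable names db_dir, min_kb and avail_kb are illustrative, and it assumes a POSIX df and awk are available.

```shell
# Illustrative check (not part of explify-dbs.sh): is there at least 1 GB free?
db_dir="databases"
min_kb=$((1024 * 1024))   # 1 GB expressed in 1024-byte blocks

# Create the directory if it does not exist yet (no error if it already does).
mkdir -p "$db_dir"

# df -Pk prints one POSIX-format line per filesystem in 1024-byte blocks;
# the fourth column of the second line is the available space.
avail_kb=$(df -Pk "$db_dir" | awk 'NR == 2 { print $4 }')

if [ "$avail_kb" -ge "$min_kb" ]; then
    echo "OK: ${avail_kb} KB free on the disk holding ${db_dir}"
else
    echo "WARNING: only ${avail_kb} KB free; at least ${min_kb} KB is recommended" >&2
fi
```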
(2) Download the shell script
Download the shell script and make it executable:
wget -O explify-dbs.sh https://illumina-explify-databases.s3.us-east-1.amazonaws.com/explify-dbs.sh
chmod +x explify-dbs.sh
(3) View the database versions that are available for download
./explify-dbs.sh search -d databases/ -p Refseq-RDP-v1
The command will print the available databases, e.g.:
1 database(s) found meeting those criteria:
- Refseq-RDP-v1-1.0.1
(4) Download the database
./explify-dbs.sh download -d databases/ -p Refseq-RDP-v1 -v 1.0.1 -n 20
The -v argument is the database version observed in step 3.
The -n argument is the number of CPUs that can be used to download the files (defaults to 1).
Additional notes:
In this example, after the Refseq-RDP-v1-1.0.1 files are downloaded, additional required files will be downloaded to a subdirectory named "common"
After the files are downloaded, their checksums will be automatically checked
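The value passed to -n in step 4 can be derived from the machine rather than hard-coded. The sketch below assumes that nproc (GNU coreutils) or getconf is available and falls back to 1, the documented default, if neither is. The variable name n_cpus is illustrative. The command is only printed, so it can be reviewed before it is run.

```shell
# Assumption: nproc or getconf exists on this system; otherwise use 1.
n_cpus=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Print the download command instead of executing it.
echo ./explify-dbs.sh download -d databases/ -p Refseq-RDP-v1 -v 1.0.1 -n "$n_cpus"
```

Using every available CPU may not be appropriate on a shared machine; a smaller value can be substituted.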
(5) Use the database in an analysis
The database files will have been downloaded to databases/Refseq-RDP-v1/1.0.1/. This path is then used with the --16s-db-dir command line argument.
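Before starting an analysis, it can be useful to confirm that the database directory exists and is not empty. The helper db_dir_ready below is hypothetical and is not part of the pipeline or of explify-dbs.sh; it only checks for the presence of files, not their integrity.

```shell
# Hypothetical helper: succeeds if the directory exists and has at least one entry.
db_dir_ready() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Path from step 5 of the example above.
db_path="databases/Refseq-RDP-v1/1.0.1"

if db_dir_ready "$db_path"; then
    echo "Ready: pass ${db_path} to --16s-db-dir"
else
    echo "Not ready: ${db_path} is missing or empty" >&2
fi
```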
Verifying the Downloaded Files
The checksums of the individual database files are verified automatically as part of the download command, but they can also be verified at any later time with the check command:
./explify-dbs.sh check -d databases/ -p Refseq-RDP-v1 -v 1.0.1 -n 20