ORA Compression

DRAGEN ORA Compression is a fully lossless compression method that compresses *.fastq and *.fastq.gz files into *.fastq.ora files. DRAGEN ORA supports FASTQ files generated by Illumina sequencing systems. With the ORA format, the md5 checksum of the FASTQ content is preserved across a compression and decompression cycle, which confirms that the compression is lossless.

For human data generated by the NovaSeq 6000, NextSeq 1000, or NextSeq 2000 sequencing systems, the compression ratio is expected to be up to 6x relative to the equivalent *.fastq.gz files. The compressed file uses the *.fastq.ora extension.

DRAGEN ORA Compression accepts *.fastq or *.fastq.gz files as input. Input can be a single file or a list of files. A list of files can be specified on the command line or read from a *.fastq-list.csv file generated by the BCL Convert BaseSpace Sequence Hub App or DRAGEN BCL convert. Input located in local storage, AWS S3, or Azure Blob storage is supported.

*.fastq.ora files are decompressed into *.fastq.gz.

Note: *.fastq.ora files can be generated starting from BCL. Converting BCL into *.fastq.ora requires specific commands. Follow the instructions in DRAGEN ORA compression from BCL.

Note: Decompressing *.fastq.ora files and ingesting them into DRAGEN map/align do not require a license. Other operations require and consume Compression License quota. More information on licensing can be found in the Licensing Reference Section.

ORA Reference

To compress or decompress ORA files, you must provide the ORA reference files and specify an ORA reference directory.

Several references are supported for compressing data from different species and from different types of human data. Refer to the list of supported references below.

You can download ORA reference files from the DRAGEN Software Support Site page. To ensure proper management of the reference files, do not change any of the file names in the downloaded archive.

To specify an ORA reference directory, proceed as follows.

  1. Download oradata-2.tar.gz (or the archive relevant to your studied model) from the DRAGEN Software Support Site.

  2. Move the file to the location where you want the reference directory to reside, and then enter the following command to extract the contents:
     tar -xzvf oradata-2.tar.gz

  3. Set the --ora-reference command line option to the extracted /oradata folder path.

The oradata folder must follow one of the following structures:

When only one reference is handled:

oradata
  ├── lena_index_V2
  └── refbin

--ora-reference should still point to the parent oradata folder.

When one or more references are handled:

oradata
├── homo_sapiens_bisulfite
│   ├── lena_index_V2
│   └── refbin
├── gallus_gallus
│   ├── lena_index_V2
│   └── refbin
└── mus_musculus
    ├── lena_index_V2
    └── refbin
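
Before launching a job, it can help to confirm that the folder passed to --ora-reference matches one of the two layouts above. The following is a minimal shell sketch and is not part of DRAGEN: the function name list_ora_references is hypothetical, and the check assumes only that lena_index_V2 and refbin exist as entries (files or folders) at the locations shown in the trees.

```shell
# Hypothetical helper (not a DRAGEN command): report which layout an oradata
# folder uses, based only on the presence of lena_index_V2 and refbin.
list_ora_references() {
  oradata="$1"
  # Single-reference layout: both entries sit directly under oradata.
  if [ -e "$oradata/lena_index_V2" ] && [ -e "$oradata/refbin" ]; then
    echo "single"
    return 0
  fi
  # Multi-reference layout: one sub-folder per reference.
  found=1
  for dir in "$oradata"/*/; do
    if [ -e "${dir}lena_index_V2" ] && [ -e "${dir}refbin" ]; then
      basename "$dir"
      found=0
    fi
  done
  # Non-zero status when neither layout was recognized.
  return "$found"
}
```

For the multi-reference tree above, list_ora_references /path/to/oradata would print the sub-folder names, one per line. A non-zero exit status means that neither layout was recognized.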

Command Line Options

The following example command contains the required DRAGEN ORA compression options to compress regular human data:

dragen --enable-map-align false --ora-input <FILE> --enable-ora true --ora-reference <...> --output-directory <...>

or

dragen --enable-map-align false --fastq-list <FILE .csv> --enable-ora true --ora-reference <...> --output-directory <...>

The following example command contains the required DRAGEN ORA decompression options (for human and non-human data):

dragen --enable-map-align false --ora-input <FILE> --enable-ora true --ora-decompress true --ora-reference <...> --output-directory <...>

The following example commands contain the required options to compress the FASTQ files listed in a fastq-list.csv file that contains multiple samples (regular human data).

When all samples must be compressed:

dragen --enable-map-align false --fastq-list <FILE .csv> --enable-ora true --fastq-list-all-samples true --ora-reference <...> --output-directory <...>

When only specific samples must be compressed:

dragen --enable-map-align false --fastq-list <FILE .csv> --enable-ora true --fastq-list-sample-id <sample> --ora-reference <...> --output-directory <...>

The following example command contains the required options for interleaved compression of paired-read files from a fastq-list.csv file (regular human data):

dragen --enable-map-align false --fastq-list <FILE .csv> --enable-ora true --ora-interleaved-compression true --ora-reference <...> --output-directory <...>

The following example command contains the required DRAGEN ORA compression options to compress non-human or specific human data (chicken data in this case):

dragen --enable-map-align false --ora-input <FILE> --enable-ora true --ora-compression-species gallus_gallus --ora-reference <...> --output-directory <...>

The following example command prints the file information summary of an ORA compressed file. Compression or decompression is not performed.

dragen --enable-map-align false --ora-input <FILE> --enable-ora=true --ora-print-file-info=true

The following example command compares the checksum of the original FASTQ file with the checksum of the decompressed FASTQ.ORA file. It outputs "ORA integrity check successful" if both checksums are equal, or "integrity check failed" if they are not.

dragen --enable-map-align false --ora-input <FILE> --enable-ora=true --ora-reference <...> --ora-check-file-integrity=true
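
The integrity check above runs inside DRAGEN. Because the preserved checksum is that of the FASTQ content, an independent spot check outside DRAGEN has to hash the uncompressed content rather than the .gz files themselves, whose bytes can differ for identical content (for example, when the gzip header or compression level differs). The following is a minimal sketch, assuming GNU md5sum and gzip are installed. The helper names fastq_content_md5 and same_fastq_content are hypothetical, and this is not a description of how DRAGEN implements its own check.

```shell
# Hypothetical helper: print the md5 of the uncompressed FASTQ content,
# whether the file is plain (*.fastq) or gzip-compressed (*.fastq.gz).
fastq_content_md5() {
  case "$1" in
    *.gz) gzip -dc "$1" | md5sum | cut -d ' ' -f 1 ;;
    *)    md5sum < "$1" | cut -d ' ' -f 1 ;;
  esac
}

# Succeeds only when both files hold the same FASTQ content.
same_fastq_content() {
  [ "$(fastq_content_md5 "$1")" = "$(fastq_content_md5 "$2")" ]
}
```

A typical use would be same_fastq_content original.fastq.gz decompressed.fastq.gz, where both file names are placeholders for the file before compression and the file obtained after decompression.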

The following example command prints the list of available references:

dragen --enable-map-align false --enable-ora true --ora-list-species true
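
When the same compression command is run repeatedly, the examples above can be assembled by a small wrapper. The following is a minimal dry-run sketch: ora_compress_cmd is a hypothetical function name, it uses only options that appear in the examples above, and it prints the command for review rather than running it.

```shell
# Hypothetical wrapper: print (do not run) a compression command built only
# from options shown in the examples above.
# Usage: ora_compress_cmd <input> <ora_reference_dir> <output_dir> [species]
ora_compress_cmd() {
  input="$1"; ora_ref="$2"; out_dir="$3"; species="${4:-}"
  set -- dragen --enable-map-align false --ora-input "$input" --enable-ora true
  # The species option is added only when a fourth argument is given.
  if [ -n "$species" ]; then
    set -- "$@" --ora-compression-species "$species"
  fi
  set -- "$@" --ora-reference "$ora_ref" --output-directory "$out_dir"
  # Printed for review only; paths containing spaces would need quoting.
  echo "$*"
}
```

For example, ora_compress_cmd sample_R1.fastq.gz /refs/oradata /out gallus_gallus prints the chicken compression command with the placeholders filled in. The file and folder names here are illustrative.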

The following are the command line options for running DRAGEN ORA Compression and Decompression.

| Option | Required | Description |
| --- | --- | --- |
| --enable-map-align | Yes | Set to false to perform compression only. In this case, only the compression license is deducted. Set to true to perform the compression in parallel with the map/align step. In this case, both the compression license and the DRAGEN license are deducted. When set to true, all the options required by the map/align step must be provided. |
| --enable-ora | Yes | Set to true to enable FASTQ file compression and decompression. Decompression must additionally be enabled using the --ora-decompress option. |
| --ora-reference | Yes | Path to the directory that contains the compression reference and index file. |
| --ora-input | Yes (or --fastq-list) | Specifies the input files for compression or decompression. |
| --fastq-list | Yes (or --ora-input) | Specifies a .csv file with the list of FASTQ files to be compressed. This option is not specific to DRAGEN ORA compression; its usage is explained in the FASTQ CSV File Format section of this manual. Compression of a list of FASTQ files containing different species is not supported, whereas decompression of files containing different species is supported. |
| --ora-input2 | No | Used for interleaved compression of paired-read files when input files are specified with --ora-input. Specify the paired-read files corresponding to the files specified in --ora-input to compress each pair into a single interleaved file. The number and order of the paired-read files in --ora-input and --ora-input2 must match. |
| --ora-interleaved-compression | No | Used for interleaved compression of paired-read files when input files are specified with --fastq-list. Set to true to compress paired-read files into a single interleaved file. Each line of the fastq-list.csv file must list the two corresponding paired-read files, which must have the same read count. |
| --ora-compression-species | No | String that specifies the reference species to compress data against. Possible values are <genus_specificname>, as listed in the list of supported references below, or homo_sapiens_bisulfite. If not set, data is compressed against the regular human reference. |
| --ora-decompress | No | Set to true to enable decompress mode. The default value is false. Note: fastq.ora files compressed with non-human or specific human references cannot be decompressed on DRAGEN versions older than v4.3. |
| --force | No | Compresses to the output directory even if the compressed file already exists. The existing compressed file is overwritten. |
| --ora-threads-per-file <#> | No | Manually controls the number of CPU threads for compressing each FASTQ input file. The default value is 8. |
| --ora-parallel-files <#> | No | Manually controls the number of input FASTQ files processed in parallel. The default value is 4. |
| --ora-print-file-info | No | Set to true to print the file information summary of ORA compressed files. Requires an ORA file as input. Note: this option cannot be used simultaneously with the --ora-decompress option and the --ora-check-file-integrity option. |
| --ora-get-metadata | No | Set to true to generate a JSON file with the metadata of the ORA file. The metadata contains the information summary of ORA compressed files plus other metadata that is stored automatically when compression is performed from DRAGEN BCL convert v4.4+. This metadata includes sequencing platform, flowcell ID, run ID, etc. Requires an ORA file as input. |
| --ora-list-species | No | Set to true to print the list of supported references. Note: the printed list may not be exhaustive. If you do not find your species, check the most up-to-date list on the DRAGEN Software Support Site page. |
| --ora-check-file-integrity | No | Set to true to perform an integrity check between the FASTQ file and the decompressed FASTQ.ORA file and output the result. The default value is false. Note: this option cannot be used in the same command line as the compression itself, because it requires the fastq.ora format for the --ora-input argument. |
| --ora-enable-md5 | No | Set to true to compute the md5 checksum of fastq.ora files during compression and generate an ora.md5sum file that contains the checksum. |
| --ora-delete-input-files | No | Set to true to automatically delete the input FASTQ file from the disk upon completion of compression. |
| --ora-original-name | No | At decompression, set to true to restore the name that the FASTQ file had before compression. By default, the name of the FASTQ.ORA file provided as input is reused. |

Use the --output-directory option to specify the directory to store output compressed/decompressed files.
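
When tuning --ora-threads-per-file and --ora-parallel-files together, a rough thread budget can be worked out before the run. The sketch below rests on an assumption that the option descriptions do not state explicitly: that peak compression thread usage is approximately the product of the two settings. The function name ora_thread_budget is hypothetical.

```shell
# Assumption (not stated in the option table): peak compression threads are
# roughly threads-per-file multiplied by parallel-files.
ora_thread_budget() {
  threads_per_file="${1:-8}"   # documented default of --ora-threads-per-file
  parallel_files="${2:-4}"     # documented default of --ora-parallel-files
  echo $((threads_per_file * parallel_files))
}
```

With the documented defaults (8 and 4), the estimate is 32 threads. Comparing that figure with the CPU count of the host (for example, the output of nproc on Linux) gives a first indication of whether the two settings are worth adjusting.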

Interleaved Compression

There are two ways to achieve paired (interleaved) compression:

  • Use --ora-input and --ora-input2. The nth file of the --ora-input list is compressed together with the nth file of the --ora-input2 list.

  • Use --fastq-list with --ora-interleaved-compression set to true. The paired-read files from the nth line of fastq-list.csv are compressed together.

Both files are interleaved within a single ORA output file whose file name contains -interleaved. Using these options to compress paired files together improves compression by up to 10%. When an ORA file that contains paired data is decompressed, it is automatically decompressed into two separate files. To map an ORA file that contains paired interleaved data with the DRAGEN mapper, use the --interleaved option.
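
Because the nth file given to --ora-input is paired with the nth file given to --ora-input2, a count or ordering mistake silently pairs the wrong files. The following pre-flight check is a minimal sketch under stated assumptions: pair_ora_inputs is a hypothetical helper, and the two list files (one FASTQ path per line) are a convenience for this example only, not a DRAGEN input format.

```shell
# Hypothetical pre-flight check: r1_list and r2_list are text files with one
# FASTQ path per line, in the order intended for --ora-input and --ora-input2.
pair_ora_inputs() {
  r1_list="$1"; r2_list="$2"
  n1=$(( $(wc -l < "$r1_list") ))
  n2=$(( $(wc -l < "$r2_list") ))
  # The two lists must contain the same number of files.
  if [ "$n1" -ne "$n2" ]; then
    echo "file count mismatch: $n1 vs $n2" >&2
    return 1
  fi
  # Print the nth file of each list side by side (tab-separated) for review.
  paste "$r1_list" "$r2_list"
}
```

Reviewing the printed pairs before launching the compression makes it easy to spot a read 1 file that has been matched with the wrong read 2 file.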

How to use ORA input files with DRAGEN Map/Align

DRAGEN can directly process ORA files. The same options as the other FASTQ input file types can be used. To use the ORA file, replace the FASTQ file name with the ORA file name and specify the ORA reference directory using --ora-reference.

The following command maps paired-end reads stored in two matched ORA files (-1 and -2 options).

dragen -r <REF_DIR> -1 <fastq.ora1> -2 <fastq.ora2> \
--ora-reference <ORADATA_DIR> \
--output-directory <OUT_DIR> --output-file-prefix <OUT_PREFIX> \
--RGID <RGID> --RGSM <RGSM>

List of supported references

Either the whole database or specific references can be downloaded.

| Model | Valid string value | Size |
| --- | --- | --- |
| Human | Homo_sapiens | 3.5 GB |
| Human methylated data | Homo_sapiens_bisulfite | 8.3 GB |
| Pig | Sus_scrofa | 3.3 GB |
| Chicken | Gallus_gallus | 2.9 GB |
| Rice | Oryza_sativa | 1.43 GB |
| Arabidopsis | Arabidopsis_thaliana | 336 MB |
| Wheat | Triticum_aestivum | 10.3 GB |
| Cattle | Bos_taurus | 3.4 GB |
| Soybean | Glycine_max | 1.6 GB |
| Rat | Rattus_norvegicus | 3.3 GB |
| Maize | Zea_mays | 3.2 GB |
| Zebrafish | Danio_rerio | 3.1 GB |
| Mouse | Mus_musculus | 3.4 GB |
| Roundworm | Caenorhabditis_elegans | 336 MB |
| Duck | Cairina_moschata | 3.0 GB |

You can select the reference species to use at compression with the option --ora-compression-species <species_scientific_name>. If unspecified, the Homo sapiens reference is used by default. Using a reference species that does not match the organism sequenced in your FASTQ file still produces a valid ORA compressed file, albeit with a lower compression ratio. If the oradata folder pointed to by --ora-reference does not contain the requested species, DRAGEN stops with an error. At decompression, the species used to compress the ORA file is detected automatically. DRAGEN looks for the appropriate species in the oradata folder pointed to by --ora-reference. If it is missing, DRAGEN stops with an error message indicating the name of the missing species. In that case, download it from the DRAGEN Software Support Site page.

Set the --ora-list-species option to true to print the list of supported references. Note: the printed list may not be exhaustive. If you do not find your species, check the most up-to-date list on the DRAGEN Software Support Site page.

The list of supported references above may not be exhaustive. The most up-to-date list of supported references can be found on the DRAGEN Software Support Site page.
