# JSON Metrics Reporting

DRAGEN generates an output JSON file, `<output-prefix>.metrics.json`, that aggregates metadata and module information into a single file that can be easily parsed and indexed.

The JSON file currently contains the following modules (when enabled):

* Mapping and Aligning metrics (analogous to `<output-prefix>.mapping_metrics.csv`)
* Variant Calling metrics (analogous to `<output-prefix>.vc_metrics.csv`)
* Coverage region metrics (analogous to `<output-prefix>.<coverage-region-prefix>.coverage_metrics.csv`)
* FASTQC metrics (analogous to `<output-prefix>.fastqc_metrics.csv`)

The JSON file also currently contains the following sections:

* Metadata

## Format

The JSON file is composed of nested dictionary entries containing metadata and metric information.

A standard output JSON metrics file is shown below:

```
{
  "metadata": {
    "dragenVersion": "x.y.z",
    "licenseInfo": [
      {
        "license": "Genome",
        "licenseUsage": 0
      }
    ],
    "pipeline": "Germline"
  },
  "modules": {
    "coverageSummary": {

    },
    "fastQc": {

    },
    "mapAlign": {

    },
    "variantCaller": {

    }
  }
}
```

The metadata section describes the DRAGEN run and its license usage. It currently contains the following information:

* DRAGEN version - the version of DRAGEN used (string)
* License info - an array of JSON objects containing license name (string) and license usage (integer)
* Pipeline - the pipeline executed by DRAGEN (string)
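
As a minimal sketch of reading these fields, the Python snippet below parses a metrics document and collects the metadata into a flat summary. The helper name `summarize_metadata` and the inline sample text are illustrative assumptions, not part of DRAGEN; in practice the text would be read from `<output-prefix>.metrics.json`.

```python
import json


def summarize_metadata(metrics: dict) -> dict:
    """Collect the run-level fields from the "metadata" section."""
    metadata = metrics.get("metadata", {})
    # licenseInfo is an array of {"license": str, "licenseUsage": int} objects.
    licenses = {
        entry.get("license"): entry.get("licenseUsage")
        for entry in metadata.get("licenseInfo", [])
    }
    return {
        "dragenVersion": metadata.get("dragenVersion"),
        "pipeline": metadata.get("pipeline"),
        "licenses": licenses,
        # Only enabled modules are written, so list what is actually present.
        "modules": sorted(metrics.get("modules", {})),
    }


# Hypothetical stand-in for the file contents, mirroring the example above.
SAMPLE_TEXT = """
{
  "metadata": {
    "dragenVersion": "x.y.z",
    "licenseInfo": [{"license": "Genome", "licenseUsage": 0}],
    "pipeline": "Germline"
  },
  "modules": {"coverageSummary": {}, "fastQc": {}, "mapAlign": {}, "variantCaller": {}}
}
"""

summary = summarize_metadata(json.loads(SAMPLE_TEXT))
print(summary)
```

Using `.get()` with defaults keeps the sketch tolerant of runs where a module or field is absent.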

A typical metric field will have the following format:

```
"metricName": {
  "description": "Text description",
  "percentage": "Percentage of total (if applicable, e.g. reads or bases)",
  "value": <Number> or <String>,
  "units": "Metric units"
}
```
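
The sketch below shows one way a consumer might render a single metric field as text. It assumes only the four keys listed above; the helper name `format_metric` and the sample values are hypothetical, and the code treats `percentage` and `units` as optional because they apply only to some metrics.

```python
def format_metric(name: str, field: dict) -> str:
    """Render one metric field as 'name: value [units] [(percentage%)]'."""
    parts = [f"{name}: {field.get('value')}"]
    units = field.get("units")
    if units:
        parts.append(str(units))
    percentage = field.get("percentage")
    if percentage is not None:
        parts.append(f"({percentage}%)")
    return " ".join(parts)


# Hypothetical values, for illustration only.
with_percentage = format_metric(
    "mappedReads",
    {"description": "Mapped reads", "percentage": 98.5, "value": 985000, "units": "reads"},
)
without_percentage = format_metric(
    "insertLengthMean",
    {"description": "Insert length: mean", "value": 350.2},
)
print(with_percentage)
print(without_percentage)
```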

## Mapping and Aligning Metrics

The `mapAlign` module contains two nested dictionaries: one for global metrics that apply to the whole sample (`globalMetrics`) and one for per-read-group metrics (`perReadGroupMetrics`). The read group metrics are dictionaries indexed by read group name, and each contains the metrics for that read group.

This is summarized in the following format:

```
"mapAlign": {
  "globalMetrics": {
    "metricName": {
    },
    ...
  },
  "perReadGroupMetrics": {
    "readGroup1": {
      "metricName": {

      },
      ...
    },
    "readGroup2": {
      "metricName": {
        
      },
      ...
    }
  }
}
```
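
The nesting above can be flattened into rows for tabular tools. The following is a sketch under the assumption that every metric is a dictionary with a `value` key, as in the typical metric format; `flatten_map_align` and the sample data are hypothetical, and the `"global"` scope label is a choice made here, not a DRAGEN convention.

```python
def flatten_map_align(map_align: dict) -> list:
    """Return (scope, metricName, value) rows for global and per-read-group metrics."""
    rows = []
    for name, field in map_align.get("globalMetrics", {}).items():
        rows.append(("global", name, field.get("value")))
    per_group = map_align.get("perReadGroupMetrics", {})
    for read_group, group_metrics in per_group.items():
        for name, field in group_metrics.items():
            rows.append((read_group, name, field.get("value")))
    return rows


# Hypothetical values, for illustration only.
SAMPLE_MAP_ALIGN = {
    "globalMetrics": {"totalInputReads": {"value": 1000}},
    "perReadGroupMetrics": {
        "readGroup1": {"totalInputReads": {"value": 600}},
        "readGroup2": {"totalInputReads": {"value": 400}},
    },
}

rows = flatten_map_align(SAMPLE_MAP_ALIGN)
for row in rows:
    print(row)
```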

The following table shows the mapping between the corresponding JSON field name and the standard output/CSV name:

| JSON Metric Name               | Standard Output/CSV Name                                    |
| ------------------------------ | ----------------------------------------------------------- |
| totalInputReads                | Total input reads                                           |
| duplicateMarkedReads           | Number of duplicate marked reads                            |
| duplicatesRemoved              | Number of duplicate marked and mate reads removed           |
| uniqueReads                    | Number of unique reads                                      |
| readsMateSequenced             | Reads with mate sequenced                                   |
| readsWithoutMateSequenced      | Reads without mate sequenced                                |
| qcFailedReads                  | QC-failed reads                                             |
| mappedReads                    | Mapped reads                                                |
| mappedReadsR1                  | Mapped reads R1                                             |
| mappedReadsR2                  | Mapped reads R2                                             |
| mappedReadsToPopAltInsertions  | Mapped reads to pop-alt insertions (PAI)                    |
| mappedReadsToNonRefDecoys      | Mapped reads to non-ref decoys (NRD)                        |
| mappedReadsToRefExternalSeq    | Mapped reads to ref-external sequences (PAI or NRD)         |
| mappedReadsToFilterContigs     | Mapped reads (RNA) to rRNA and filtered                     |
| mappedReadsToExcludedContigs   | Mapped reads (RNA) to chrM and excluded from metrics        |
| mappedReadsAdj                 | Mapped reads including ref-external or filtered or excluded |
| unmappedReads                  | Unmapped reads                                              |
| unmappedReadsAdjForRefExternal | Unmapped reads minus ref-external mappings                  |
| unmappedReadsAdjForFiltered    | Unmapped reads minus filtered mappings                      |
| unmappedReadsAdjForExcluded    | Unmapped reads minus excluded mappings                      |
| unmappedReadsAdj               | Unmapped reads minus ref-external or filtered or excluded   |
| singletonReads                 | Singleton reads                                             |
| pairedReads                    | Paired reads                                                |
| properlyPairedReads            | Properly paired reads                                       |
| discordantReads                | Not properly paired reads (discordant)                      |
| pairedReadsDiffChrom           | Paired reads mapped to different chromosomes                |
| pairedReadsDiffChromMapQ10     | Paired reads mapped to different chromosomes (MAPQ >= 10)   |
| readsMultipleLoc               | Reads mapping to multiple locations                         |
| readsMapQ40Inf                 | Reads with MAPQ \[40:inf)                                   |
| readsMapQ3040                  | Reads with MAPQ \[30:40)                                    |
| readsMapQ2030                  | Reads with MAPQ \[20:30)                                    |
| readsMapQ1020                  | Reads with MAPQ \[10:20)                                    |
| readsMapQ010                   | Reads with MAPQ \[ 0:10)                                    |
| readsMapQNa                    | Reads with MAPQ NA (Unmapped reads)                         |
| readsWithIndelR1               | Reads with indel R1                                         |
| readsWithIndelR2               | Reads with indel R2                                         |
| readsWithSpliceJunction        | Reads with splice junction                                  |
| totalBases                     | Total bases                                                 |
| totalBasesR1                   | Total bases R1                                              |
| totalBasesR2                   | Total bases R2                                              |
| mappedBases                    | Mapped bases                                                |
| mappedBasesR1                  | Mapped bases R1                                             |
| mappedBasesR2                  | Mapped bases R2                                             |
| softClippedBases               | Soft-clipped bases                                          |
| softClippedBasesR1             | Soft-clipped bases R1                                       |
| softClippedBasesR2             | Soft-clipped bases R2                                       |
| hardClippedBases               | Hard-clipped bases                                          |
| hardClippedBasesR1             | Hard-clipped bases R1                                       |
| hardClippedBasesR2             | Hard-clipped bases R2                                       |
| mismatchedBasesR1              | Mismatched bases R1                                         |
| mismatchedBasesR2              | Mismatched bases R2                                         |
| mismatchedBasesR1ExIndel       | Mismatched bases R1 (excl. indels)                          |
| mismatchedBasesR2ExIndel       | Mismatched bases R2 (excl. indels)                          |
| q30Bases                       | Q30 bases                                                   |
| q30BasesR1                     | Q30 bases R1                                                |
| q30BasesR2                     | Q30 bases R2                                                |
| q30BasesNonDupNonClipped       | Q30 bases (excl. dups & clipped bases)                      |
| totalAlignments                | Total alignments                                            |
| secondaryAlignments            | Secondary alignments                                        |
| supplementaryAlignments        | Supplementary (chimeric) alignments                         |
| estimatedReadLength            | Estimated read length                                       |
| insertLengthMean               | Insert length: mean                                         |
| insertLengthMedian             | Insert length: median                                       |
| insertLengthStdDev             | Insert length: standard deviation                           |
| inputBasesRefGenomeRatio       | Input bases divided by reference genome size                |
| inputBasesTargetBedRatio       | Input bases divided by target bed size                      |
| estimatedSampleContamination   | Estimated sample contamination                              |

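When JSON output needs to line up with existing CSV-based reports, the table can be applied as a lookup. The sketch below copies a small subset of the rows above into a dictionary; `CSV_NAMES` and `to_csv_labels` are hypothetical names, and unknown metric names are passed through unchanged.

```python
# Subset of the mapping table above: JSON metric name -> standard output/CSV name.
CSV_NAMES = {
    "totalInputReads": "Total input reads",
    "mappedReads": "Mapped reads",
    "unmappedReads": "Unmapped reads",
    "properlyPairedReads": "Properly paired reads",
}


def to_csv_labels(global_metrics: dict) -> dict:
    """Re-key a metrics dictionary by CSV name, keeping only each metric's value."""
    return {
        CSV_NAMES.get(name, name): field.get("value")
        for name, field in global_metrics.items()
    }


# Hypothetical values, for illustration only.
labeled = to_csv_labels(
    {"mappedReads": {"value": 985}, "someOtherMetric": {"value": 1}}
)
print(labeled)
```
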
## Variant Calling Metrics

The `variantCaller` module contains three nested dictionaries: the variant calling summary (`summary`), prefilter metrics (`preFilter`), and postfilter metrics (`postFilter`). The prefilter and postfilter metrics are dictionaries indexed by the read group name.

This is summarized in the following format:

```
"variantCaller": {
  "summary": {
    "metricName": {
    },
    ...
  },
  "preFilter": {
    "readGroup1": {
      "metricName": {
      },
      ...
    },
    ...
  },
  "postFilter": {
    "readGroup1": {
      "metricName": {
      },
      ...
    },
    ...
  }
}
```
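
A common use of this layout is comparing counts before and after filtering. The sketch below assumes the key names `preFilter` and `postFilter` exactly as they appear in the format block, and that numeric metrics carry their number in `value`; `filter_deltas` and the sample counts are hypothetical.

```python
def filter_deltas(variant_caller: dict) -> dict:
    """Map (group, metricName) to the prefilter value minus the postfilter value."""
    pre = variant_caller.get("preFilter", {})
    post = variant_caller.get("postFilter", {})
    deltas = {}
    for group, pre_metrics in pre.items():
        post_metrics = post.get(group, {})
        for name, field in pre_metrics.items():
            before = field.get("value")
            after = post_metrics.get(name, {}).get("value")
            # Skip string-valued metrics and metrics missing from either side.
            if isinstance(before, (int, float)) and isinstance(after, (int, float)):
                deltas[(group, name)] = before - after
    return deltas


# Hypothetical values, for illustration only.
SAMPLE_VARIANT_CALLER = {
    "summary": {"numberOfSamples": {"value": 1}},
    "preFilter": {
        "readGroup1": {"totalVariants": {"value": 1000}, "snps": {"value": 800}}
    },
    "postFilter": {
        "readGroup1": {"totalVariants": {"value": 900}, "snps": {"value": 750}}
    },
}

deltas = filter_deltas(SAMPLE_VARIANT_CALLER)
print(deltas)
```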

The following table shows the mapping between the corresponding JSON field name and the standard output/CSV name:

| JSON Metric Name           | Standard Output/CSV Name                           |
| -------------------------- | -------------------------------------------------- |
| numberOfSamples            | Number of samples                                  |
| readsProcessed             | Reads Processed                                    |
| childSample                | Child Sample                                       |
| totalVariants              | Total                                              |
| singleAllelic              | Single allelic                                     |
| biallelic                  | Biallelic                                          |
| multiallelic               | Multiallelic                                       |
| snps                       | SNPs                                               |
| insertions                 | Insertions                                         |
| insertionsHap              | Insertions (Hap)                                   |
| insertionsHom              | Insertions (Hom)                                   |
| insertionsHet              | Insertions (Het)                                   |
| deletions                  | Deletions                                          |
| deletionsHap               | Deletions (Hap)                                    |
| deletionsHom               | Deletions (Hom)                                    |
| deletionsHet               | Deletions (Het)                                    |
| indelsHet                  | Indels (Het)                                       |
| denovoAutosomeSnp          | DeNovo Autosome SNPs                               |
| denovoAutosomeIndel        | De Novo INDELs                                     |
| denovoChrXSnp              | DeNovo chrX SNPs                                   |
| denovoChrXIndel            | DeNovo chrX INDELs                                 |
| denovoChrYSnp              | DeNovo chrY SNPs                                   |
| denovoChrYIndel            | DeNovo chrY INDELs                                 |
| chrXSnp                    | Chr X number of SNPs over `<region>`               |
| chrYSnp                    | Chr Y number of SNPs over `<region>`               |
| chrXYSnpRatio              | (Chr X SNPs)/(chr Y SNPs) ratio over `<region>`    |
| snpTransitions             | SNP Transitions                                    |
| snpTransversions           | SNP Transversions                                  |
| tiTvRatio                  | Ti/Tv ratio                                        |
| numHeterozygous            | Heterozygous                                       |
| numHomozygous              | Homozygous                                         |
| snpMosaic                  | SNP Mosaics                                        |
| indelMosaic                | Indel Mosaics                                      |
| inDbSnp                    | In dbSNP                                           |
| notInDbSnp                 | Not in dbSNP                                       |
| percentCallability         | Percent Callability                                |
| percentAutosomeCallability | Percent Autosome Callability                       |
| percentExomeCallability    | Percent Autosome Exome Callability                 |
| percentQcRegionCallability | Percent QC Region Callability in Region `<number>` |

## QC Coverage Region Metrics

The `coverageSummary` module contains a nested dictionary for each of the QC regions provided as input to DRAGEN. Each dictionary is indexed by the name of the region if one is provided (e.g. `--qc-coverage-tag` is set), or otherwise by the default command-line name of the region (e.g. `qc-coverage-region-1`).

This is summarized in the following format:

```
"coverageSummary": {
  "qc-coverage-region-1": {
    "metricName": {
    },
    ...
  },
  "qc-coverage-region-2": {
    "metricName": {
    },
    ...
  },
  "my-custom-region-name": {
    "metricName": {
    },
    ...
  }
}
```
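
Because region keys can be either default names or custom tags, it is often simplest to iterate over whatever keys are present rather than hard-coding them. The sketch below pulls one metric across all regions; `metric_by_region` and the sample values are hypothetical.

```python
def metric_by_region(coverage_summary: dict, metric_name: str) -> dict:
    """Return {regionName: value} for every region that reports metric_name."""
    return {
        region: metrics[metric_name].get("value")
        for region, metrics in coverage_summary.items()
        if metric_name in metrics
    }


# Hypothetical values, for illustration only.
SAMPLE_COVERAGE = {
    "qc-coverage-region-1": {"avgAlignmentCovOverRegion": {"value": 31.2}},
    "my-custom-region-name": {
        "avgAlignmentCovOverRegion": {"value": 45.0},
        "alignedReads": {"value": 1200},
    },
}

avg_coverage = metric_by_region(SAMPLE_COVERAGE, "avgAlignmentCovOverRegion")
aligned_reads = metric_by_region(SAMPLE_COVERAGE, "alignedReads")
print(avg_coverage)
print(aligned_reads)
```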

The following table shows the mapping between the corresponding JSON field name and the standard output/CSV name:

| JSON Metric Name               | Standard Output/CSV Name                                  |
| ------------------------------ | --------------------------------------------------------- |
| alignedBases                   | Aligned bases                                             |
| alignedBasesInRegion           | Aligned bases in `<region>`                               |
| avgAlignmentCovOverRegion      | Average alignment coverage over `<region>`                |
| uniformityCov20PerOverRegion   | Uniformity of coverage (PCT > 0.2\*mean)                  |
| uniformityCov40PerOverRegion   | Uniformity of coverage (PCT > 0.4\*mean)                  |
| pctOfRegionWithCoverageNxtoInf | PCT of `<region>` with coverage Nx to Inf                 |
| medianChrXCovOverRegion        | Median chr X coverage (ignore 0x regions) over `<region>` |
| medianChrYCovOverRegion        | Median chr Y coverage (ignore 0x regions) over `<region>` |
| avgMitoCovOverRegion           | Average mitochondrial coverage over `<region>`            |
| avgAutosomalCovOverRegion      | Average autosomal coverage over `<region>`                |
| medianAutosomalCovOverRegion   | Median autosomal coverage over `<region>`                 |
| meanMedianCovRatioOverRegion   | Mean/Median autosomal coverage ratio over `<region>`      |
| alignedReads                   | Aligned reads                                             |
| alignedReadsInRegion           | Aligned reads in `<region>`                               |

## FASTQC Metrics

The `fastQc` module contains nested dictionaries of metrics for read 1 and read 2 (if applicable). Because these represent sets of histogram data, their JSON format differs from that of the other modules.

This is summarized in the following format:

```
"fastQc": {
  "gcContent": {
    "read1": {
    },
    "read2": {
    }
  },
  "gcContentQual": {
    ...
  },
  "posQual": {
    ...
  },
  "positionalBaseContent": {
    ...
  },
  "positionalBaseQuality": {
    ...
  },
  "readLengths": {
    ...
  },
  "readMeanQuality": {
    ...
  },
  "seqPos": {
    ...
  }
}
```
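
The contents of each `read1` and `read2` dictionary are not shown above, so the sketch below inspects only the outer layers: it reports which reads are present under each section, which is a quick way to tell single-end from paired-end output. `reads_per_section` and the sample data are hypothetical.

```python
def reads_per_section(fast_qc: dict) -> dict:
    """Return {sectionName: sorted list of read keys present in that section}."""
    return {
        section: sorted(per_read)
        for section, per_read in fast_qc.items()
        if isinstance(per_read, dict)
    }


# Hypothetical data; the histogram payloads are left empty on purpose.
SAMPLE_FASTQC = {
    "gcContent": {"read1": {}, "read2": {}},
    "readLengths": {"read1": {}},
}

present = reads_per_section(SAMPLE_FASTQC)
print(present)
```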

The following table shows the mapping between the corresponding JSON field name and the standard output/CSV name:

| JSON Metric Name      | Standard Output/CSV Name     |
| --------------------- | ---------------------------- |
| readMeanQuality       | Read Mean Quality            |
| positionalMeanQuality | Positional Base Mean Quality |
| positionalBaseContent | Positional Base Content      |
| readLengths           | Read Lengths                 |
| readGcContent         | Read GC Content              |
| readGcContentQuality  | Read GC Content Quality      |
| seqPos                | Sequence Positions           |
| posQuality            | Positional Quality           |
