Population Haplotyping (Beta)

The Population Haplotyping software estimates haplotypes from a population-scale dataset by packaging the SHAPEIT5 software (Hofmeister RJ, Ribeiro DM, Rubinacci S, Delaneau O, 2022). It phases common variants as well as rare variants in a step-by-step mode. The following workflow must be repeated for each chromosome of the studied genome.

  • Step 1: Phase Common step to estimate the haplotypes of common variants (variants with allele frequency above a given allele frequency threshold) on defined regions.

  • Step 2: Ligate Common step to ligate the phased common variants from step 1 into a single chromosome.

  • Step 3: Phase Rare step to add the haplotypes of rare variants (variants with allele frequency below a given allele frequency threshold) on defined regions to the common variant scaffold obtained in step 2.

  • Step 4: Concat All step to concatenate the haplotype regions obtained in step 3 into a single chromosome.

The software is most accurate on population-scale datasets with thousands of samples. Running it on multiple nodes is recommended so that the processes run in parallel. A common use case of the Population Haplotyping software is the generation of a custom reference panel to be used for the VCF Imputation pipeline.

The software supports autosomes and mixed ploidy chromosomes for diploid species only. It does not use FPGA acceleration and can run on generic, software-only compute nodes.

Note: the Population Haplotyping software only supports input msVCF produced with the DRAGEN gVCF Genotyper software.

Command-Line Examples

The following commands are examples of the required options to generate haplotypes for common and rare variants (with the default allele frequency threshold) on a population-scale dataset:

Step 1: Phase Common

dragen \
  --enable-population-haplotyping true \
  --enable-phase-common true \
  --ph-phase-common-input-list <path_to_txt_file> \
  --ph-phase-common-input-region <string> \
  --ph-phase-common-map <path_genetic_map> \
  --ph-phase-common-config <path_config_txt_file> \
  --ph-phase-common-sample-type <path_sample_type_txt_file> \
  --output-directory <DIR> \
  [options]
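
Step 1 is run once per region, and the regions must overlap for the later Ligate Common step. The following is a minimal sketch of one way to generate such regions, not part of DRAGEN itself. The function name make_overlapping_regions and the 1 Mbp overlap are assumptions made for illustration; only the 10 Mbp region length appears as an example in this guide.

```shell
# Print overlapping windows that cover <contig> from position 1 to <length>.
# Arguments: contig name, contig length, window size, overlap size (all in bp).
# Hypothetical helper: the name and the overlap value are illustrative only.
make_overlapping_regions() (
  contig=$1; length=$2; window=$3; overlap=$4
  # The loop only advances when the overlap is smaller than the window.
  if [ "$overlap" -ge "$window" ]; then
    echo "overlap must be smaller than window" >&2
    exit 1
  fi
  start=1
  while :; do
    end=$(( start + window - 1 ))
    if [ "$end" -ge "$length" ]; then
      echo "${contig}:${start}-${length}"   # last window is clipped to the contig end
      break
    fi
    echo "${contig}:${start}-${end}"
    start=$(( end - overlap + 1 ))          # step back by the overlap
  done
)

# Example: chr22 (length from the Config file example), 10 Mbp windows, 1 Mbp overlap.
make_overlapping_regions chr22 50818468 10000000 1000000
```

Each printed line can be passed to --ph-phase-common-input-region in a separate run. With the values above, the first region is chr22:1-10000000 and the last is chr22:45000001-50818468.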

Step 2: Ligate Common

dragen \
  --enable-population-haplotyping true \
  --enable-ligate-common true \
  --ph-ligate-common-input-list <path_to_txt_file> \
  --output-directory <DIR> \
  [options]
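
The Ligate Common input list must name the step 1 outputs in ascending position order. The sketch below shows one way to build that list, assuming you keep a small manifest of your own with one line per step 1 run in the form "<region_start> <path>". The manifest format and the function name make_ordered_list are hypothetical, and paths containing spaces are not handled.

```shell
# Manifest format (hypothetical): "<region_start> <path_to_phased_msVCF>" per line.
# Prints the paths sorted by ascending region start, one per line.
make_ordered_list() (
  manifest=$1
  sort -n -k1,1 "$manifest" | awk '{ print $2 }'
)

# Usage (file names are placeholders):
#   make_ordered_list chr22.manifest.txt > chr22.ligate_input_list.txt
```

The resulting .txt file can then be given to --ph-ligate-common-input-list.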

Step 3: Phase Rare

dragen \
  --enable-population-haplotyping true \
  --enable-phase-rare true \
  --ph-phase-rare-input <path_to_preprocessed_file_output_of_step_1> \
  --ph-phase-rare-input-region <string> \
  --ph-phase-rare-scaffold <path_to_scaffold_file_output_of_step_2> \
  --ph-phase-rare-scaffold-region <string> \
  --ph-phase-rare-map <path_genetic_map> \
  --ph-phase-rare-config <path_config_txt_file> \
  --ph-phase-rare-sample-type <path_sample_type_txt_file> \
  --output-directory <DIR> \
  [options]

Step 4: Concat All

To generate per chromosome haplotypes:

dragen \
  --enable-population-haplotyping true \
  --enable-concat-all true \
  --ph-concat-all-input-list <path_to_txt_file> \
  --output-directory <DIR> \
  [options]

To generate per-genome haplotyped sites:

dragen \
  --enable-population-haplotyping true \
  --enable-concat-all true \
  --ph-concat-all-input-list-sites-only <path_to_txt_file> \
  --output-directory <DIR> \
  [options]

Input Files

msVCF Input (step 1 and step 3)

msVCF input list for the Phase Common step (step 1)

For the Phase Common step (step 1), it is recommended to provide msVCF files generated with the DRAGEN gVCF Genotyper software. Both multi-allelic formatted sites and bi-allelic formatted sites are supported. This first step takes as input a .txt file containing the path to a single msVCF or a list of msVCF paths, one path per line. Each msVCF must comply with the following requirements:

  • per chromosome msVCF

  • generated from the same reference build

  • compressed and indexed

  • with unphased GT calls

  • with no duplicates

  • with header ##contig "ID" and "length" fields for all contigs present in the studied genome

Note: for mixed ploidy chromosomes, each PAR and non-PAR region of the chromosome must be treated as a single chromosome. For example, on human data, the sample input msVCF for chrX must be divided into chrX_par1, chrX_par2, and chrX_nonpar.
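
Some of the requirements above can be checked before launching step 1. The following sketch only covers the file-level ones (each listed file exists, has a .gz name, and has an index next to it). It assumes the index is stored as <file>.tbi or <file>.csi, which this guide does not state, and the function name check_msvcf_list is hypothetical. It does not inspect the VCF content, so unphased GT calls, duplicates, and ##contig headers still need to be verified separately.

```shell
# Report list entries that are missing, not .gz-named, or lack an index file.
# Assumption: the index sits next to the msVCF as <file>.tbi or <file>.csi.
check_msvcf_list() (
  list=$1
  status=0
  while IFS= read -r path || [ -n "$path" ]; do
    [ -z "$path" ] && continue                 # ignore empty lines
    if [ ! -f "$path" ]; then
      echo "missing file: $path"; status=1
    elif [ "${path%.gz}" = "$path" ]; then
      echo "not gzip-named: $path"; status=1
    elif [ ! -f "$path.tbi" ] && [ ! -f "$path.csi" ]; then
      echo "no index found: $path"; status=1
    fi
  done < "$list"
  exit "$status"
)

# Usage (placeholder file name):
#   check_msvcf_list chr22.input_list.txt && echo "input list looks usable"
```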

msVCF input for the Phase Rare step (step 3)

The msVCF input list provided at step 1 is pre-processed to generate a formatted msVCF called <prefix>.preprocess.vcf.gz. This formatted msVCF is generated in the output directory and must be used as input of the Phase Rare step (step 3).

Note: streaming from the cloud is not supported. Instead, download the input files beforehand and process them locally to achieve maximum IO efficiency and stability.

Genetic map (step 1 and step 3)

The genetic map should follow the format:

  • 3 columns: position, chromosome number, distance (cM), in this order and tab-separated

  • The genetic map for a mixed ploidy chromosome must be separated into its PAR and non-PAR regions (e.g. for human, chromosome X is split into the PAR1 chrX_par1, PAR2 chrX_par2, and non-PAR chrX_nonpar regions)

  • A genetic map is not needed for a region in which all samples are haploid (e.g. for human, chromosome Y chrY)

The user must ensure that the genetic maps provided are from the same reference build as the reference used to generate the msVCF input.
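
The column layout described above can be sanity-checked with a short awk filter. This is a sketch under two assumptions that are not stated in this guide: the first line may be a column-name header, and the distance is written as a plain decimal number (scientific notation would be reported as an error). The function name check_genetic_map is hypothetical.

```shell
# Read an uncompressed genetic map on stdin and report malformed lines.
# Assumption: an optional header line (first field not an integer) may be on line 1.
check_genetic_map() {
  awk -F '\t' '
    NR == 1 && $1 !~ /^[0-9]+$/ { next }   # skip a header line, if any
    NF != 3 { printf "line %d: expected 3 tab-separated columns\n", NR; bad = 1; next }
    $1 !~ /^[0-9]+$/ { printf "line %d: position is not an integer\n", NR; bad = 1 }
    $3 !~ /^[0-9]+(\.[0-9]+)?$/ { printf "line %d: distance (cM) is not a number\n", NR; bad = 1 }
    END { exit bad }
  '
}

# Usage with a compressed map (placeholder file name):
#   gzip -dc chr1.gmap.gz | check_genetic_map
```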

Config file (step 1 and step 3)

The configuration file is a required text file. It allows for proper handling of haploid/diploid chromosomes and verification of concordance between the genetic maps, the msVCF input, and the sample type file. The current configuration supports binary gender (male or female) and ploidy 2 or 1. When a region has different ploidies in male and female samples, the region is considered a mixed ploidy region (e.g. for human, the non-PAR region on chromosome X, chrX_nonpar).

Example of Config file

##version=1.0
##ref_build=hg38
#filename    region    male_ploidy    female_ploidy
chr1.gmap.gz    chr1:1-248956422    2    2
chr2.gmap.gz    chr2:1-242193529    2    2
chr3.gmap.gz    chr3:1-198295559    2    2
chr4.gmap.gz    chr4:1-190214555    2    2
chr5.gmap.gz    chr5:1-181538259    2    2
chr6.gmap.gz    chr6:1-170805979    2    2
chr7.gmap.gz    chr7:1-159345973    2    2
chr8.gmap.gz    chr8:1-145138636    2    2
chr9.gmap.gz    chr9:1-138394717    2    2
chr10.gmap.gz    chr10:1-133797422    2    2
chr11.gmap.gz    chr11:1-135086622    2    2
chr12.gmap.gz    chr12:1-133275309    2    2
chr13.gmap.gz    chr13:1-114364328    2    2
chr14.gmap.gz    chr14:1-107043718    2    2
chr15.gmap.gz    chr15:1-101991189    2    2
chr16.gmap.gz    chr16:1-90338345    2    2
chr17.gmap.gz    chr17:1-83257441    2    2
chr18.gmap.gz    chr18:1-80373285    2    2
chr19.gmap.gz    chr19:1-58617616    2    2
chr20.gmap.gz    chr20:1-64444167    2    2
chr21.gmap.gz    chr21:1-46709983    2    2
chr22.gmap.gz    chr22:1-50818468    2    2
chrX_par1.gmap.gz    chrX:1-2781479    2    2
chrX_nonpar.gmap.gz    chrX:2781480-155701382    1    2
chrX_par2.gmap.gz    chrX:155701383-156040895    2    2

Instructions to make a custom configuration file:

The Config file is a text file that starts with the following headers:

  • ##version

  • ##ref_build indicating the reference build used for the study.

The body of the Config file contains 4 tab-delimited columns. Each of them must be populated.

| Column | Description |
| --- | --- |
| First column: filename | Specifies the genetic map basename, 1 name per line. Mixed ploidy chromosomes must be separated into PAR and non-PAR regions. Basenames must match genetic map basenames. |
| Second column: region | Specifies the start and end positions of the chromosome or sub-chromosome region with format <contig_name>:<start_position>-<end_position>. For chromosomes without mixed ploidy regions, the start position is 1 and the end position is the length of the chromosome (1-based, inclusive). For chromosomes with mixed ploidy regions, the start and end positions are those of each region (1-based, inclusive). |
| Third column: mixed ploidy subject | Specifies the ploidy of subjects with mixed ploidy chromosomes (male_ploidy in the example above): 2 for diploid chromosomes and PAR regions, 1 for non-PAR regions. |
| Fourth column: diploid subject | Specifies the ploidy of subjects with all diploid chromosomes (female_ploidy in the example above): 2 for all chromosomes. |

Note: for mixed ploidy chromosomes, ensure the genetic map is separated into PAR and non-PAR regions with no overlap. Example: for human data, the prefixes should be chrX_par1, chrX_nonpar, and chrX_par2.
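
A custom Config file can be checked against the column rules above before it is passed to the Phase Common and Phase Rare steps. The following awk sketch is an illustration only and is not DRAGEN's own validation. The function name check_ph_config is hypothetical, and lines starting with # are assumed to be header lines.

```shell
# Validate the 4 tab-delimited columns of a Population Haplotyping Config file.
# Hypothetical helper: it mirrors the column rules described in this section.
check_ph_config() {
  awk -F '\t' '
    /^#/ { next }        # header lines (##version, ##ref_build, column names)
    NF == 0 { next }     # ignore empty lines
    NF != 4 { printf "line %d: expected 4 tab-delimited columns\n", NR; bad = 1; next }
    $1 == "" { printf "line %d: empty filename\n", NR; bad = 1 }
    $2 !~ /^[^:]+:[0-9]+-[0-9]+$/ { printf "line %d: bad region %s\n", NR, $2; bad = 1 }
    $3 !~ /^[12]$/ { printf "line %d: third column must be 1 or 2\n", NR; bad = 1 }
    $4 != "2" { printf "line %d: fourth column must be 2\n", NR; bad = 1 }
    END { exit bad }
  ' "$1"
}

# Usage (placeholder file name):
#   check_ph_config my_config.txt && echo "config columns look consistent"
```

The check does not compare the region coordinates with the genetic maps or the msVCF ##contig lengths, which the software verifies itself.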

Sample type file (step 1 and step 3)

The sample type file is a required file. The number and names of the samples in the input multisample VCF and in the sample type file must match.

The sample type file is a txt file with the following format:

  • 2 columns, tab- or space-delimited

  • First column: list of all sample names present in the input msVCF

  • Second column: 1 or 2. 1 for a subject with mixed ploidy chromosomes, 2 for a subject with all diploid chromosomes.
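
The format and the name-matching requirement can be checked with standard tools. The sketch below assumes the msVCF sample names have already been written to a text file, one per line (for example with bcftools query -l, if bcftools is available). The function name check_sample_types and the file names are hypothetical.

```shell
# Arguments: file with one msVCF sample name per line, sample type file.
# Hypothetical helper: checks the two-column format and that the names match.
check_sample_types() (
  vcf_samples=$1; sample_types=$2
  # Every line needs exactly two fields, the second being 1 or 2.
  awk 'NF != 2 || $2 !~ /^[12]$/ { printf "line %d: expected <sample> <1|2>\n", NR; bad = 1 }
       END { exit bad }' "$sample_types" || exit 1
  # The two name lists must be identical once sorted.
  a=$(sort "$vcf_samples")
  b=$(awk '{ print $1 }' "$sample_types" | sort)
  if [ "$a" != "$b" ]; then
    echo "sample names differ between msVCF and sample type file"
    exit 1
  fi
)

# Usage (placeholder file names):
#   bcftools query -l chr22.vcf.gz > chr22.samples.txt
#   check_sample_types chr22.samples.txt sample_types.txt
```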

Output Files

Phase Common step

The Phase Common step (step 1) is run on a defined region, and outputs:

  • a single scaffold msVCF and related msVCF index with phased common variants for that region. The default name is dragen.ph_phase_common.vcf.gz.

  • a single formatted msVCF called <prefix>.preprocess.vcf.gz and related index. This formatted msVCF is generated in the output directory and must be used as input of the Phase Rare step (step 3).

Ligate Common step

The Ligate Common step (step 2) ligates the regions phased in step 1 and outputs a single scaffold msVCF and related msVCF index with phased common variants for a single chromosome. The default name is “dragen.ph_ligate_common.vcf.gz”.

Phase Rare step

The Phase Rare step (step 3) is run on a defined region on a chromosome with preprocessed unphased msVCF from step 1 and phased scaffold msVCF from step 2, and outputs:

  • a single phased msVCF and related msVCF index with phased common and rare variants for that region. The default name is “dragen.ph_rare_common.vcf.gz”.

  • a single 8-column VCF and related index listing all sites that have been phased for that region. The default name is “dragen.ph_rare_common.sites.vcf.gz”. This output is used at the Concat All step to generate a VCF file with all phased sites across the genome.

Concat All step

The Concat All step generates 2 types of output:

  1. Phased common and rare variants for a chromosome

The Concat All step (step 4) concatenates the regions phased in step 3 and outputs an msVCF and related index with phased common and rare variants for a single chromosome. The default name is “dragen.ph_concat_all.vcf.gz”.

  2. List of phased sites

This output is useful as input for the ForceGT option in the variant calling software. The Concat All step lists all phased sites in an 8-column VCF format and outputs a VCF and related index with the list of phased sites. This output can be generated either directly from a list of phased-site VCFs across the genome from step 3 or in two passes, by first generating the per-chromosome site lists and then concatenating them. The default name is “dragen.ph_concat_all.sites.vcf.gz”.

Command-Line Options for step 1: Phase Common

| Option | Required | Description |
| --- | --- | --- |
| --enable-population-haplotyping | Yes | Set to true to enable the Population Haplotyping software. |
| --enable-phase-common | Yes | Set to true to enable the Phase Common step. |
| --ph-phase-common-input-list | Yes | Provides a .txt file listing the sample input pertaining to one chromosome, with the path to a single msVCF or a list of msVCF paths, one path per line. Note: for a mixed ploidy chromosome, each PAR and non-PAR region must be treated as a single chromosome. |
| --ph-phase-common-input-region | Yes | Specifies the target region to be phased. String in the format contigname:startposition-endposition. Regions must overlap one another for the downstream Ligate Common step. Example of input region length for human data: 10 Mbp. Note: for a chromosome with both mixed ploidy regions and diploid regions, run the command with one region at a time (e.g. three runs with three regions, chrX_par1, chrX_nonpar and chrX_par2, instead of one run with region chrX). |
| --ph-phase-common-map | Yes | Provides the path to the chromosome genetic map. Note: for a mixed ploidy chromosome, the genetic map must be divided into PAR and non-PAR regions. |
| --ph-phase-common-config | Yes | Provides the path to the txt config file. |
| --ph-phase-common-reference | No | Provides the path to a reference panel of haplotypes in msVCF format. Useful for iterative haplotyping to accelerate the process. |
| --ph-phase-common-scaffold | No | Provides the path to a scaffold of haplotypes in msVCF format. Useful for iterative haplotyping to accelerate the process. |
| --ph-phase-common-sample-type | Yes | Provides the path to the Sample type file. |
| --ph-phase-common-pass-only | No | Default true. Keeps only the variants with PASS in the filter field. Set to false to run Phase Common on all the positions. When the filter field PASS does not exist, all the positions are phased by default. |
| --ph-phase-common-filter-maf | No | Default 0.001. Sets the Minimum Allele Frequency threshold. All variants with allele frequency equal to or above this MAF are phased during the Phase Common step. |
| --ph-phase-common-max-miss-gt-rate | No | Default 0.1. Sets the missing GT rate threshold. Variants with a rate of missing GT higher than this value are skipped. |
| --output-directory | Yes | Specifies the output directory. |
| --output-file-prefix | No | Specifies the prefix used in the names of the files generated by the pipeline. |
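
The Phase Common step keeps variants with allele frequency equal to or above --ph-phase-common-filter-maf, while the Phase Rare step keeps variants below --ph-phase-rare-filter-maf. The sketch below only illustrates that filter logic to show why the two values should be identical. It does not reproduce how DRAGEN computes allele frequencies, and the function name classify_variant is hypothetical.

```shell
# Print which step's frequency filter a variant passes.
# Arguments: allele frequency, Phase Common threshold, Phase Rare threshold.
classify_variant() {
  awk -v af="$1" -v common="$2" -v rare="$3" 'BEGIN {
    in_common = (af + 0 >= common + 0)   # Phase Common: at or above its threshold
    in_rare   = (af + 0 <  rare + 0)     # Phase Rare: below its threshold
    if (in_common && in_rare) print "both"
    else if (in_common)       print "common"
    else if (in_rare)         print "rare"
    else                      print "neither"
  }'
}

classify_variant 0.05   0.001 0.001   # matching thresholds
classify_variant 0.0005 0.001 0.001   # matching thresholds
classify_variant 0.005  0.01  0.001   # mismatched thresholds
```

With matching thresholds every frequency falls into exactly one of the two steps. In the third call the frequency lies between the two mismatched thresholds and passes neither filter.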

Command-Line Options for step 2: Ligate Common

| Option | Required | Description |
| --- | --- | --- |
| --enable-population-haplotyping | Yes | Set to true to enable the Population Haplotyping software. |
| --enable-ligate-common | Yes | Set to true to enable the Ligate Common step. |
| --ph-ligate-common-input-list | Yes | Provides a .txt file with the list of phased msVCF files pertaining to a single chromosome. The msVCF files provided are the output files of the Phase Common step, in ascending position order. Note: for a mixed ploidy chromosome, each PAR and non-PAR region must be treated as a single chromosome. |
| --output-directory | Yes | Specifies the output directory. |
| --output-file-prefix | No | Specifies the prefix used in the names of the files generated by the pipeline. |

Command-Line Options for step 3: Phase Rare

| Option | Required | Description |
| --- | --- | --- |
| --enable-population-haplotyping | Yes | Set to true to enable the Population Haplotyping software. |
| --enable-phase-rare | Yes | Set to true to enable the Phase Rare step. |
| --ph-phase-rare-input | Yes | Provides the path to the preprocessed unphased msVCF generated by the Phase Common step and covering the Phase Rare region. |
| --ph-phase-rare-input-region | Yes | Specifies the target region to be phased. String in the format contigname:startposition-endposition. Regions must not overlap or have gaps between them. Note: for a chromosome with both mixed ploidy regions and diploid regions, run the command with one region at a time (e.g. three runs with three regions, chrX_par1, chrX_nonpar and chrX_par2, instead of one run with region chrX). |
| --ph-phase-rare-map | Yes | Provides the path to the genetic map of the chromosome. Note: for a mixed ploidy chromosome, the genetic map must be divided into PAR and non-PAR regions. |
| --ph-phase-rare-config | Yes | Provides the path to the txt config file. |
| --ph-phase-rare-scaffold | Yes | Provides the path to the scaffold of haplotypes in msVCF format generated by the Ligate Common step. |
| --ph-phase-rare-scaffold-region | Yes | Specifies the scaffold region to be phased. String in the format contigname:startposition-endposition. The scaffold region must cover the input region and leave a buffer between regions. The buffer length impacts the accuracy and speed of the process: a longer buffer is slower but improves accuracy. |
| --ph-phase-rare-sample-type | Yes | Provides the path to the Sample type file. |
| --ph-phase-rare-filter-maf | No | Default 0.001. Sets the Maximum Allele Frequency threshold. All variants with allele frequency below this MAF are phased during the Phase Rare step. This value must be the same as the one provided with --ph-phase-common-filter-maf. If the values differ, not all variants will be phased. |
| --output-directory | Yes | Specifies the output directory. |
| --output-file-prefix | No | Specifies the prefix used in the names of the files generated by the pipeline. |
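
Phase Rare input regions must tile the chromosome with no overlap and no gap, and each scaffold region must cover its input region plus a buffer. The following sketch shows one way to derive both from a contig length. The chunk size, the buffer size, and the function name make_rare_regions are assumptions for illustration; this guide does not recommend specific values. The sketch starts at position 1, so it applies to a whole contig rather than to a sub-region such as chrX_nonpar.

```shell
# Print "<input_region> <scaffold_region>" pairs for one contig.
# Input regions are contiguous (no overlap, no gap). Each scaffold region
# extends its input region by <buffer> bp on both sides, clipped to the contig.
# Arguments: contig name, contig length, chunk size, buffer size (all in bp).
make_rare_regions() (
  contig=$1; length=$2; chunk=$3; buffer=$4
  start=1
  while [ "$start" -le "$length" ]; do
    end=$(( start + chunk - 1 ))
    [ "$end" -gt "$length" ] && end=$length
    s_start=$(( start - buffer ))
    [ "$s_start" -lt 1 ] && s_start=1
    s_end=$(( end + buffer ))
    [ "$s_end" -gt "$length" ] && s_end=$length
    echo "${contig}:${start}-${end} ${contig}:${s_start}-${s_end}"
    start=$(( end + 1 ))                  # next chunk starts right after this one
  done
)

# Example with hypothetical sizes: 5 Mbp chunks and a 500 kbp buffer on chr22.
make_rare_regions chr22 50818468 5000000 500000
```

In each printed pair, the first field is a candidate value for --ph-phase-rare-input-region and the second for --ph-phase-rare-scaffold-region.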

Command-Line Options for step 4: Concat All

| Option | Required | Description |
| --- | --- | --- |
| --enable-population-haplotyping | Yes | Set to true to enable the Population Haplotyping software. |
| --enable-concat-all | Yes | Set to true to enable the Concat All step. |
| --ph-concat-all-input-list | Yes when --ph-concat-all-input-list-sites-only is not provided | Provides a .txt file with the list of phased msVCF files pertaining to a single chromosome. The msVCF files provided are the output files of the Phase Rare step, in ascending position order. Note: for a mixed ploidy chromosome, each PAR and non-PAR region must be treated as a single chromosome. |
| --ph-concat-all-input-list-sites-only | Yes when --ph-concat-all-input-list is not provided | Provides a .txt file with the list of VCF files containing all the haplotyped sites. The VCF files provided are the output files of the Phase Rare step, in ascending position order, with sex chromosomes at the end. |
| --output-directory | Yes | Specifies the output directory. |
| --output-file-prefix | No | Specifies the prefix used in the names of the files generated by the pipeline. |
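
For the sites-only list, the files must be in ascending position order with sex chromosomes at the end. The sketch below writes such a list for human data, assuming one sites VCF per chromosome or chrX sub-region. The directory layout, the file naming pattern, and the function name make_sites_list are hypothetical and must be adapted to the actual output names of your runs.

```shell
# Print one sites VCF path per line: chr1..chr22 first, then the chrX sub-regions
# in ascending position order (PAR1, non-PAR, PAR2).
# The <dir>/<name>.sites.vcf.gz naming is a placeholder, not a DRAGEN convention.
make_sites_list() (
  dir=$1
  i=1
  while [ "$i" -le 22 ]; do
    echo "${dir}/chr${i}.sites.vcf.gz"
    i=$(( i + 1 ))
  done
  for region in chrX_par1 chrX_nonpar chrX_par2; do
    echo "${dir}/${region}.sites.vcf.gz"
  done
)

# Usage (placeholder paths):
#   make_sites_list /data/phase_rare_sites > genome.sites_list.txt
```

The explicit loop avoids relying on alphabetical sorting, which would place chr10 before chr2 and chrX_nonpar before chrX_par1.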

Population Haplotyping Accuracy

An additional module of the Population Haplotyping software checks the quality of the haplotypes produced against a phased truth set provided as input.

Command-Line example

dragen \
  --enable-population-haplotyping true \
  --enable-phase-qc true \
  --ph-phase-qc-validation <path_to_phased_truth_set> \
  --ph-phase-qc-estimation <path_to_phased_msVCF> \
  --ph-phase-qc-input-region <string> \
  --output-directory <DIR> \
  [options]

Command-Line Options

| Option | Required | Description |
| --- | --- | --- |
| --enable-population-haplotyping | Yes | Set to true to enable the Population Haplotyping software. |
| --enable-phase-qc | Yes | Set to true to enable the quality control module. |
| --ph-phase-qc-validation | Yes | Provides the path to the phased truth set msVCF. Note: the validation msVCF must have the same samples as the estimation msVCF for which the phasing accuracy is to be estimated. |
| --ph-phase-qc-estimation | Yes | Provides the path to the phased msVCF to be validated (output of Concat All). |
| --ph-phase-qc-input-region | Yes | Specifies the target region. String in the format contigname:startposition-endposition (startposition-endposition is optional). Regions must not overlap or have gaps between them. |
| --output-directory | Yes | Specifies the output directory. |
| --output-file-prefix | No | Specifies the prefix used in the names of the files generated by the pipeline. |


A per-chromosome genetic map corresponding to the studied species and to the reference build used for the msVCF input is required. You can use your own genetic map computed from the recombination rate of the species and its reference genome, or use the genetic map corresponding to the human hg38 reference genome, available to download from the DRAGEN Software Support Site page. DRAGEN does not generate custom genetic map files.