Azure Batch Run Modes

Run DRAGEN VM on Azure Batch

Use the following information to run the DRAGEN virtual machine (VM) on Microsoft Azure Batch. For information on using DRAGEN, see the DRAGEN User Guide section. For information on using Azure, see the Azure documentation available on the Microsoft site.

  1. Navigate to the Microsoft Azure portal, and then sign in.

  2. Select Marketplace.

  3. Select View Private Offers, and then select DRAGEN on Azure.

  4. Select Create. Starting from a preset configuration option is not recommended for DRAGEN.

  5. Select a subscription and resource group from the drop-down menus, or select Create New.

  6. Enter a name for the virtual machine.

  7. Select a region that is compatible with the NP-series. See the Azure documentation available on the Microsoft site for more information.

  8. Select DRAGEN and the current version as the image.

  9. Select a VM size from the Size drop-down list. Only NP10 and NP20 sizes are compatible.

  10. Configure any additional VM settings. For your disk type, DRAGEN recommends using Premium SSD for optimal performance. For information, see the Azure documentation available on the Microsoft site.

  11. When finished, select Review + Create.

  12. To launch the VM, select Create.

  13. After deployment completes, select Go to Resource.

After your VM deploys, you can connect to the DRAGEN VM via the Azure Cloud Shell or another client of your choice. For more information on using a VM, see the Azure documentation available on the Microsoft site. For more information on DRAGEN analysis and command line options, see the DRAGEN User Guide.

Run DRAGEN With ARM

You can also run DRAGEN on Azure Batch using an Azure Resource Manager (ARM) template available on the DRAGEN Multi-Cloud support site.

The ARM template only includes the parameters required to run DRAGEN on Azure Batch. See the Microsoft Azure documentation available on the Microsoft site for information on configuring additional parameters.

Running DRAGEN using an ARM template enables the following advanced options.

  • Incorporating DRAGEN into an existing infrastructure.

  • Automating deployments with CI/CD pipelines.

  • Customizing the DRAGEN deployment.

Storage Account Parameters

The ARM template available on the DRAGEN Multi-Cloud support site creates a storage account and container. To use an existing Azure Blob storage account, specify the following input parameters in the ARM template.

  • storageNewOrExisting: existing

  • storageAccountName: <name of your existing storage account>
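As a minimal sketch, these two parameters could be passed with additional -p options, in the same way the deployment command in this guide passes prefix. The storage account name mystorageacct and the run helper are placeholders introduced here for illustration and are not part of the ARM template. By default the sketch only prints the command; set DRY_RUN=0 to execute it.

```shell
#!/usr/bin/env bash
# Sketch: deploy the ARM template against an existing storage account.
# Assumptions: "mystorageacct" and the run() helper are hypothetical;
# prefix and azureBatchServiceOid are copied from the deployment example in this guide.

RESOURCE_GROUP="dragen"
STORAGE_ACCOUNT="mystorageacct"   # hypothetical existing storage account

# Print the command by default; set DRY_RUN=0 to run it for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

run az deployment group create \
  -g "$RESOURCE_GROUP" \
  -p prefix=fpgaci \
  -p azureBatchServiceOid=795cc567-16b1-4904-9344-afc876387199 \
  -p storageNewOrExisting=existing \
  -p storageAccountName="$STORAGE_ACCOUNT" \
  -f mainTemplate.json \
  --query "properties.outputs"
```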

Run DRAGEN Using ARM Template

Use the following instructions to run DRAGEN on Azure Batch using the ARM template available on the DRAGEN Multi-Cloud support site.

  1. Download the ARM template available on the DRAGEN Multi-Cloud support site.

  2. Enter the following commands.

RESOURCE_GROUP="dragen"
az group create -n "$RESOURCE_GROUP" -l "EastUS"
az deployment group create \
-g "$RESOURCE_GROUP" \
-p prefix=fpgaci \
-p azureBatchServiceOid=795cc567-16b1-4904-9344-afc876387199 \
-f mainTemplate.json \
--query "properties.outputs"

You can enter additional command line options to further customize the run, including maximum Batch job and task run time. See the Azure CLI documentation available on the Microsoft site for more information.

Use DRAGEN With the Azure Batch CLI

After creating and authenticating your Azure Batch account, use the following instructions to run DRAGEN with the Azure Batch CLI.

To run a DRAGEN Batch task, create a task.json file. The task.json file contains information on the Batch task, resource files, and output files. See Create the JSON file for information. You can then use the JSON file in the create Batch task command.

For more information on creating Azure Batch accounts and using the Azure Batch CLI, see the Azure Batch documentation available on the Microsoft site.

Create the JSON file

To set up the task.json to use in the Batch task create command, use the following structure.

{
"id": "<Batch task ID. See Batch Task>",
"commandLine": "<Command line options>",
"resourceFiles": [<Resource files. See Resource Files>],
"outputFiles": [<Output file directory. See Output Files.>]
}
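The structure above can be generated from shell variables. The following is a minimal sketch, not a DRAGEN or Azure tool: json_escape and write_task_json are hypothetical helpers, and the escaping only handles backslashes and double quotes in single-line strings. The resourceFiles and outputFiles arrays are left empty here and are filled in as described in the following sections.

```shell
#!/usr/bin/env bash
# Sketch: write a task.json skeleton from shell variables.
# json_escape and write_task_json are hypothetical helper names.

# Escape backslashes and double quotes so a string is safe inside JSON quotes.
# Newlines and control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Usage: write_task_json <task id> <command line> <output path>
write_task_json() {
  cat > "$3" <<EOF
{
  "id": "$(json_escape "$1")",
  "commandLine": "$(json_escape "$2")",
  "resourceFiles": [],
  "outputFiles": []
}
EOF
}

write_task_json "task1" '/bin/bash -c "echo hello"' task.json
cat task.json
```

Escaping matters because the Batch task command line itself contains double quotes, which would otherwise end the JSON string early.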

Batch Task

Use the following instructions to configure the Batch task. For more information on using the Azure Batch CLI, see the Azure Batch documentation available on the Microsoft site.

  1. Create the batch job using the following command.

az batch job create --id <Unique ID for the job to create> --pool-id <Name of the pool created with the ARM template>

  2. Enter the following command line to create the batch task. To use files located in a private Azure Blob storage account, see the Azure Blob CLI documentation available on the Microsoft site.

/bin/bash -c \
"mkdir <Hash table reference directory> <Output directory>; \
tar xzvf dragen.tar -C <Hash table reference directory>; \
/opt/edico/bin/dragen --partial-reconfig HMM --ignore-version-check true; \
/opt/edico/bin/dragen -f -r <Hash table reference directory> \
-1 <Path to the first local FASTQ file on the node> \
-2 <Path to the second local FASTQ file on the node> \
--RGID <RGID associated with the DRAGEN run> \
--RGSM <RGSM associated with the DRAGEN run> \
--enable-bam-indexing true \
--enable-map-align-output true \
--enable-sort true \
--output-file-prefix dragen-batch \
--enable-map-align true \
--output-format BAM \
--output-directory <Output directory> \
--enable-variant-caller true \
--lic-server <DRAGEN license>"

The following is an example Batch task configuration.

/bin/bash -c \
"mkdir dragen output; \
tar xzvf dragen.tar -C dragen; \
/opt/edico/bin/dragen --partial-reconfig HMM --ignore-version-check true; \
/opt/edico/bin/dragen -f -r dragen \
-1 1.fq.gz \
-2 2.fq.gz \
--RGID NA24385-AJ-Son-R1-NS_S33 \
--RGSM NA24385-AJ-Son-R1-NS_S33 \
--enable-bam-indexing true \
--enable-map-align-output true \
--enable-sort true \
--output-file-prefix dragen-batch \
--enable-map-align true \
--output-format BAM \
--output-directory output \
--enable-variant-caller true \
--lic-server <LICENSE>"
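The example task.json in this guide refers to the command as $COMMAND. One way to assemble that value as a single line is sketched below. The variable names steps, dragen_args, and joined are chosen here for illustration, and LICENSE remains a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: assemble the example Batch task command as one line in COMMAND.
# LICENSE is a placeholder; the variable names are illustrative only.

LICENSE="<LICENSE>"

# Options for the main DRAGEN run, copied from the example configuration.
dragen_args=(
  -f -r dragen
  -1 1.fq.gz
  -2 2.fq.gz
  --RGID NA24385-AJ-Son-R1-NS_S33
  --RGSM NA24385-AJ-Son-R1-NS_S33
  --enable-bam-indexing true
  --enable-map-align-output true
  --enable-sort true
  --output-file-prefix dragen-batch
  --enable-map-align true
  --output-format BAM
  --output-directory output
  --enable-variant-caller true
  --lic-server "$LICENSE"
)

steps=(
  "mkdir dragen output"
  "tar xzvf dragen.tar -C dragen"
  "/opt/edico/bin/dragen --partial-reconfig HMM --ignore-version-check true"
  "/opt/edico/bin/dragen ${dragen_args[*]}"
)

# Join the steps with "; " so the shell on the node runs them in order.
joined=""
for step in "${steps[@]}"; do
  joined="${joined:+$joined; }$step"
done

COMMAND="/bin/bash -c \"$joined\""
printf '%s\n' "$COMMAND"
```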

Resource Files

To add resource files to the Batch node, use the resourceFiles configuration in the task.json. The following example specifies the genome and FASTQ files. To use files located in a private Azure Blob storage account, see the Azure Blob CLI documentation available on the Microsoft site.

"resourceFiles": [{
"filePath": "dragen.tar",
"httpUrl": "<URL to genome tarball file>"
}, {
"filePath": "<First FASTQ file name>.gz",
"httpUrl": "<URL to the first FASTQ file>"
}, {
"filePath": "<Second FASTQ file name>.gz",
"httpUrl": "<URL to the second FASTQ file>"
}]
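When the URLs carry a SAS token, the file name on the node can be derived from the URL by dropping the query string. The sketch below illustrates this; url_basename and resource_entry are hypothetical helper names, and the URL is a made-up example rather than a real data location.

```shell
#!/usr/bin/env bash
# Sketch: build one resourceFiles entry from a URL.
# url_basename and resource_entry are hypothetical helpers; the URL is illustrative.

# Print the file name part of a URL, without any ?query (such as a SAS token).
url_basename() {
  local url="${1%%\?*}"
  printf '%s\n' "${url##*/}"
}

# Usage: resource_entry <file path on the node> <URL>
resource_entry() {
  printf '{\n  "filePath": "%s",\n  "httpUrl": "%s"\n}\n' "$1" "$2"
}

FASTQ_URL="https://mystorageacct.blob.core.windows.net/samples/sample1_R1.fastq.gz?sv=placeholder"
resource_entry "$(url_basename "$FASTQ_URL")" "$FASTQ_URL"
```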

Output Files

To specify the location to write output files, use the outputFiles configuration in the task.json. The following example places output logs and DRAGEN files in the specified storage container. To generate a storage container URL, use a SAS token. For more information on accessing a Blob storage container using a SAS token, see the Azure Blob CLI documentation available on the Microsoft site.

"outputFiles": [{
"filePattern": "../stdout.txt",
"destination": {
"container": {
"containerUrl": "<Container URL with SAS token>",
"path": "<Batch task ID to organize the output>/stdout.txt"
}
},
"uploadOptions": {
"uploadCondition": "taskcompletion"
}
}, {
"filePattern": "../stderr.txt",
"destination": {
"container": {
"containerUrl": "<Container URL with SAS token>",
"path": "<Batch task ID to organize the output>/stderr.txt"
}
},
"uploadOptions": {
"uploadCondition": "taskcompletion"
}
}, {
"filePattern": "<Directory to output results to>/**/*",
"destination": {
"container": {
"containerUrl": "<Container URL with SAS token>",
"path": "<Batch task ID to organize the output>/<Directory to output results to>"
}
},
"uploadOptions": {
"uploadCondition": "taskcompletion"
}
}, {
"filePattern": "/var/log/dragen.log",
"destination": {
"container": {
"containerUrl": "<Container URL with SAS token>",
"path": "<Batch task ID to organize the output>/log/dragen.log"
}
},
"uploadOptions": {
"uploadCondition": "taskcompletion"
}
}, {
"filePattern": "/var/log/dragen/**/*",
"destination": {
"container": {
"containerUrl": "<Container URL with SAS token>",
"path": "<Batch task ID to organize the output>/log/dragen"
}
},
"uploadOptions": {
"uploadCondition": "taskcompletion"
}
}]
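Because every entry above has the same shape, the block can be generated rather than typed by hand. The following is a minimal sketch under the assumption that the values contain no double quotes or backslashes. output_entry is a hypothetical helper, and TASK_ID and OUTPUT_DIR are example values.

```shell
#!/usr/bin/env bash
# Sketch: generate the outputFiles block with a small helper.
# output_entry is a hypothetical helper; values are inserted without JSON escaping.

# Usage: output_entry <file pattern> <container URL> <destination path>
output_entry() {
  cat <<EOF
{
  "filePattern": "$1",
  "destination": {
    "container": {
      "containerUrl": "$2",
      "path": "$3"
    }
  },
  "uploadOptions": {
    "uploadCondition": "taskcompletion"
  }
}
EOF
}

TASK_ID="task1"
OUTPUT_DIR="output"
CONTAINER_URL="<Container URL with SAS token>"

printf '"outputFiles": [\n'
output_entry "../stdout.txt" "$CONTAINER_URL" "$TASK_ID/stdout.txt"
printf ',\n'
output_entry "../stderr.txt" "$CONTAINER_URL" "$TASK_ID/stderr.txt"
printf ',\n'
output_entry "$OUTPUT_DIR/**/*" "$CONTAINER_URL" "$TASK_ID/$OUTPUT_DIR"
printf ',\n'
output_entry "/var/log/dragen.log" "$CONTAINER_URL" "$TASK_ID/log/dragen.log"
printf ',\n'
output_entry "/var/log/dragen/**/*" "$CONTAINER_URL" "$TASK_ID/log/dragen"
printf ']\n'
```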

Example JSON File

The following is an example task.json file.

{
"id": "task1",
"commandLine": "$COMMAND",
"resourceFiles": [{
"filePath": "dragen.tar",
"httpUrl": "https://dragentestdata.blob.core.windows.net/reference-genomes/Hsapiens/hash-tables/hg38_altaware_nohla-cnv-anchored.v8.tar"
}, {
"filePath": "1.fq.gz",
"httpUrl": "https://dragentestdata.blob.core.windows.net/samples/wes/NA24385-AJ-Son-R1-NS_S33/NA24385-AJ-Son-R1-NS_S33_L001_R1_001.fastq.gz"
}, {
"filePath": "2.fq.gz",
"httpUrl": "https://dragentestdata.blob.core.windows.net/samples/wes/NA24385-AJ-Son-R1-NS_S33/NA24385-AJ-Son-R1-NS_S33_L001_R2_001.fastq.gz"
}],
"outputFiles": [{
"filePattern": "../stdout.txt",
"destination": {
"container": {
"containerUrl": "$CONTAINER_URL",
"path": "task1/stdout.txt"
}
},
"uploadOptions": {
"uploadCondition": "taskcompletion"
}
}, {
"filePattern": "../stderr.txt",
"destination": {
"container": {
"containerUrl": "$CONTAINER_URL",
"path": "task1/stderr.txt"
}
},
"uploadOptions": {
"uploadCondition": "taskcompletion"
}
}, {
"filePattern": "output/**/*",
"destination": {
"container": {
"containerUrl": "$CONTAINER_URL",
"path": "task1/output"
}
},
"uploadOptions": {
"uploadCondition": "taskcompletion"
}
}, {
"filePattern": "/var/log/dragen.log",
"destination": {
"container": {
"containerUrl": "$CONTAINER_URL",
"path": "task1/log/dragen.log"
}
},
"uploadOptions": {
"uploadCondition": "taskcompletion"
}
}, {
"filePattern": "/var/log/dragen/**/*",
"destination": {
"container": {
"containerUrl": "$CONTAINER_URL",
"path": "task1/log/dragen"
}
},
"uploadOptions": {
"uploadCondition": "taskcompletion"
}
}]
}

Stream Files

DRAGEN can stream input FASTQ and BAM files from private Azure Blob containers. The genome file must be located locally on the node. DRAGEN does not support streaming from public Blob containers.

Stream From Azure Blob Storage

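Use the following command as the Batch task command. Because DRAGEN streams the FASTQ files directly from Blob storage, you do not need to specify the FASTQ files in resourceFiles in task.json. The genome file must still be available on the node.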

/bin/bash -c \
"echo DefaultEndpointsProtocol=https >> ~/.azure-credentials; \
echo AccountName=<Name of the Blob storage account> >> ~/.azure-credentials; \
echo AccountKey=<Access key to the Blob storage account> >> ~/.azure-credentials; \
echo EndpointSuffix=core.windows.net >> ~/.azure-credentials; \
mkdir dragen output; \
tar xzvf dragen.tar -C dragen; \
/opt/edico/bin/dragen --partial-reconfig HMM --ignore-version-check true; \
/opt/edico/bin/dragen -f -r dragen \
-1 <Full URL to the first FASTQ file in Blob storage> \
-2 <Full URL to the second FASTQ file in Blob storage> \
--RGID <RGID> \
--RGSM <RGSM> \
--enable-bam-indexing true \
--enable-map-align-output true \
--enable-sort true \
--output-file-prefix dragen-batch \
--enable-map-align true \
--output-format BAM \
--output-directory output \
--enable-variant-caller true \
--lic-server <LICENSE>"
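The four echo lines in the command append to ~/.azure-credentials, so running them twice on the same node would duplicate the entries. As a hedged alternative, the sketch below writes the same four lines in one step and restricts the file permissions. write_azure_credentials is a hypothetical helper, and the account name and key are placeholders. The demonstration writes to a temporary file; in a task the target would be ~/.azure-credentials.

```shell
#!/usr/bin/env bash
# Sketch: write the credentials file in one step instead of appending line by line.
# write_azure_credentials is a hypothetical helper; account and key are placeholders.

# Usage: write_azure_credentials <path> <storage account name> <access key>
write_azure_credentials() {
  local path="$1" account="$2" key="$3"
  # Overwrite (not append) so repeated runs do not duplicate entries.
  cat > "$path" <<EOF
DefaultEndpointsProtocol=https
AccountName=$account
AccountKey=$key
EndpointSuffix=core.windows.net
EOF
  # The file holds an access key, so limit it to the current user.
  chmod 600 "$path"
}

demo_file=$(mktemp)
write_azure_credentials "$demo_file" "myaccount" "<Access key>"
cat "$demo_file"
```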

Stream From FASTQ List

You can use a FASTQ list file to reference and stream FASTQ files. The FASTQ list file must be local to the node. The FASTQ files referenced in the FASTQ list can be URLs to files on a Blob storage account.
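For illustration, a FASTQ list file might be created as sketched below. The column header shown is the one commonly used for DRAGEN FASTQ list files, but it is stated here as an assumption, so confirm it against the DRAGEN User Guide. The storage account, container, sample, and library names are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: create a FASTQ list file whose entries point at Blob storage URLs.
# Assumptions: the header row follows the usual DRAGEN FASTQ list layout;
# the storage account, container, sample, and library names are made up.

FASTQ_BASE="https://mystorageacct.blob.core.windows.net/samples"   # hypothetical

cat > fastq_list.csv <<EOF
RGID,RGSM,RGLB,Lane,Read1File,Read2File
sample1_L001,sample1,lib1,1,$FASTQ_BASE/sample1_L001_R1.fastq.gz,$FASTQ_BASE/sample1_L001_R2.fastq.gz
sample1_L002,sample1,lib1,2,$FASTQ_BASE/sample1_L002_R1.fastq.gz,$FASTQ_BASE/sample1_L002_R2.fastq.gz
EOF

cat fastq_list.csv
```

In this sketch, the RGSM value sample1 is the value that would be passed to --fastq-list-sample-id.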

To configure resourceFiles to stream from a FASTQ list file, use the following command to generate a SAS URL for the list file. The FASTQ files referenced in the list are located on a Blob storage account.

LIST_URL=$(az storage blob generate-sas \
--name <Blob path to FASTQ list> \
--account-name <Storage account name> \
--account-key <Access key for storage account> \
--container-name <Blob container name> \
--expiry <Date and time the SAS token expires> \
--permissions r \
--https \
--full-uri \
--output tsv)
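The --expiry value is a UTC date and time. A minimal sketch for computing one is shown below, assuming GNU date (the -d option is not available in BSD or macOS date) and an arbitrary lifetime of one day. The variable name EXPIRY is chosen here for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compute a UTC expiry timestamp one day from now for --expiry.
# Assumes GNU date; the one-day lifetime is an arbitrary example.

EXPIRY=$(date -u -d '+1 day' '+%Y-%m-%dT%H:%MZ')
printf '%s\n' "$EXPIRY"
```

The value can then be passed as --expiry "$EXPIRY". Choose a lifetime long enough to cover the time the task may wait in the queue before the node downloads the file.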

The task.json file is structured as follows.

"resourceFiles": [{
"filePath": "dragen.tar",
"httpUrl": "$GENOME_URL"
}, {
"filePath": "fastq_list.csv",
"httpUrl": "$LIST_URL"
}]

Use the following command as the Batch task command. Because the FASTQ files referenced in the list are streamed from Blob storage, only the genome file and the FASTQ list file need to be specified in resourceFiles in task.json.

/bin/bash -c \
"echo DefaultEndpointsProtocol=https >> ~/.azure-credentials; \
echo AccountName=<Blob storage account name> >> ~/.azure-credentials; \
echo AccountKey=<Access key for storage account> >> ~/.azure-credentials; \
echo EndpointSuffix=core.windows.net >> ~/.azure-credentials; \
mkdir dragen output; \
tar xvf dragen.tar -C dragen; \
/opt/edico/bin/dragen --partial-reconfig HMM --ignore-version-check true; \
/opt/edico/bin/dragen -f -r dragen \
--fastq-list fastq_list.csv \
--fastq-list-sample-id <RGSM> \
--enable-bam-indexing true \
--enable-map-align-output true \
--enable-sort true \
--output-file-prefix dragen-batch \
--enable-map-align true \
--output-format BAM \
--output-directory output \
--enable-variant-caller true \
--lic-server <LICENSE>"

Create the Batch Task

After you have set up the task.json file, you can use this file and Batch job ID to create the Azure Batch task with the following command.

For more information on creating Azure Batch tasks, see the Azure Batch documentation available on the Microsoft site.

az batch task create \
--job-id <ID of the Batch job> \
--json-file task.json
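Before submitting, it can help to run a quick sanity check on task.json. The sketch below is a hypothetical pre-flight check, not part of the Azure CLI. It only looks for the id and commandLine keys and for a misspelled resourcesFiles key, and it does not fully validate the JSON. The final echo prints the create command instead of running it; remove echo to submit the task.

```shell
#!/usr/bin/env bash
# Sketch: basic pre-flight check for task.json before creating the Batch task.
# check_task_json is a hypothetical helper and does not validate JSON syntax.

# Usage: check_task_json <path to task.json>
check_task_json() {
  local file="$1" key
  if [ ! -f "$file" ]; then
    echo "missing file: $file" >&2
    return 1
  fi
  for key in id commandLine; do
    if ! grep -q "\"$key\"[[:space:]]*:" "$file"; then
      echo "missing key: $key" >&2
      return 1
    fi
  done
  # Catch a common misspelling of the resourceFiles key.
  if grep -q '"resourcesFiles"' "$file"; then
    echo "misspelled key: resourcesFiles (expected resourceFiles)" >&2
    return 1
  fi
}

JOB_ID="<ID of the Batch job>"
if check_task_json task.json; then
  echo az batch task create --job-id "$JOB_ID" --json-file task.json
fi
```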
