Post Processing

Post-Processing is a reusable Nextflow component that runs custom post-processing tasks at the end of pipeline execution. It can enhance, transform, or modify outputs in the Logs_Intermediates and Results folders, which makes it adaptable to project-specific requirements.

Note: The Post-Processing feature is available only in the ICA environment.

This component is configured entirely through parameters, which control the computational resources (CPU, memory), the container, and output management. Users can supply their own container and script to implement custom post-processing logic. Because the process script is externalized, the user-supplied script runs inside the configured container without changes to the pipeline itself.

Key Features

  • Customizability: Easily adaptable to different post-processing requirements.

  • Reusability: Can be used in multiple pipelines, reducing development effort.

  • Data transformation: Can be used to transform or modify output data in various ways.

What you need

  1. A config file that sets the Post-Processing parameters and their values

  2. A bash script that implements the desired functionality

  3. Any other custom resources or files required by the bash script

  4. A Docker container with the dependencies needed to run the bash script

Process

  1. Upload and configure the custom Docker container.

  2. Modify the config file: set postProcessing_container to the uploaded container.

  3. Upload all the required files (config, script, resources) to a project directory in ICA, e.g., custom-resources, using the icav2 client.

  4. Configure the ICA Web UI on the 'Start Analysis' page:

    1. Enable post-processing by setting it to 'true'.

    2. Add 'Custom Parameters Config File' and set it to the name of the config file uploaded to the custom-resources directory above.

    3. Add 'Custom Resources Directory' and set it to the custom-resources directory above.
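
Step 3 could be scripted along the following lines. This is a minimal sketch, assuming the icav2 CLI is installed and authenticated and that its `projects enter` and `projectdata upload` subcommands behave as in your installed version. The function name `upload_custom_resources`, the project name, and the file names are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Sketch: upload post-processing files to a folder in an ICA project.
# Assumption: icav2 is installed, authenticated, and on PATH.

upload_custom_resources() {
  local project="$1"     # ICA project name or ID (placeholder)
  local remote_dir="$2"  # target folder in the project, e.g. custom-resources
  shift 2

  # Select the project that later projectdata commands operate on.
  icav2 projects enter "$project" || return 1

  local f
  for f in "$@"; do
    # Upload each local file into the remote folder.
    icav2 projectdata upload "$f" "/${remote_dir}/" || return 1
  done
}
```

A call might look like `upload_custom_resources my-project custom-resources postprocessing.config bam2cram.sh genome.fa genome.fa.fai`, where every argument is illustrative.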

Config File - <file-name>.config

Configurable Parameters in Config file

| Parameter | Description |
| --- | --- |
| postProcessing_container | Docker container URI; the container must already be uploaded to ICA |
| postProcessing_cpusMemoryConfig | Compute option to use; allowed values are listed below |
| postProcessing_shellScript | File name of the shell script |
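
A config file that sets these three parameters might be generated as shown here. This is a sketch under assumptions: the `params { }` block is standard Nextflow config syntax, but this page does not show the exact layout the pipeline expects, so treat the structure as illustrative. The function name `write_postprocessing_config`, the container URI placeholder, and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write a post-processing config file with the three parameters.
# Assumption: the pipeline reads them from a Nextflow params block.

write_postprocessing_config() {
  local out="${1:-postprocessing.config}"  # output file name (placeholder)
  cat > "$out" <<'EOF'
// Hypothetical values; replace them with your own.
params {
    postProcessing_container        = '<container-uri-in-ICA>'
    postProcessing_cpusMemoryConfig = 'single_threaded_low_mem'
    postProcessing_shellScript      = 'bam2cram.sh'
}
EOF
}
```

Running `write_postprocessing_config my-run.config` writes the file to the current directory, ready to be uploaded with the other custom resources.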

Allowed values for postProcessing_cpusMemoryConfig in the config file

| Value | CPUs | Memory (GB) |
| --- | --- | --- |
| single_threaded_low_mem (default) | 2 | 8 |
| single_threaded_medium_mem | 4 | 16 |
| single_threaded_high_mem | 8 | 32 |
| multi_threaded_low_mem | 16 | 64 |
| multi_threaded_medium_mem | 32 | 128 |
| multi_threaded_high_mem | 64 | 128 |
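
Before uploading a config file, it can help to check that the chosen compute option is one of the allowed values. The helper below is a hypothetical pre-flight check, not part of the component: it encodes the values listed above and prints the CPU count and memory for a given option.

```shell
#!/usr/bin/env bash
# Sketch: resolve a postProcessing_cpusMemoryConfig value to "<cpus> <mem_gb>".
# Returns 1 for a value that is not in the allowed list.

resolve_compute_option() {
  # With no argument, fall back to the documented default.
  case "${1:-single_threaded_low_mem}" in
    single_threaded_low_mem)    echo "2 8" ;;
    single_threaded_medium_mem) echo "4 16" ;;
    single_threaded_high_mem)   echo "8 32" ;;
    multi_threaded_low_mem)     echo "16 64" ;;
    multi_threaded_medium_mem)  echo "32 128" ;;
    multi_threaded_high_mem)    echo "64 128" ;;
    *) echo "unknown compute option: $1" >&2; return 1 ;;
  esac
}
```

For example, `resolve_compute_option multi_threaded_low_mem` prints `16 64`, while a misspelled value fails with an error on stderr.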

Post-Processing: Sample Script (bam2cram.sh)

A Post-Processing bash script is a Nextflow template, so it has access to the paths and variables defined in the parent Nextflow process. In this case, the script can access the following directories and their subdirectories: {params.analysisDir}/Results and {params.analysisDir}/Logs_Intermediates. Output files generated by the script should be written to the {params.postProcessing.stepName} directory.

Note: For BAM to CRAM conversion, the genome.fa and .fai files must be uploaded to the custom resources directory.
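
The core of such a conversion script might look like the sketch below. It is written as plain bash so it can be tried outside a pipeline and is not the actual bam2cram.sh. It assumes samtools is available in the container, that the reference is named genome.fa with its index genome.fa.fai, and that the three directories are passed as arguments. In a real Nextflow template those paths would instead come from variables such as params.analysisDir and params.postProcessing.stepName, and bash variables would need to be escaped as `\$var` so that Nextflow does not interpolate them. The function name `bam2cram` and the `SAMTOOLS` override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: convert every BAM under a results folder to CRAM.
# Assumptions: samtools is installed; genome.fa and genome.fa.fai sit in
# the custom resources directory; all three paths are given by the caller.

bam2cram() {
  local results_dir="$1"    # e.g. the Results folder of the analysis
  local resources_dir="$2"  # custom resources directory with the reference
  local out_dir="$3"        # post-processing step output directory
  local samtools="${SAMTOOLS:-samtools}"
  local ref="${resources_dir}/genome.fa"

  if [ ! -f "$ref" ] || [ ! -f "${ref}.fai" ]; then
    echo "missing genome.fa or genome.fa.fai in ${resources_dir}" >&2
    return 1
  fi
  mkdir -p "$out_dir" || return 1

  # Collect BAM paths in a temporary list so the loop runs in this shell.
  local list
  list="$(mktemp)" || return 1
  find "$results_dir" -type f -name '*.bam' > "$list"

  local bam cram status=0
  while IFS= read -r bam; do
    cram="${out_dir}/$(basename "$bam" .bam).cram"
    # -C writes CRAM output; -T names the reference FASTA used for encoding.
    "$samtools" view -C -T "$ref" -o "$cram" "$bam" || { status=1; break; }
    "$samtools" index "$cram" || { status=1; break; }
  done < "$list"

  rm -f "$list"
  return "$status"
}
```

Outputs are named after the input BAM, so BAM files with the same base name in different subdirectories would overwrite each other; a production script would need to handle that case.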
