Post Processing

A reusable Nextflow component for running post-processing tasks at the end of pipeline execution. It can enhance, transform, or modify outputs in the Logs_Intermediates and Results folders, which makes it adaptable to pipeline-specific requirements.

Note - The Post-Processing feature is available only in the ICA environment.

This component is highly configurable, supporting fine-tuned control of computational resources (CPU, memory), containerization, and output management. Users can integrate custom containers and scripts to implement their own logic for post-processing, all configured through parameters. Externalized process scripts allow for seamless execution of containerized processes.

Key Features

  • Customizability: Easily adaptable to different post-processing requirements.

  • Reusability: Can be used in multiple pipelines, reducing development effort.

  • Data transformation: Can be used to transform or modify output data in various ways.

What You Need

  1. A config file containing the Post-Processing parameters and their values

  2. A bash script that implements the desired functionality

  3. Any other custom resources/files required by the bash script

  4. A Docker container with the dependencies needed to run the bash script

Process

  1. Upload and configure the custom Docker container.

  2. Modify the config file: set postProcessing_container to the uploaded container.

  3. Upload all the required files (config, script, resources) to ICA using the icav2 client.

  4. Configure the ICA web UI on the 'Start Analysis' page:

    1. Enable post-processing by setting it to 'true'.

    2. Add 'Custom Parameters Config File' and set it to the path of the config file uploaded in step 3.

    3. Add 'Custom Resources Directory' and set it to the directory uploaded in step 3 that holds the script and resource files.
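
Before the upload in step 3, it can help to confirm that the local folder holds the files the component expects. The following is a minimal sketch, not part of ICA or the icav2 client: the function name check_bundle is hypothetical, and it assumes the config file and the script sit together in one local directory.

```shell
#!/usr/bin/env bash
# Hypothetical pre-upload check; not part of ICA or the icav2 client.
# Usage: check_bundle <local_dir> <config_file_name> <script_file_name>
# Returns 0 when both files exist and are non-empty, 1 otherwise.
check_bundle() {
    local dir="$1" config="$2" script="$3" missing=0 name
    for name in "$config" "$script"; do
        if [ ! -s "$dir/$name" ]; then
            echo "MISSING: $dir/$name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_bundle ./postprocessing my.config bam2cram.sh` (both names are placeholders) reports each missing or empty file on stderr and returns a non-zero status if any is absent.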

Config File - <file-name>.config

postProcessing_container = '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/0f7f12a0-a6c8-4289-86c3-3e5310b97275:latest'
postProcessing_cpusMemoryConfig = 'single_threaded_low_mem'
postProcessing_shellScript = 'bam2cram.sh'

Configurable Parameters in Config file

Parameter                          Description
postProcessing_container           Docker container URI; the image must be present in (uploaded to) ICA
postProcessing_cpusMemoryConfig    Compute option to use; allowed values are listed below
postProcessing_shellScript         File name of the shell script
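
To double-check a config file locally, the parameter values can be read back with a few lines of bash. This is a minimal sketch under stated assumptions: get_param is a hypothetical helper, and it only handles the simple key = 'value' form with single quotes, as in the sample config.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value assigned to <key> in <config_file>.
# Only the simple  key = 'value'  form (single-quoted value) is handled.
# Prints nothing when the key is absent.
get_param() {
    local file="$1" key="$2"
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*'\(.*\)'[[:space:]]*\$/\1/p" "$file" | head -n 1
}
```

Calling `get_param my.config postProcessing_shellScript` (my.config is a placeholder name) against the sample config would print bam2cram.sh.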

Allowed values for postProcessing_cpusMemoryConfig in the config file

Value                                CPUs    Mem (GB)
single_threaded_low_mem (default)    2       8
single_threaded_medium_mem           4       16
single_threaded_high_mem             8       32
multi_threaded_low_mem               16      64
multi_threaded_medium_mem            32      128
multi_threaded_high_mem              64      128
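
A typo in postProcessing_cpusMemoryConfig is easy to catch before the analysis starts. The lookup below is a minimal sketch that mirrors the allowed values listed above: compute_resources is a hypothetical helper, not something the component provides, and treating an empty argument as the default option is an assumption based on the "(default)" label.

```shell
#!/usr/bin/env bash
# Hypothetical lookup: prints "<cpus> <mem_gb>" for a compute option name.
# An empty argument falls back to single_threaded_low_mem (assumed default).
# Returns 1 and prints a message on stderr for an unrecognised name.
compute_resources() {
    case "${1:-single_threaded_low_mem}" in
        single_threaded_low_mem)    echo "2 8" ;;
        single_threaded_medium_mem) echo "4 16" ;;
        single_threaded_high_mem)   echo "8 32" ;;
        multi_threaded_low_mem)     echo "16 64" ;;
        multi_threaded_medium_mem)  echo "32 128" ;;
        multi_threaded_high_mem)    echo "64 128" ;;
        *) echo "Unknown compute option: $1" >&2; return 1 ;;
    esac
}
```

For instance, `compute_resources multi_threaded_low_mem` prints "16 64", while a misspelled name returns a non-zero status.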

Post-Processing: Sample Script (bam2cram.sh)

A Post-Processing bash script is a Nextflow template, so it has access to the paths and variables defined in the parent Nextflow process. In this case, the script can access directories such as ${params.analysisDir}/Results and ${params.analysisDir}/Logs_Intermediates. Any output files it generates should be written to the ${params.postProcessing.stepName} directory.

Note - For BAM to CRAM conversion, genome.fa and its .fai file must be uploaded to the custom resources directory.


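# Nextflow resolves the params.* references; backslash-escaped dollar signs are left for bash.
mkdir -p "${params.postProcessing.stepName}"
cd "${params.postProcessing.stepName}"

resultsdir="${params.analysisDir}/Results"
genomefa="${params.customResourceDir}/genome.fa"

# Collect BAM files from the Results directory; exit cleanly if there are none.
bamfiles=\$(find "\$resultsdir" -type f -name '*.bam')
if [ -z "\$bamfiles" ]; then
    echo "WARNING: BAM files NOT found!"
    exit 0
fi

# Convert each BAM to CRAM; reading line by line keeps paths with spaces intact.
while IFS= read -r f
do
    filename=\$(basename -s .bam "\$f")
    samtools view -C -T "\$genomefa" -o "./\$filename.cram" "\$f"
done <<< "\$bamfiles"

exit 0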
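
Because the conversion depends on the reference files in the custom resources directory, a guard that checks for them can make a missing upload easier to diagnose. This is a minimal sketch, not part of the sample script: check_reference is a hypothetical helper, and the index name genome.fa.fai is an assumption based on the usual samtools convention of appending .fai to the FASTA file name. It is written as plain bash; inside a Nextflow template each bash dollar sign would need a backslash in front of it.

```shell
#!/usr/bin/env bash
# Hypothetical guard: verify that the reference FASTA and its index exist
# and are non-empty. The index name genome.fa.fai is an assumption.
# Usage: check_reference <custom_resources_dir>
check_reference() {
    local dir="$1" f
    for f in "$dir/genome.fa" "$dir/genome.fa.fai"; do
        if [ ! -s "$f" ]; then
            echo "ERROR: reference file missing or empty: $f" >&2
            return 1
        fi
    done
}
```

Calling such a guard before the conversion loop would stop the step with a clear message instead of a samtools error on the first BAM file.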
