Post Processing

Post Processing is a reusable Nextflow component for running post-processing tasks at the end of pipeline execution. It can enhance, transform, or modify outputs in the Logs_Intermediates and Results folders, which makes it adaptable to specific requirements.

Note: The Post-Processing feature is available only in the ICA environment.

This component is highly configurable, supporting fine-tuned control of computational resources (CPU, memory), containerization, and output management. Users can integrate custom containers and scripts to implement their own logic for post-processing, all configured through parameters. Externalized process scripts allow for seamless execution of containerized processes.

Key Features

Customizability: Easily adaptable to different post-processing requirements.
Reusability: Can be used in multiple pipelines, reducing development effort.
Data transformation: Can be used to transform or modify output data in various ways.

What you need

A config file that sets the Post-Processing parameters and their values
A bash script that implements the desired functionality
Any other custom resources or files required by the bash script
A Docker container with the dependencies needed to run the bash script

Process

Upload and configure the custom Docker container
Modify the config file: set postProcessing_container to the URI of the uploaded container
Upload all required files (config, script, resources) to a project directory in ICA, e.g., custom-resources, using the icav2 client
Configure the ICA Web UI on the 'Start Analysis' page:
1. Enable post-processing by setting it to 'true'
2. Add 'Custom Parameters Config File' and set it to the name of the config file uploaded to the custom-resources directory above
3. Add 'Custom Resources Directory' and set it to the custom-resources directory above
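
Before uploading, it can help to confirm that the local staging folder holds every file the steps above expect. The following is a minimal sketch, not part of the component: the function name check_staging_dir and the folder layout are assumptions made for illustration, and it only inspects local files (it does not call icav2).

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that a local staging folder contains the
# files that will be uploaded to the custom resources directory.
# Usage: check_staging_dir <dir> <file-name> [<file-name> ...]
check_staging_dir() {
    local dir="$1"
    shift
    local missing=0
    local name
    for name in "$@"; do
        if [ ! -f "$dir/$name" ]; then
            echo "MISSING: $dir/$name" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every expected file is present.
    [ "$missing" -eq 0 ]
}
```

For the BAM to CRAM example, a call such as `check_staging_dir ./custom-resources my.config bam2cram.sh genome.fa genome.fa.fai` (file names are placeholders) returns a non-zero status and lists whatever is missing.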

Config File - <file-name>.config


postProcessing_container = '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/0f7f12a0-a6c8-4289-86c3-3e5310b97275:latest'
postProcessing_cpusMemoryConfig = 'single_threaded_low_mem'
postProcessing_shellScript = 'bam2cram.sh'
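
To double-check the values in a config file before uploading it, the key = 'value' lines can be read back with standard shell tools. This is a minimal sketch that assumes one single-quoted assignment per line, as in the example above; get_config_value is a hypothetical helper and not part of the component.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the single-quoted value assigned to a key
# in a config file. Prints nothing when the key is absent.
# Usage: get_config_value <config-file> <key>
get_config_value() {
    local file="$1" key="$2"
    # Match "<key> = '<value>'" and print only the value; the last match wins.
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*'\([^']*\)'.*$/\1/p" "$file" | tail -n 1
}
```

For instance, `get_config_value my.config postProcessing_shellScript` would print the script file name, which can then be compared against the files staged for upload.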

Configurable Parameters in Config file

| Parameter | Description |
| --- | --- |
| postProcessing_container | Docker container URI; must be present/uploaded to ICA |
| postProcessing_cpusMemoryConfig | Compute option to use; allowed values are given below |
| postProcessing_shellScript | File name of the shell script |

Allowed values for postProcessing_cpusMemoryConfig in the config file

| Value | CPUs | Memory (GB) |
| --- | --- | --- |
| single_threaded_low_mem (default) | 2 | 8 |
| single_threaded_medium_mem | 4 | 16 |
| single_threaded_high_mem | 8 | 32 |
| multi_threaded_low_mem | 16 | 64 |
| multi_threaded_medium_mem | 32 | 128 |
| multi_threaded_high_mem | 64 | 128 |
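
A typo in postProcessing_cpusMemoryConfig is easy to miss, so the allowed values listed above can be mirrored in a small lookup for local validation. This is an illustrative sketch only: resources_for is a hypothetical helper, and how the component itself resolves these values is not shown in this document.

```shell
#!/usr/bin/env bash
# Hypothetical lookup mirroring the allowed values above: prints
# "<cpus> <mem_gb>" for a compute option and fails for unknown values.
# An empty or missing argument falls back to the documented default.
resources_for() {
    case "${1:-single_threaded_low_mem}" in
        single_threaded_low_mem)    echo "2 8" ;;
        single_threaded_medium_mem) echo "4 16" ;;
        single_threaded_high_mem)   echo "8 32" ;;
        multi_threaded_low_mem)     echo "16 64" ;;
        multi_threaded_medium_mem)  echo "32 128" ;;
        multi_threaded_high_mem)    echo "64 128" ;;
        *) echo "unknown compute option: $1" >&2; return 1 ;;
    esac
}
```

Calling `resources_for multi_threaded_low_mem` prints the CPU count and memory in GB, while a misspelled value returns a non-zero status.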

Post-Processing: Sample Script (bam2cram.sh)

A Post-Processing bash script is a Nextflow template, so it has access to the paths and variables defined in the parent Nextflow process. In this case, the script can access directories such as ${params.analysisDir}/Results and ${params.analysisDir}/Logs_Intermediates. Output files generated by the script should be stored in the ${params.postProcessing.stepName} directory. Because Nextflow resolves ${...} expressions before the script runs, bash variables are escaped as \$ in the template.

Note: For BAM to CRAM conversion, the genome.fa and .fai files must be uploaded to the custom resources directory.


#========================================================#
# This is a SAMPLE Script only for illustration purpose  #
# Modify it, according to your specific Use Case         #
#========================================================#

#must create this folder to save output files
mkdir -p "${params.postProcessing.stepName}"

cd "${params.postProcessing.stepName}"

#BAMs are located in the 'Results' folder of the analysis directory
resultsdir="${params.analysisDir}/Results"
#this file must be uploaded to custom-resources-dir
genomefa="${params.customResourceDir}/genome.fa"

sleep_interval=30 # seconds
max_attempts=3

#set sample ids
sample_ids=("Mariner_1_Feasibility_Biosample_45-smoke" "sample_id_2")

for sample_id in "\${sample_ids[@]}"; do
    counter=0
    while : ; do
        if [ "\$counter" -eq "\$max_attempts" ]; then
            echo "WARNING! \${sample_id}.bam was NOT found!"
            break
        fi
        counter=\$((counter + 1))
        #quote the search path and keep only the first match
        bam_file=\$(find "\$resultsdir" -type f -name "\${sample_id}.bam" | head -n 1)
        if [ -z "\$bam_file" ]; then
            echo "Attempt \$counter : Waiting for \${sample_id}.bam"
            sleep \$sleep_interval
        else
            #process and break
            filename=\$(basename -s .bam "\$bam_file")
            samtools view -C -T "\$genomefa" -o "./\$filename.cram" "\$bam_file"
            break
        fi
    done
done

exit 0
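
Since the sample script is a Nextflow template, it cannot be run directly with bash: the ${params...} placeholders and the \$ escapes have to be resolved first. For a local dry run, that rendering can be approximated with sed. This is a rough sketch under stated assumptions, not how Nextflow itself renders templates: render_template is a hypothetical helper, it only handles the three params used in the sample script, and it assumes the substituted paths contain no |, & or backslash characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate Nextflow's template rendering for a
# local dry run. Prints the rendered script to standard output.
# Usage: render_template <template> <analysisDir> <customResourceDir> <stepName>
render_template() {
    local template="$1" analysis_dir="$2" resource_dir="$3" step_name="$4"
    # First fill in the Nextflow placeholders, then turn each escaped \$
    # back into a plain $ so bash sees ordinary variables.
    sed \
        -e "s|\${params.analysisDir}|${analysis_dir}|g" \
        -e "s|\${params.customResourceDir}|${resource_dir}|g" \
        -e "s|\${params.postProcessing.stepName}|${step_name}|g" \
        -e 's|\\\$|$|g' \
        "$template"
}
```

For example, `render_template bam2cram.sh /tmp/analysis /tmp/custom-resources postproc > rendered.sh` produces a plain bash script that can be inspected or run against local test data (the paths here are placeholders).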
