Post Processing
A reusable Nextflow component for running post-processing tasks at the end of pipeline execution. It can enhance, transform, or modify outputs in the Logs_Intermediates and Results folders, which makes it suitable for project-specific requirements that the main pipeline does not cover.
This component is highly configurable, supporting fine-tuned control of computational resources (CPU, memory), containerization, and output management. Users can integrate custom containers and scripts to implement their own logic for post-processing, all configured through parameters. Externalized process scripts allow for seamless execution of containerized processes.
Key Features
Customizability: Easily adaptable to different post-processing requirements.
Reusability: Can be used in multiple pipelines, reducing development effort.
Data transformation: Can be used to transform or modify output data in various ways.
What you need
A config file containing the Post-Processing parameters and their values
A bash script that implements the desired functionality
Any other custom resources or files required by the bash script
A Docker container with the dependencies needed to run the bash script
Process
1. Upload and configure the custom Docker container.
2. Modify the config file: set postProcessing_container to the uploaded container.
3. Upload all required files (config, script, resources) to ICA using the icav2 client.
4. Configure the ICA web UI on the 'Start Analysis' page:
   a. Enable postprocessing by setting it to 'true'.
   b. Add 'Custom Parameters Config File' and set it to the config file uploaded in step 3.
   c. Add 'Custom Resources Directory' and set it to the directory that holds the files uploaded in step 3.
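The upload step can be scripted. The following is a minimal sketch, assuming the icav2 client accepts the form `icav2 projectdata upload <local path> <remote path>` and that a project context has already been selected. The function names, file names, and destination folder are hypothetical; check the command against your installed client version before relying on it.

```shell
#!/usr/bin/env bash

# Print the command instead of running it when DRY_RUN=1,
# so the upload plan can be reviewed first.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Upload each local file (arguments 2..n) to one ICA folder (argument 1).
# Assumption: "icav2 projectdata upload <local> <remote>" is the upload form.
upload_postprocessing_files() {
  local remote_dir="$1"
  shift
  local f
  for f in "$@"; do
    run icav2 projectdata upload "$f" "$remote_dir" || return 1
  done
}

# Example with hypothetical names:
# DRY_RUN=1 upload_postprocessing_files /post-processing/ \
#   postprocessing.config bam2cram.sh genome.fa genome.fa.fai
```

Running with DRY_RUN=1 first prints one upload command per file, which makes it easy to confirm that the config, script, and resources all go to the same folder.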
Config File - <file-name>.config
Configurable Parameters in Config file
postProcessing_container
Docker container URI; the container must already be uploaded to ICA
postProcessing_cpusMemoryConfig
Compute option to use; allowed values are listed below
postProcessing_shellScript
File name of the shell script
Allowed values for postProcessing_cpusMemoryConfig in the config file
single_threaded_low_mem (default)
CPUs: 2, Mem(GB): 8
single_threaded_medium_mem
CPUs: 4, Mem(GB): 16
single_threaded_high_mem
CPUs: 8, Mem(GB): 32
multi_threaded_low_mem
CPUs: 16, Mem(GB): 64
multi_threaded_medium_mem
CPUs: 32, Mem(GB): 128
multi_threaded_high_mem
CPUs: 64, Mem(GB): 128
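To catch a mistyped compute option before uploading, the values can be checked locally. The following is a minimal sketch: `compute_resources` encodes the list above, and `write_postprocessing_config` prints the three parameters. Both function names are hypothetical, and the flat `key = 'value'` layout is an assumption; the pipeline may expect the parameters inside a different structure, so adjust the output to match your config file.

```shell
#!/usr/bin/env bash

# Print "<CPUs> <memory in GB>" for a compute option from the list above.
# Returns non-zero for any value that is not in the list.
compute_resources() {
  case "$1" in
    single_threaded_low_mem)    echo "2 8" ;;
    single_threaded_medium_mem) echo "4 16" ;;
    single_threaded_high_mem)   echo "8 32" ;;
    multi_threaded_low_mem)     echo "16 64" ;;
    multi_threaded_medium_mem)  echo "32 128" ;;
    multi_threaded_high_mem)    echo "64 128" ;;
    *) echo "unknown compute option: $1" >&2; return 1 ;;
  esac
}

# Print the three post-processing parameters after validating the compute
# option. An empty second argument falls back to the documented default.
# Assumption: flat key = 'value' lines; the real layout may differ.
write_postprocessing_config() {
  local container="$1"
  local compute="${2:-single_threaded_low_mem}"
  local script="$3"
  compute_resources "$compute" >/dev/null || return 1
  printf "postProcessing_container = '%s'\n" "$container"
  printf "postProcessing_cpusMemoryConfig = '%s'\n" "$compute"
  printf "postProcessing_shellScript = '%s'\n" "$script"
}

# Example with a hypothetical container URI:
# write_postprocessing_config "<container-uri>" multi_threaded_low_mem bam2cram.sh
```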
Post-Processing : Sample Script (bam2cram.sh)
A Post-Processing bash script is a Nextflow template, so it has access to the paths and variables defined in the parent Nextflow process. In this case, the script can access the following directories and their subdirectories: {params.analysisDir}/Results and {params.analysisDir}/Logs_Intermediates. Any output files the script generates should be stored in the {params.postProcessing.stepName} directory.
Note: For BAM-to-CRAM conversion, genome.fa and its .fai index file must be uploaded to the custom resources directory.
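The core of a BAM-to-CRAM conversion could look like the following. This is a minimal sketch written as plain bash rather than as the pipeline's actual template: the function name `bam_to_cram` and the `SAMTOOLS` override are hypothetical, and it assumes samtools is available in the container. In a real Nextflow template the three arguments would come from the Results directory under params.analysisDir, the params.postProcessing.stepName directory, and the genome.fa in the custom resources directory (whose path inside the process is not stated here). Shell variables such as `$bam` would also need to be escaped as `\$bam` in a template, because Nextflow interpolates unescaped `$` expressions.

```shell
#!/usr/bin/env bash

# Convert every BAM found under results_dir to CRAM in out_dir.
# reference_fa must have its .fai index next to it.
# SAMTOOLS may point to a different samtools binary (defaults to "samtools").
bam_to_cram() {
  local results_dir="$1"
  local out_dir="$2"
  local reference_fa="$3"
  local samtools_bin="${SAMTOOLS:-samtools}"

  mkdir -p "$out_dir" || return 1

  # The loop runs in the pipeline's subshell in bash, so "exit 1" stops the
  # loop and becomes the function's return status without ending the caller.
  find "$results_dir" -type f -name '*.bam' | while IFS= read -r bam; do
    cram="$out_dir/$(basename "$bam" .bam).cram"
    # -C writes CRAM, -T names the reference FASTA, -o sets the output file.
    "$samtools_bin" view -C -T "$reference_fa" -o "$cram" "$bam" || exit 1
    # Index the new CRAM so downstream tools can seek into it.
    "$samtools_bin" index "$cram" || exit 1
  done
}

# Example with placeholder paths:
# bam_to_cram "<analysisDir>/Results" "<stepName>" "<resources>/genome.fa"
```

Writing the CRAM files only into the step directory keeps the original BAM files in Results untouched, so the conversion can be rerun or discarded without affecting the pipeline's primary outputs.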