Post Processing
A reusable Nextflow component for running post-processing tasks at the end of pipeline execution. It can enhance, transform, or modify outputs in the Logs_Intermediates and Results folders, so it can be adapted to project-specific requirements.
Note: The Post-Processing feature is available only in the ICA environment.
This component is highly configurable, supporting fine-tuned control of computational resources (CPU, memory), containerization, and output management. Users can integrate custom containers and scripts to implement their own logic for post-processing, all configured through parameters. Externalized process scripts allow for seamless execution of containerized processes.
Key Features
Customizability: Easily adaptable to different post-processing requirements.
Reusability: Can be used in multiple pipelines, reducing development effort.
Data transformation: Can be used to transform or modify output data in various ways.
What You Need
A config file containing the Post-Processing parameters and their values
A bash script that implements the desired functionality
Any other custom resources or files required by the bash script
A Docker container with the dependencies needed to run the bash script
Process
Upload and configure the custom Docker container.
Modify the config file: set postProcessing_container to the uploaded container.
Upload all the required files (config, script, resources) to a project directory in ICA, e.g., custom-resources, using the icav2 client.
Configure the ICA Web UI on the 'Start Analysis' page:
Enable post-processing by setting it to 'true'.
Add 'Custom Parameters Config File' and set it to the name of the config file uploaded to the custom-resources directory above.
Add 'Custom Resources Directory' and set it to the custom-resources directory above.
Config File - <file-name>.config
postProcessing_container = '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/0f7f12a0-a6c8-4289-86c3-3e5310b97275:latest'
postProcessing_cpusMemoryConfig = 'single_threaded_low_mem'
postProcessing_shellScript = 'bam2cram.sh'
Configurable Parameters in Config file
postProcessing_container
Docker container URI; the container must already be uploaded to ICA
postProcessing_cpusMemoryConfig
Compute option to use; allowed values are listed below
postProcessing_shellScript
File name of the shell script
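Before uploading, it can help to confirm that the config file actually sets the parameters described above. The following is a minimal sketch, assuming the file uses the single-quoted key = 'value' layout shown in the sample config; the function name config_value is hypothetical and not part of the component.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Post-Processing component):
# print the single-quoted value assigned to a key in a config file.
# Assumes lines of the form:  key = 'value'
config_value() {
    local config_file="$1" key="$2"
    # Capture the text between the quotes; keep only the first match.
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*'\([^']*\)'.*/\1/p" \
        "$config_file" | head -n 1
}
```

For example, calling config_value with the config file path and postProcessing_shellScript should print the script name, and an empty result suggests the parameter is missing or not written in the assumed layout.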
Allowed values for postProcessing_cpusMemoryConfig in the config file
single_threaded_low_mem (default)
CPUs: 2, Mem(GB): 8
single_threaded_medium_mem
CPUs: 4, Mem(GB): 16
single_threaded_high_mem
CPUs: 8, Mem(GB): 32
multi_threaded_low_mem
CPUs: 16, Mem(GB): 64
multi_threaded_medium_mem
CPUs: 32, Mem(GB): 128
multi_threaded_high_mem
CPUs: 64, Mem(GB): 128
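The allowed values above can be captured in a small lookup, which is handy for catching a misspelled option before starting an analysis. This is only an illustrative sketch: the function name resources_for is hypothetical, and the CPU and memory figures are copied from the list above rather than queried from ICA.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<CPUs> <Mem in GB>" for a given
# postProcessing_cpusMemoryConfig value, or fail for unknown values.
# The figures mirror the list of allowed values in this document.
resources_for() {
    case "$1" in
        single_threaded_low_mem)    echo "2 8" ;;
        single_threaded_medium_mem) echo "4 16" ;;
        single_threaded_high_mem)   echo "8 32" ;;
        multi_threaded_low_mem)     echo "16 64" ;;
        multi_threaded_medium_mem)  echo "32 128" ;;
        multi_threaded_high_mem)    echo "64 128" ;;
        *)
            echo "Unknown postProcessing_cpusMemoryConfig value: $1" >&2
            return 1
            ;;
    esac
}
```

A non-zero return status indicates that the value is not one of the six listed options.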
Post-Processing : Sample Script (bam2cram.sh)
A Post-Processing bash script is a Nextflow template, so it has access to the paths and variables defined in the parent Nextflow process. In this case, the script can access directories such as {params.analysisDir}/Results and {params.analysisDir}/Logs_Intermediates. Output files generated by the script should be stored in the {params.postProcessing.stepName} directory. Because Nextflow interpolates ${...} expressions in a template, Bash variables in the script are escaped with a backslash (\$). Note: For BAM to CRAM conversion, genome.fa and its .fai index file must be uploaded to the custom resources directory.
#========================================================#
# This is a SAMPLE Script only for illustration purpose #
# Modify it, according to your specific Use Case #
#========================================================#
#must create this folder to save output files
mkdir -p "${params.postProcessing.stepName}"
cd "${params.postProcessing.stepName}"
#BAMs are located in 'analysis/results' folder
resultsdir="${params.analysisDir}/Results"
#this file must be uploaded to custom-resources-dir
genomefa="${params.customResourceDir}/genome.fa"
sleep_interval=30 # seconds
max_attempts=3
#set sample ids
sample_ids=("Mariner_1_Feasibility_Biosample_45-smoke" "sample_id_2")
for sample_id in "\${sample_ids[@]}"; do
    counter=0
    while : ; do
        #give up after max_attempts lookups
        if [ "\$counter" -eq "\$max_attempts" ]; then
            echo "WARNING! \${sample_id}.bam was NOT found!"
            break
        fi
        counter=\$((counter + 1))
        bam_file=\$(find "\$resultsdir" -type f -name "\${sample_id}.bam")
        if [ -z "\$bam_file" ]; then
            echo "Attempt \$counter : Waiting for \${sample_id}.bam"
            sleep "\$sleep_interval"
        else
            #convert the BAM to CRAM, then stop retrying
            filename=\$(basename -s .bam "\$bam_file")
            samtools view -C -T "\$genomefa" -o "./\$filename.cram" "\$bam_file"
            break
        fi
    done
done
exit 0
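To try a template script like the one above outside Nextflow, one option is to render it into plain Bash first. The sketch below is a rough approximation of Nextflow's template interpolation, not a replacement for it: it substitutes only the three params placeholders used in the sample script and turns \$ back into $. The function name render_template and its argument order are hypothetical, and the substituted paths are assumed not to contain the characters | or &.

```shell
#!/usr/bin/env bash
# Hypothetical helper: render a Post-Processing template into plain Bash
# for local testing. Handles only the three placeholders used in the
# sample script; anything else in the template is left untouched.
# Usage: render_template <template> <analysisDir> <customResourceDir> <stepName>
render_template() {
    local template="$1" analysis_dir="$2" resource_dir="$3" step_name="$4"
    sed -e "s|[\$]{params\.analysisDir}|${analysis_dir}|g" \
        -e "s|[\$]{params\.customResourceDir}|${resource_dir}|g" \
        -e "s|[\$]{params\.postProcessing\.stepName}|${step_name}|g" \
        -e 's|\\[$]|$|g' \
        "$template"
}
```

The rendered output can be redirected to a file and run inside the same container image, which gives a quick check of the script logic before uploading it to ICA.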