# Post Processing

A reusable Nextflow component designed to execute various post-processing tasks at the end of pipeline execution. It can be used to enhance, transform, or modify outputs in the Logs\_Intermediates and Results folders, making it versatile enough to address specific requirements.

```
Note - The Post-Processing feature is available only in the ICA environment.
```

This component is highly configurable, supporting fine-tuned control of computational resources (CPU, memory), containerization, and output management. Users can integrate custom containers and scripts to implement their own logic for post-processing, all configured through parameters. Externalized process scripts allow for seamless execution of containerized processes.

## Key Features

* Customizability: Easily adaptable to different post-processing requirements.
* Reusability: Can be used in multiple pipelines, reducing development effort.
* Data transformation: Can be used to transform or modify output data in various ways.

## What you need

1. A config file containing the Post-Processing parameters and values
2. A bash script that implements the desired functionality
3. Any other custom resources/files required by the bash script
4. A Docker container with the dependencies needed to run the bash script

## Process

1. Upload and configure [Custom Docker](https://help.ica.illumina.com/home/h-dockerrepository#importing-a-private-image-tools--bench-images)
2. Modify the config file; set **postProcessing\_container** to the uploaded container
3. Upload all the required files (config, script, resources) to a project directory in ICA, e.g., custom-resources, using the icav2 client.
4. Configure ICA Web-UI on **'Start Analysis'** Page:
   1. Enable post-processing by setting it to 'true'
   2. Add 'Custom Parameters Config File' and set it to the file name uploaded to the custom-resources directory above
   3. Add 'Custom Resources Directory' and set it to the custom-resources directory above
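Step 3 above can be sketched with the icav2 CLI. The project name, file names, and target folder below are hypothetical placeholders; consult the icav2 documentation for the exact upload syntax in your environment:

```sh
# Hypothetical example: upload the config, script, and reference files
# to a 'custom-resources' folder in the ICA project.
icav2 projects enter my-project
icav2 projectdata upload postprocessing.config /custom-resources/
icav2 projectdata upload bam2cram.sh /custom-resources/
icav2 projectdata upload genome.fa /custom-resources/
icav2 projectdata upload genome.fa.fai /custom-resources/
```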

## Config File - \<file-name>.config

```sh

postProcessing_container = '079623148045.dkr.ecr.us-east-1.amazonaws.com/cp-prod/0f7f12a0-a6c8-4289-86c3-3e5310b97275:latest'
postProcessing_cpusMemoryConfig = 'single_threaded_low_mem'
postProcessing_shellScript = 'bam2cram.sh'

```

## Configurable Parameters in the Config File

| Parameter                        | Description                                            |
| -------------------------------- | ------------------------------------------------------ |
| postProcessing\_container        | Docker container URI; must be uploaded to ICA          |
| postProcessing\_cpusMemoryConfig | Compute option to use; allowed values are listed below |
| postProcessing\_shellScript      | File name of the shell script                          |

## Allowed values for **postProcessing\_cpusMemoryConfig** in the config file

| Value                                | Description            |
| ------------------------------------ | ---------------------- |
| single\_threaded\_low\_mem (default) | CPUs: 2, Mem(GB): 8    |
| single\_threaded\_medium\_mem        | CPUs: 4, Mem(GB): 16   |
| single\_threaded\_high\_mem          | CPUs: 8, Mem(GB): 32   |
| multi\_threaded\_low\_mem            | CPUs: 16, Mem(GB): 64  |
| multi\_threaded\_medium\_mem         | CPUs: 32, Mem(GB): 128 |
| multi\_threaded\_high\_mem           | CPUs: 64, Mem(GB): 128 |
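For CPU-bound tools, a multi-threaded compute option can be paired with the tool's own thread flag. A hypothetical variant of the sample config that requests 16 CPUs for the CRAM conversion:

```sh

postProcessing_cpusMemoryConfig = 'multi_threaded_low_mem'

```

The script can then pass a matching thread count to the tool, e.g. `samtools view -@ 15 ...`.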

## Post-Processing: Sample Script (bam2cram.sh)

A Post-Processing bash script is a [Nextflow Template](https://www.nextflow.io/docs/latest/process.html#template), which has access to the paths/variables defined in the parent **Nextflow Process**. In this case, the following directories and their subdirectories can be accessed from the bash script: {params.analysisDir}/Results and {params.analysisDir}/Logs\_Intermediates. Output files must be stored in the {params.postProcessing.stepName} directory. Note - for BAM-to-CRAM conversion, the genome.fa and .fai files must be uploaded to the custom resources directory.

```sh

#========================================================#
# This is a SAMPLE Script only for illustration purpose  #
# Modify it, according to your specific Use Case         #
#========================================================#

#must create this folder to save output files
mkdir -p "${params.postProcessing.stepName}"

cd "${params.postProcessing.stepName}"

#BAMs are located in the 'Results' folder
resultsdir="${params.analysisDir}/Results"
#this file must be uploaded to the custom resources directory
genomefa="${params.customResourceDir}/genome.fa"

sleep_interval=30 # seconds
max_attempts=3

#set sample ids
sample_ids=("Mariner_1_Feasibility_Biosample_45-smoke" "sample_id_2")

for sample_id in "\${sample_ids[@]}"; do
    counter=0
    while : ; do
        if [ "\$counter" -eq "\$max_attempts" ]; then
            echo "WARNING! \${sample_id}.bam was NOT found!"
            break
        fi
        counter=\$((counter + 1))
        bam_file=\$(find "\$resultsdir" -type f -name "\${sample_id}.bam")
        if [ -z "\$bam_file" ]; then
            echo "Attempt \$counter : Waiting for \${sample_id}.bam"
            sleep "\$sleep_interval"
        else
            #convert BAM to CRAM, then stop polling for this sample
            filename=\$(basename -s .bam "\$bam_file")
            samtools view -C -T "\$genomefa" -o "./\$filename.cram" "\$bam_file"
            break
        fi
    done
done

exit 0
```
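Because template variables such as `${params.analysisDir}` are interpolated by Nextflow at runtime (which is why plain bash variables are escaped as `\$` above), the bash logic can be dry-run locally by substituting them with ordinary shell variables. A minimal, hypothetical harness for the polling loop, with no samtools required since it only exercises file discovery:

```sh
#!/usr/bin/env bash
# Local dry run: stand-ins for the values Nextflow would interpolate
set -euo pipefail

workdir=$(mktemp -d)
resultsdir="$workdir/Results"   # plays the role of ${params.analysisDir}/Results
mkdir -p "$resultsdir"

# simulate a pipeline output appearing in Results
touch "$resultsdir/sample_id_1.bam"

max_attempts=3
sleep_interval=1   # short interval for a local dry run
found=""
counter=0
while : ; do
    if [ "$counter" -eq "$max_attempts" ]; then
        echo "WARNING! sample_id_1.bam was NOT found!"
        break
    fi
    counter=$((counter + 1))
    bam_file=$(find "$resultsdir" -type f -name "sample_id_1.bam")
    if [ -n "$bam_file" ]; then
        found="$bam_file"
        break
    fi
    sleep "$sleep_interval"
done

echo "found: $found"
rm -rf "$workdir"
```

Replacing the `touch` with a delayed copy (e.g. from a background subshell) also lets you exercise the retry path before wiring the script into the pipeline.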
