diff --git a/post_processing/TPMCalculator/README.md b/post_processing/TPMCalculator/README.md
new file mode 100644
index 0000000..72fb754
--- /dev/null
+++ b/post_processing/TPMCalculator/README.md
@@ -0,0 +1,146 @@
+
+# mRNA Quantification using TPMCalculator
+
+<a id="introduction"></a>
+## Introduction
+
+This pipeline quantifies mRNA abundance directly from BAM file alignments using **TPMCalculator** and generates visualizations to assess the results. It is designed to process multiple samples and combine results into comprehensive matrices for downstream analysis. 
+
+The pipeline is organized into four scripts, each responsible for a different stage of processing and analysis. These scripts require access to a **SLURM** job manager for submitting and running the workflow.
+
+## Table of Contents
+* [Introduction](#introduction)
+* [Installation](#installation)
+  * [Environment](#environment)
+* [TPMCalculator](#tpmcalc)
+* [GFF3 to GTF](#gff3_to_gtf)
+* [Pipeline Overview](#pipeline_overview)
+* [Recommended Directory Structure](#dir_struc)
+
+<a id="installation"></a>
+## Installation
+This repository can be cloned locally by running the following `git` command:
+```bash
+git clone https://github.com/baliga-lab/Global_Search.git
+```
+Please note that Git is required to run the above command. For instructions on downloading Git, please see [the Git Guide](https://github.com/git-guides/install-git).
+
+<a id="environment"></a>
+### Environment
+
+#### Conda
+This application is built on top of multiple Python packages with specific version requirements. We recommend using `conda` to create an isolated Python environment with all necessary packages. If you do not have it installed, follow the [Miniconda Instillation Instructions](https://www.anaconda.com/docs/getting-started/miniconda/install). The list of necessary packages can be found at in the [`environment.yml`](./environment.yml) file.
+
+To create the specified `gs-coral-env` Conda environment, run the following command:
+```bash
+conda env create -f environment.yml
+```
+
+Once the Conda environment is created, it can be activated by:
+```bash
+conda activate gs-coral-env
+```
+After coding inside the environment, it can be deactivated with the command:
+```bash
+conda deactivate
+```
+
+**Please note! This environment file does not contain the packages required to run the whole Global_Search pipeline. The packages included are only enough to run the scripts in the TPMCalculator section**. Installation instructions for the Gloabl_Search pipleine can be found in the main [README](../../README.md)
+
+<a id="tpmcalc"></a>
+## TPMCalculator
+
+TPMCalculator requires two inputs:  
+- A **GTF annotation file** describing gene models.  
+- **BAM alignment files** for each sample.  
+
+It produces four output files per sample containing TPM values and raw read counts for:  
+- **Genes**  
+- **Transcripts**  
+- **Exons**  
+- **Introns**
+
+<a id="gff3_to_gtf"></a>
+## GFF3 to GTF
+
+While there are several ways to convert a GFF file to a GTF file, a custom script was used to facilitate this conversion. The [`gff3_to_gtf.py`](./scripts/gff3_to_gtf.py) script...
+- Reads a GFF3 file
+- Converts `gene`, `mRNA/transcript`, `exon`, and `CDS` features into GTF format
+- Constructs valid `gene_id` and `transcript_id` fields
+- Writes a new GTF file
+
+Run the following command to convert a GFF3 file to a GTF file:
+```bash
+python3 gff3_to_gtf.py <input_gff3> <output_gtf>
+```
+
+<a id="pipeline_overview"></a>
+## Pipeline Overview
+
+The scripts should be run in the following order:
+
+0. **gff3_to_gtf.py**
+   - See instructions above. 
+
+1. **check_bam_sorted.py**  
+   - **Purpose:** Verify that all BAM files are correctly sorted and log their sort order.  
+   - **Inputs:**  
+     - `base_dir`: Path to the folder containing BAM files. BAM files can be nested within sample directories.  
+   - **Outputs:**  
+     - TSV file logging the sort order of each BAM file (`bam_sort_status_<sample>.tsv`).  
+
+2. **generate_tpmcalculator_results.py**  
+   - **Purpose:** Generate TPMCalculator results for each sample and create SLURM job scripts to run the analyses.  
+   - **Inputs:**  
+     - `base_dir`: Path to the folder containing BAM files. Bam files may be nested.
+     - `gtf_file`: Path to the reference GTF annotation file.
+     - `tpmcalculator_dir`: Desired output directory for TPMCalculator results.  
+   - **Outputs:**  
+     - TPMCalculator output files for each sample (genes, transcripts, exons, introns).  
+     - SLURM job scripts for each sample (`*_tpmcalculator_job.sh`).  
+
+3. **combine_tpmcalculator_results.py**  
+   - **Purpose:** Combine TPMCalculator results from all samples into unified TPM and reads matrices.  
+   - **Inputs:**  
+     - `base_dir`: Folder containing all TPMCalculator outputs for all samples. Output files may be nested.
+     - `output_dir`: Desired output directory for combined matrices.  
+   - **Outputs:**  
+     - TPM matrix (`STAR_TPMcalculator_<project>_TPMs_Matrix_Merged.csv`).  
+     - Reads matrix (`STAR_TPMcalculator_<project>_Reads_Matrix_Merged.csv`).  
+
+4. **analyze_organism_totals.py**  
+   - **Purpose:** Summarize total TPMs and reads by organism (e.g., Host, Symbiont species) and generate visualizations to verify results.  
+   - **Inputs:**  
+     - `base_dir`: Folder containing combined TPMCalculator results. This is the output directory.  
+     - `tpm_file`: Path to the merged TPM matrix.  
+     - `reads_file`: Path to the merged reads matrix. 
+   - **Outputs:**  
+     - Summary CSV files of TPM and reads totals by organism (`TPM_totals_by_organism.csv`, `Reads_totals_by_organism.csv`).  
+     - Barplots and summary figures (`organism_totals_barplots.png/pdf`, `organism_summary_stats.png/pdf`).  
+
+<a id="dir_struc"></a>
+## Recommended Directory Structure
+
+For clarity and organization, the pipeline outputs are recommended to be structured as follows:
+```
+├── sample_organism_name/
+│   ├── combined_results/
+│   │   ├── organism_analysis/
+│   │   │   ├── <summary visualization files>
+│   │   │   ├── ...
+│   │   ├── STAR_TPMCalculator_<sample_organism_name>_Reads_Matrix_Merged.csv
+│   │   ├── STAR_TPMCalculator_<sample_organism_name>_TPMs_Matrix_Merged.csv
+│   ├── tpmcalculator_output/
+│   │   ├── sample_1/
+│   │   │   ├── <TPMCalculator output files>
+│   │   │   ├── ...
+│   │   ├── sample_2/
+│   │   │   ├── <TPMCalculator output files>
+│   │   │   ├── ...
+│   │   ├── sample_.../
+│   │   │   ├── <TPMCalculator output files>
+│   │   │   ├── ...
+│   ├── reference/
+│   │   ├── <sample_organism_name>.gff3 or gff
+│   │   ├── <sample_organism_name>.gtf
+```
diff --git a/post_processing/TPMCalculator/environment.yml b/post_processing/TPMCalculator/environment.yml
new file mode 100644
index 0000000..b3948ef
--- /dev/null
+++ b/post_processing/TPMCalculator/environment.yml
@@ -0,0 +1,217 @@
+name: gs-coral-env
+channels:
+  - bioconda
+  - conda-forge
+  - https://repo.anaconda.com/pkgs/main
+  - https://repo.anaconda.com/pkgs/r
+dependencies:
+  - _libgcc_mutex=0.1=conda_forge
+  - _openmp_mutex=4.5=2_gnu
+  - _r-mutex=1.0.1=anacondar_1
+  - agat=1.5.1=pl5321hdfd78af_0
+  - bamtools=2.5.3=he132191_0
+  - binutils_impl_linux-64=2.45=default_hfdba357_104
+  - bwidget=1.10.1=ha770c72_1
+  - bzip2=1.0.8=hda65f42_8
+  - c-ares=1.34.5=hb9d3cd8_0
+  - ca-certificates=2025.11.12=hbd8a1cb_0
+  - cairo=1.18.4=h3394656_0
+  - curl=8.16.0=h4e3cde8_0
+  - font-ttf-dejavu-sans-mono=2.37=hab24e00_0
+  - font-ttf-inconsolata=3.000=h77eed37_0
+  - font-ttf-source-code-pro=2.038=h77eed37_0
+  - font-ttf-ubuntu=0.83=h77eed37_3
+  - fontconfig=2.15.0=h7e30c49_1
+  - fonts-conda-ecosystem=1=0
+  - fonts-conda-forge=1=hc364b38_1
+  - freetype=2.14.1=ha770c72_0
+  - fribidi=1.0.16=hb03c661_0
+  - gcc_impl_linux-64=15.2.0=h6f0f26c_15
+  - gffread=0.12.7=h077b44d_6
+  - gfortran_impl_linux-64=15.2.0=h281d09f_15
+  - graphite2=1.3.14=hecca717_2
+  - gsl=2.7=he838d99_0
+  - gxx_impl_linux-64=15.2.0=hda75c37_15
+  - harfbuzz=12.2.0=h15599e2_0
+  - htslib=1.22.1=h566b1c6_0
+  - icu=75.1=he02047a_0
+  - jsoncpp=1.9.6=hf42df4d_1
+  - kernel-headers_linux-64=6.12.0=he073ed8_3
+  - keyutils=1.6.3=hb9d3cd8_0
+  - krb5=1.21.3=h659f571_0
+  - ld_impl_linux-64=2.45=default_hbd61a6d_104
+  - lerc=4.0.0=h0aef613_1
+  - libblas=3.11.0=4_h4a7cf45_openblas
+  - libcblas=3.11.0=4_h0358290_openblas
+  - libcurl=8.16.0=h4e3cde8_0
+  - libdb=6.2.32=h9c3ff4c_0
+  - libdeflate=1.24=h86f0d12_0
+  - libedit=3.1.20250104=pl5321h7949ede_0
+  - libev=4.33=hd590300_2
+  - libexpat=2.7.3=hecca717_0
+  - libffi=3.5.2=h9ec8514_0
+  - libfreetype=2.14.1=ha770c72_0
+  - libfreetype6=2.14.1=h73754d4_0
+  - libgcc=15.2.0=h767d61c_7
+  - libgcc-devel_linux-64=15.2.0=hcc6f6b0_115
+  - libgcc-ng=15.2.0=h69a702a_7
+  - libgfortran=15.2.0=h69a702a_15
+  - libgfortran-ng=15.2.0=h69a702a_15
+  - libgfortran5=15.2.0=h68bc16d_15
+  - libglib=2.86.2=h32235b2_0
+  - libgomp=15.2.0=h767d61c_7
+  - libiconv=1.18=h3b78370_2
+  - libjpeg-turbo=3.1.2=hb03c661_0
+  - liblapack=3.11.0=4_h47877c9_openblas
+  - liblzma=5.8.1=hb9d3cd8_2
+  - libnghttp2=1.67.0=had1ee68_0
+  - libopenblas=0.3.30=pthreads_h94d23a6_4
+  - libpng=1.6.53=h421ea60_0
+  - libsanitizer=15.2.0=h90f66d4_15
+  - libssh2=1.11.1=hcf80075_0
+  - libstdcxx=15.2.0=h8f9b012_7
+  - libstdcxx-devel_linux-64=15.2.0=hd446a21_115
+  - libstdcxx-ng=15.2.0=h4852527_7
+  - libtiff=4.7.1=h8261f1e_0
+  - libuuid=2.41.2=h5347b49_1
+  - libwebp-base=1.6.0=hd42ef1d_0
+  - libxcb=1.17.0=h8a09558_0
+  - libxcrypt=4.4.36=hd590300_1
+  - libzlib=1.3.1=hb9d3cd8_2
+  - make=4.4.1=hb9d3cd8_2
+  - ncurses=6.5=h2d0b736_3
+  - openssl=3.6.0=h26f9b46_0
+  - pango=1.56.4=hadf4263_0
+  - pcre2=10.46=h1321c63_0
+  - perl=5.32.1=7_hd590300_perl5
+  - perl-b-cow=0.007=pl5321hb9d3cd8_1
+  - perl-b-hooks-endofscope=0.28=pl5321ha770c72_1
+  - perl-base=2.23=pl5321hd8ed1ab_0
+  - perl-bioperl-core=1.7.8=pl5321hdfd78af_1
+  - perl-business-isbn=3.007=pl5321hd8ed1ab_0
+  - perl-business-isbn-data=20210112.006=pl5321hd8ed1ab_0
+  - perl-carp=1.50=pl5321hd8ed1ab_0
+  - perl-class-inspector=1.36=pl5321hdfd78af_0
+  - perl-class-load=0.25=pl5321ha770c72_2
+  - perl-class-load-xs=0.10=pl5321hb9d3cd8_0
+  - perl-class-methodmaker=2.25=pl5321h7b50bb2_1
+  - perl-clone=0.46=pl5321hb9d3cd8_1
+  - perl-compress-raw-bzip2=2.214=pl5321hda65f42_0
+  - perl-compress-raw-zlib=2.214=pl5321h4dac143_0
+  - perl-constant=1.33=pl5321hd8ed1ab_0
+  - perl-cpan-meta-check=0.014=pl5321ha770c72_0
+  - perl-data-dump=1.25=pl5321h7b50bb2_2
+  - perl-data-optlist=0.114=pl5321ha770c72_1
+  - perl-db_file=1.858=pl5321hb9d3cd8_0
+  - perl-devel-globaldestruction=0.14=pl5321ha770c72_0
+  - perl-devel-overloadinfo=0.007=pl5321ha770c72_1
+  - perl-devel-stacktrace=2.04=pl5321h296ab09_0
+  - perl-digest-hmac=1.05=pl5321hdfd78af_0
+  - perl-digest-md5=2.59=pl5321hb9d3cd8_3
+  - perl-dist-checkconflicts=0.11=pl5321ha770c72_0
+  - perl-encode=3.21=pl5321hb9d3cd8_1
+  - perl-encode-locale=1.05=pl5321hdfd78af_7
+  - perl-eval-closure=0.14=pl5321ha770c72_0
+  - perl-exporter=5.74=pl5321hd8ed1ab_0
+  - perl-exporter-tiny=1.002002=pl5321hd8ed1ab_0
+  - perl-extutils-makemaker=7.70=pl5321hd8ed1ab_0
+  - perl-file-listing=6.16=pl5321hdfd78af_0
+  - perl-file-pushd=1.016=pl5321ha770c72_0
+  - perl-file-share=0.25=pl5321hdfd78af_3
+  - perl-file-sharedir=1.118=pl5321hdfd78af_0
+  - perl-file-sharedir-install=0.14=pl5321hdfd78af_0
+  - perl-file-spec=3.48_01=pl5321hdfd78af_2
+  - perl-getopt-long=2.58=pl5321hdfd78af_0
+  - perl-graph=0.9735=pl5321hdfd78af_0
+  - perl-heap=0.80=pl5321hdfd78af_1
+  - perl-html-parser=3.81=pl5321h4ac6f70_1
+  - perl-html-tagset=3.24=pl5321hdfd78af_0
+  - perl-http-cookiejar-lwp=0.014=pl5321hdfd78af_0
+  - perl-http-cookies=6.11=pl5321hdfd78af_0
+  - perl-http-daemon=6.16=pl5321hdfd78af_0
+  - perl-http-date=6.06=pl5321hdfd78af_0
+  - perl-http-message=7.01=pl5321hdfd78af_0
+  - perl-http-negotiate=6.01=pl5321hdfd78af_4
+  - perl-inc-latest=0.500=pl5321ha770c72_0
+  - perl-io-html=1.004=pl5321hdfd78af_0
+  - perl-io-socket-ssl=2.075=pl5321hd8ed1ab_0
+  - perl-io-tty=1.20=pl5321hb9d3cd8_3
+  - perl-ipc-run=20250809.0=pl5321hdfd78af_0
+  - perl-libwww-perl=6.81=pl5321hdfd78af_0
+  - perl-list-moreutils=0.430=pl5321hdfd78af_0
+  - perl-list-moreutils-xs=0.430=pl5321h7b50bb2_5
+  - perl-lwp-mediatypes=6.04=pl5321hdfd78af_1
+  - perl-lwp-protocol-https=6.14=pl5321hdfd78af_1
+  - perl-mime-base64=3.16=pl5321hb9d3cd8_3
+  - perl-module-build=0.4234=pl5321ha770c72_1
+  - perl-module-implementation=0.09=pl5321ha770c72_1
+  - perl-module-load=0.34=pl5321hdfd78af_0
+  - perl-module-runtime=0.016=pl5321ha770c72_1
+  - perl-module-runtime-conflicts=0.003=pl5321ha770c72_0
+  - perl-moose=2.2207=pl5321hb9d3cd8_2
+  - perl-mozilla-ca=20250602=pl5321hdfd78af_0
+  - perl-mro-compat=0.15=pl5321ha770c72_0
+  - perl-namespace-clean=0.27=pl5321h296ab09_1
+  - perl-net-http=6.24=pl5321hdfd78af_0
+  - perl-net-ssleay=1.94=pl5321hf672d98_1
+  - perl-ntlm=1.09=pl5321hdfd78af_5
+  - perl-package-deprecationmanager=0.18=pl5321ha770c72_2
+  - perl-package-stash=0.40=pl5321ha770c72_2
+  - perl-package-stash-xs=0.30=pl5321hb9d3cd8_2
+  - perl-params-util=1.102=pl5321hb9d3cd8_1
+  - perl-parent=0.243=pl5321hd8ed1ab_0
+  - perl-pod-escapes=1.07=pl5321hdfd78af_2
+  - perl-regexp-common=2017060201=pl5321hd8ed1ab_0
+  - perl-safe=2.37=pl5321hdfd78af_2
+  - perl-scalar-list-utils=1.70=pl5321hb03c661_0
+  - perl-set-object=1.43=pl5321h7b50bb2_0
+  - perl-socket=2.027=pl5321h5c03b87_6
+  - perl-sort-naturally=1.03=pl5321hdfd78af_3
+  - perl-statistics-r=0.34=pl5321r44hdfd78af_7
+  - perl-storable=3.15=pl5321hb9d3cd8_2
+  - perl-sub-exporter=0.991=pl5321ha770c72_1
+  - perl-sub-exporter-progressive=0.001013=pl5321ha770c72_0
+  - perl-sub-identify=0.14=pl5321hb9d3cd8_2
+  - perl-sub-install=0.928=pl5321hd8ed1ab_0
+  - perl-sub-name=0.28=pl5321hb9d3cd8_1
+  - perl-term-progressbar=2.23=pl5321hdfd78af_0
+  - perl-test=1.26=pl5321hd8ed1ab_0
+  - perl-test-cleannamespaces=0.24=pl5321ha770c72_2
+  - perl-test-fatal=0.016=pl5321ha770c72_0
+  - perl-test-harness=3.44=pl5321hd8ed1ab_0
+  - perl-test-leaktrace=0.17=pl5321h7bf1ee8_1
+  - perl-test-needs=0.002009=pl5321hd8ed1ab_0
+  - perl-test-requiresinternet=0.05=pl5321hdfd78af_1
+  - perl-test-warnings=0.031=pl5321ha770c72_0
+  - perl-text-balanced=2.07=pl5321hdfd78af_0
+  - perl-text-wrap=2021.0814=pl5321hd8ed1ab_0
+  - perl-time-local=1.35=pl5321hdfd78af_0
+  - perl-timedate=2.33=pl5321hdfd78af_2
+  - perl-try-tiny=0.31=pl5321ha770c72_0
+  - perl-uri=5.34=pl5321ha770c72_0
+  - perl-url-encode=0.03=pl5321h9ee0642_1
+  - perl-variable-magic=0.64=pl5321hb9d3cd8_0
+  - perl-vars=1.03=pl5321hdfd78af_2
+  - perl-www-robotrules=6.02=pl5321hdfd78af_4
+  - perl-yaml=1.30=pl5321hdfd78af_0
+  - pixman=0.46.4=h54a6638_1
+  - pthread-stubs=0.4=hb9d3cd8_1002
+  - r-base=4.4.3=h14df4e6_4
+  - readline=8.2=h8c095d6_2
+  - samtools=1.22.1=h96c455f_0
+  - sed=4.9=h6688a6e_0
+  - sysroot_linux-64=2.39=hc4b9eeb_3
+  - tk=8.6.13=noxft_ha0e22de_103
+  - tktable=2.10=h8d826fa_7
+  - tpmcalculator=0.0.5=h2bd4fab_3
+  - tzdata=2025b=h78e105d_0
+  - xorg-libice=1.1.2=hb9d3cd8_0
+  - xorg-libsm=1.2.6=he73a12e_0
+  - xorg-libx11=1.8.12=h4f16b4b_0
+  - xorg-libxau=1.0.12=hb03c661_1
+  - xorg-libxdmcp=1.1.5=hb03c661_1
+  - xorg-libxext=1.3.6=hb9d3cd8_0
+  - xorg-libxrender=0.9.12=hb9d3cd8_0
+  - xorg-libxt=1.3.1=hb9d3cd8_0
+  - zlib=1.3.1=hb9d3cd8_2
+  - zstd=1.5.7=hb8e6e7a_2
diff --git a/post_processing/TPMCalculator/scripts/analyze_organism_totals.py b/post_processing/TPMCalculator/scripts/analyze_organism_totals.py
new file mode 100644
index 0000000..a0a3948
--- /dev/null
+++ b/post_processing/TPMCalculator/scripts/analyze_organism_totals.py
@@ -0,0 +1,340 @@
+#!/usr/bin/env python3
+"""
+analyze_organism_totals.py
+
+This script takes merged TPM and Reads matrices produced by the previous 
+TPMCalculator combination step and verifes that expression values were
+assinged correctly to each organism for the multi-organism dataset. 
+
+It also creates summary dataframes and barplots to verify processing worked correctly,
+and generates the following outputs:
+    - TPM_totals_by_organism.csv
+    - Reads_totals_by_organism.csv
+    - organism_totals_barplots.png/.pdf
+    - organism_summary_stats.png/.pdf
+
+Functions:
+    - analyze_organism_totals(tpm_file, reads_file, output_dir, organisms)
+    - create_barplots(tpm_totals, reads_totals, output_dir, organisms)
+    - create_summary_stats_plot(tpm_totals, reads_totals, output_dir, organisms)
+    - main()
+
+Usage:
+    python3 analyze_organism_totals.py
+"""
+
+import os
+import pandas as pd
+import matplotlib.pyplot as plt
+import numpy as np
+
+def analyze_organism_totals(tpm_file, reads_file, output_dir, organisms):
+    """
+    Analyze total TPMs and reads by organism prefix.
+
+    Args:
+        - tpm_file (str): Path to TPM matrix CSV file
+        - reads_file (str): Path to reads matrix CSV file
+        - output_dir (str): Directory to save outputs
+        - organisms (dict): Dictionary containing the names and desired colors used
+            for graph labeling
+
+    Returns:
+        - tpm_totals (pandas.DataFrame): DataFrame with TPM totals by organism
+        - reads_totals (pandas.DataFrame): DataFrame with reads totals by organism
+    """
+    print("Loading TPM and reads matrices...")
+
+    # Load matrices
+    tpm_df = pd.read_csv(tpm_file)
+    reads_df = pd.read_csv(reads_file)
+
+    print(f"TPM matrix shape: {tpm_df.shape}")
+    print(f"Reads matrix shape: {reads_df.shape}")
+
+    # Get sample columns (all except Gene_Id)
+    sample_columns = [col for col in tpm_df.columns if col != 'Gene_Id']
+    print(f"Found {len(sample_columns)} samples")
+
+    # Define organism prefixes
+    organism_prefixes = organisms["names"]
+
+    # Initialize result dataframes
+    tpm_totals = pd.DataFrame(index=sample_columns, columns=organism_prefixes)
+    reads_totals = pd.DataFrame(index=sample_columns, columns=organism_prefixes)
+
+    print("Calculating totals by organism...")
+
+    # Calculate totals for each organism and sample
+    for sample in sample_columns:
+        for prefix in organism_prefixes:
+            # Filter genes by prefix
+            gene_mask = tpm_df['Gene_Id'].str.startswith(prefix)
+
+            # Calculate totals
+            tpm_total = tpm_df.loc[gene_mask, sample].sum()
+            reads_total = reads_df.loc[gene_mask, sample].sum()
+
+            tpm_totals.loc[sample, prefix] = tpm_total
+            reads_totals.loc[sample, prefix] = reads_total
+
+    # Convert to numeric
+    tpm_totals = tpm_totals.astype(float)
+    reads_totals = reads_totals.astype(float)
+
+    # Add total columns
+    tpm_totals['Total'] = tpm_totals.sum(axis=1)
+    reads_totals['Total'] = reads_totals.sum(axis=1)
+
+    print("Summary statistics:")
+    print("\nTPM totals:")
+    print(tpm_totals.describe())
+    print("\nReads totals:")
+    print(reads_totals.describe())
+
+    # Save summary dataframes
+    tpm_output = os.path.join(output_dir, "TPM_totals_by_organism.csv")
+    reads_output = os.path.join(output_dir, "Reads_totals_by_organism.csv")
+
+    tpm_totals.to_csv(tpm_output)
+    reads_totals.to_csv(reads_output)
+
+    print(f"\nSaved summary files:")
+    print(f"TPM totals: {tpm_output}")
+    print(f"Reads totals: {reads_output}")
+
+    # Create visualizations
+    create_barplots(tpm_totals, reads_totals, output_dir, organisms)
+
+    return tpm_totals, reads_totals
+
+def create_barplots(tpm_totals, reads_totals, output_dir, organisms):
+    """
+    Create barplots for TPM and reads totals by organism.
+
+    Args:
+        - tpm_totals (pandas.DataFrame): DataFrame with TPM totals by organism
+        - reads_totals (pandas.DataFrame): DataFrame with reads totals by organism
+        - output_dir (str): Directory to save plots
+        - organisms (dict): Dictionary containing the names and desired colors used
+            for graph labeling
+
+    Returns:
+        - N/A
+    """
+    print("Creating visualizations...")
+
+    # Set up the plotting style
+    plt.style.use('default')
+    colors = organisms["colors"]
+
+    # Remove Total column for plotting
+    organism_cols = organisms["names"]
+
+    # Create figure with subplots
+    fig, axes = plt.subplots(2, 2, figsize=(15, 12))
+    fig.suptitle('TPM and Reads Distribution by Organism', fontsize=16, fontweight='bold')
+
+    # Plot 1: TPM totals by organism (stacked bar)
+    ax1 = axes[0, 0]
+    tpm_plot_data = tpm_totals[organism_cols]
+    tpm_plot_data.plot(kind='bar', stacked=True, ax=ax1, width=0.8, color=colors)
+    ax1.set_title('TPM Totals by Organism (Stacked)', fontweight='bold')
+    ax1.set_xlabel('Sample Index')
+    ax1.set_ylabel('TPM')
+    ax1.legend(title='Organism', bbox_to_anchor=(1.05, 1), loc='upper left')
+    ax1.set_xticks([])  # Remove x-axis labels for readability
+
+    # Plot 2: Reads totals by organism (stacked bar)
+    ax2 = axes[0, 1]
+    reads_plot_data = reads_totals[organism_cols]
+    reads_plot_data.plot(kind='bar', stacked=True, ax=ax2, width=0.8, color=colors)
+    ax2.set_title('Reads Totals by Organism (Stacked)', fontweight='bold')
+    ax2.set_xlabel('Sample Index')
+    ax2.set_ylabel('Reads')
+    ax2.legend(title='Organism', bbox_to_anchor=(1.05, 1), loc='upper left')
+    ax2.set_xticks([])  # Remove x-axis labels for readability
+
+    # Plot 3: TPM percentage by organism
+    ax3 = axes[1, 0]
+    tpm_percentages = tpm_plot_data.div(tpm_plot_data.sum(axis=1), axis=0) * 100
+    tpm_percentages.plot(kind='bar', stacked=True, ax=ax3, width=0.8, color=colors)
+    ax3.set_title('TPM Distribution by Organism (%)', fontweight='bold')
+    ax3.set_xlabel('Sample Index')
+    ax3.set_ylabel('Percentage (%)')
+    ax3.legend(title='Organism', bbox_to_anchor=(1.05, 1), loc='upper left')
+    ax3.set_xticks([])  # Remove x-axis labels for readability
+
+    # Plot 4: Reads percentage by organism
+    ax4 = axes[1, 1]
+    reads_percentages = reads_plot_data.div(reads_plot_data.sum(axis=1), axis=0) * 100
+    reads_percentages.plot(kind='bar', stacked=True, ax=ax4, width=0.8, color=colors)
+    ax4.set_title('Reads Distribution by Organism (%)', fontweight='bold')
+    ax4.set_xlabel('Sample Index')
+    ax4.set_ylabel('Percentage (%)')
+    ax4.legend(title='Organism', bbox_to_anchor=(1.05, 1), loc='upper left')
+    ax4.set_xticks([])  # Remove x-axis labels for readability
+
+    # Adjust layout
+    plt.tight_layout()
+
+    # Save plot
+    plot_output = os.path.join(output_dir, "organism_totals_barplots.png")
+    plt.savefig(plot_output, dpi=300, bbox_inches='tight')
+    plt.savefig(plot_output.replace('.png', '.pdf'), bbox_inches='tight')
+
+    print(f"Saved plots: {plot_output}")
+    plt.close()
+
+    # Create summary statistics plot
+    create_summary_stats_plot(tpm_totals, reads_totals, output_dir, organisms)
+
+def create_summary_stats_plot(tpm_totals, reads_totals, output_dir, organisms):
+    """
+    Create summary statistics visualization.
+
+    Args:
+        - tpm_totals (pandas.DataFrame): DataFrame with TPM totals by organism
+        - reads_totals (pandas.DataFrame): DataFrame with reads totals by organism
+        - output_dir (str): Directory to save plots
+        - organisms (dict): Dictionary containing the names and desired colors used
+            for graph labeling
+
+    Returns:
+        - N/A
+    """
+    organism_cols = organisms["names"]
+    color_cols = organisms["colors"]
+
+    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
+    fig.suptitle('Summary Statistics by Organism', fontsize=16, fontweight='bold')
+
+    # TPM summary
+    ax1 = axes[0]
+    tpm_summary = tpm_totals[organism_cols].mean()
+    bars1 = ax1.bar(range(len(organism_cols)), tpm_summary.values,
+                   color=color_cols)
+    ax1.set_title('Mean TPM by Organism', fontweight='bold')
+    ax1.set_xlabel('Organism')
+    ax1.set_ylabel('Mean TPM')
+    ax1.set_xticks(range(len(organism_cols)))
+    ax1.set_xticklabels(organism_cols, rotation=45)
+
+    # Add value labels on bars
+    for i, (bar, val) in enumerate(zip(bars1, tpm_summary.values)):
+        ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height() + val*0.01,
+                f'{val:.0f}', ha='center', va='bottom')
+
+    # Reads summary
+    ax2 = axes[1]
+    reads_summary = reads_totals[organism_cols].mean()
+    bars2 = ax2.bar(range(len(organism_cols)), reads_summary.values,
+                   color=color_cols)
+    ax2.set_title('Mean Reads by Organism', fontweight='bold')
+    ax2.set_xlabel('Organism')
+    ax2.set_ylabel('Mean Reads')
+    ax2.set_xticks(range(len(organism_cols)))
+    ax2.set_xticklabels(organism_cols, rotation=45)
+
+    # Add value labels on bars
+    for i, (bar, val) in enumerate(zip(bars2, reads_summary.values)):
+        ax2.text(bar.get_x() + bar.get_width()/2, bar.get_height() + val*0.01,
+                f'{val:.0f}', ha='center', va='bottom')
+
+    plt.tight_layout()
+
+    # Save plot
+    summary_output = os.path.join(output_dir, "organism_summary_stats.png")
+    plt.savefig(summary_output, dpi=300, bbox_inches='tight')
+    plt.savefig(summary_output.replace('.png', '.pdf'), bbox_inches='tight')
+
+    print(f"Saved summary plot: {summary_output}")
+    plt.close()
+
+def main():
+    """
+    Genenerates analysis and visualization written to the
+    base_dir/organism_analysis/ directory.
+
+    Expected Input Directory Structure
+    ----------------------------------
+    The script expects the following structure:
+
+        base_dir/
+        └── combined_results/
+        │   ├── STAR_TPMcalculator_*_TPMs_Matrix_Merged.csv
+        │   ├── STAR_TPMcalculator_*_Reads_Matrix_Merged.csv
+        └── tpmcalculator_output/
+            └── ...
+
+    Organism Dictionary
+    -------------------
+    organisms contains two fields used throughout the analysis:
+
+        names
+            List of gene ID prefixes used to identify which organism each
+            gene belongs to. For example, a gene ID like "Sym_Cgor_gene123"
+            should be labeled as "Sym_Cgor_".
+
+        colors
+            List of colors corresponding to each organism prefix.
+
+    Required Paths
+    --------------------
+    req_paths contains user-defined paths and output file names:
+
+        combined_results_dir
+            Directory containing the 'combined_results/' folder from the
+            previous script in the pipeline.
+
+        tpm_file
+            File name for the merged TPM matrix.
+
+        reads_file
+            File name for the merged Reads matrix.
+    """
+
+    ####################
+    # Define Organisms
+    ####################
+    organisms = {
+        "names": ['Host_', 'Sym_Cgor_', 'Sym_Dtre_', 'Sym_Smic_'],
+        "colors": ['#1f77b4', '#ff7f0e', '#2ca02c' ,'#d62728'] 
+    }
+
+    ####################
+    # Define paths
+    ####################
+    req_paths = {
+        "combined_results_dir": "</path/to/combined_results/dir>",
+        "tpm_file": "STAR_TPMcalculator_TPMs_Matrix_Merged_rerun.csv",
+        "reads_file": "STAR_TPMcalculator_Reads_Matrix_Merged_rerun.csv"
+    }
+
+    tpm_file = os.path.join(req_paths["combined_results_dir"], req_paths["tpm_file"])
+    reads_file = os.path.join(req_paths["combined_results_dir"], req_paths["reads_file"])
+
+    if os.path.exists(reads_file):
+        print("Using original reads matrix (may have different gene set)")
+    else:
+        reads_file = tpm_file
+        print("No reads matrix found, using TPM matrix for both analyses")
+
+    # Check if input files exist
+    if not os.path.exists(tpm_file):
+        print(f"TPM matrix file not found: {tpm_file}")
+        print("Please run combine_tpmcalculator_results.py first")
+        return
+
+    # Create output directory for analysis results
+    output_dir = os.path.join(req_paths["combined_results_dir"], "organism_analysis")
+    os.makedirs(output_dir, exist_ok=True)
+    
+    # Analyze organism totals
+    tpm_totals, reads_totals = analyze_organism_totals(tpm_file, reads_file, output_dir, organisms)
+
+    print("\nAnalysis completed successfully!")
+    print(f"Results saved in: {output_dir}")
+
+if __name__ == "__main__":
+    main()
\ No newline at end of file
diff --git a/post_processing/TPMCalculator/scripts/check_bam_sort.py b/post_processing/TPMCalculator/scripts/check_bam_sort.py
new file mode 100644
index 0000000..fef7e42
--- /dev/null
+++ b/post_processing/TPMCalculator/scripts/check_bam_sort.py
@@ -0,0 +1,138 @@
+#!/usr/bin/env python3
+"""
+check_bam_sort.py
+
+The script scans each sample directory, identifies the STAR-aligned BAM file,
+determines its sort order using `samtools view -H`, and logs the results to
+a TSV file for quick verification of alignment output consistency.
+
+Functions:
+    - get_sort_status(bam_path)
+    - main()
+
+Usage:
+    python3 check_bam_sort.py
+"""
+
+import os
+import glob
+import subprocess
+import csv
+
+def get_sort_status(bam_path):
+    """
+    Returns samtools sort order: "coordinate", "queryname", "unsorted",
+    or "unknown".
+
+    Args:
+        - bam_path (string): Path to bam file.
+
+    Returns:
+        - (string): samtools sort order.
+    """
+    try:
+
+        # Run samtools command
+        header = subprocess.run(
+            ["samtools", "view", "-H", bam_path],
+            stdout=subprocess.PIPE,
+            stderr=subprocess.PIPE,
+            text=True
+        )
+
+        # Return sort order
+        if header.returncode != 0:
+            return "unknown"
+
+        for line in header.stdout.splitlines():
+            if line.startswith("@HD"):
+                if "SO:coordinate" in line:
+                    return "coordinate"
+                elif "SO:queryname" in line:
+                    return "queryname"
+                elif "SO:unsorted" in line:
+                    return "unsorted"
+                else:
+                    return "unsorted"
+
+        return "unsorted"
+    except Exception:
+        return "unknown"
+
+
+def main():
+    """
+    Finds bam files within a provided folder and logs the
+    samtools sort order.
+
+    Expected Input Directory Structure
+    ----------------------------------
+    The script expects the following directory structure:
+
+        base_dir/
+        ├── PerSample/
+        │   ├── SAMPLE_1/
+        │   │   └── results_STAR_htseq/
+        │   │       └── *_dedup_NoSingletonCollated.out.bam
+        │   │
+        │   ├── SAMPLE_2/
+        │   │   └── results_STAR_htseq/
+        │   │       └── *_dedup_NoSingletonCollated.out.bam
+        │   │
+        │   └── ...
+    """
+
+    #################
+    # Define paths
+    ##################
+    base_dir = "</path/to/base/dir>"
+    per_sample_dir = os.path.join(base_dir, "PerSample")
+
+    log_file_name = "bam_sort_status_" + os.path.basename(base_dir) + ".tsv"
+    log_path = os.path.join(os.getcwd(), log_file_name)
+
+    # Check if PerSample directory exists
+    if not os.path.exists(per_sample_dir):
+        print(f"Error: PerSample directory not found at {per_sample_dir}")
+        return
+
+    # Find all sample directories
+    sample_dirs = [d for d in os.listdir(per_sample_dir) 
+                   if os.path.isdir(os.path.join(per_sample_dir, d))]
+    print(f"\nFound {len(sample_dirs)} sample directories")
+
+    # Open TSV
+    with open(log_path, "w", newline="") as log_file:
+        writer = csv.writer(log_file, delimiter="\t")
+        writer.writerow(["sample_dir", "bam_file", "sort_status"])
+
+        # Loop over samples
+        for sample_dir in sample_dirs:
+            sample_path = os.path.join(per_sample_dir, sample_dir)
+            results_star_htseq_path = os.path.join(sample_path, "results_STAR_htseq")
+
+            if not os.path.exists(results_star_htseq_path):
+                print(f"Warning: results_STAR_htseq directory not found for {sample_dir}")
+                continue
+
+            # Find BAM file
+            bam_pattern = os.path.join(results_star_htseq_path, "*dedup_NoSingletonCollated.out.bam")
+            bam_files = glob.glob(bam_pattern)
+
+            if not bam_files:
+                print(f"Warning: No BAM file found for {sample_dir}")
+                continue
+            elif len(bam_files) > 1:
+                print(f"Warning: Multiple BAM files found for {sample_dir}, using the first one")
+
+            bam_file = bam_files[0]
+
+            # Check sort status and log result
+            sort_status = get_sort_status(bam_file)
+            writer.writerow([sample_dir, bam_file, sort_status])
+
+    print(f"\nLogged BAM sort status to {log_path}\n")
+
+
+if __name__ == "__main__":
+    main()
\ No newline at end of file
diff --git a/post_processing/TPMCalculator/scripts/combine_tpmcalculator_results.py b/post_processing/TPMCalculator/scripts/combine_tpmcalculator_results.py
new file mode 100644
index 0000000..ded8307
--- /dev/null
+++ b/post_processing/TPMCalculator/scripts/combine_tpmcalculator_results.py
@@ -0,0 +1,258 @@
+#!/usr/bin/env python3
+"""
+combine_tpmcalculator_results.py
+
+Combine TPMCalculator output files from multiple RNA-seq samples into merged matrices.
+This script scans the TPMCalculator output directories and aligns TPM and Reads
+values across all samples by gene. Two merged CSV are generated.
+
+Functions:
+    - extract_sample_name(filepath)
+    - find_tpmcalculator_outputs(base_dir)
+    - combine_tpm_matrices(output_files, output_dir, req_paths)
+    - main()
+
+Usage:
+    python3 combine_tpmcalculator_results.py
+"""
+
+import os
+import pandas as pd
+import glob
+import re
+from pathlib import Path
+
+def extract_sample_name(filepath):
+    """
+    Extract sample name from filepath by removing '_star_10_0.3_0.66_0_dedup' part.
+
+    Args:
+        - filepath (str): Path to the TPMCalculator output file
+
+    Returns:
+        - sample_name (str): Sample name
+    """
+    filename = os.path.basename(filepath)
+    # Remove the file extension and TPMCalculator suffix
+    base_name = filename.replace('_tpmcalculator.txt_genes.out', '')
+    # Remove the STAR alignment suffix
+    sample_name = re.sub(r'_star_10_0\.3_0\.66_0_dedup.*', '', base_name)
+    return sample_name
+
+def find_tpmcalculator_outputs(base_dir):
+    """
+    Find all TPMCalculator output files with various possible patterns.
+
+    Args:
+        - base_dir (str): Base directory containing PerSample folders
+
+    Returns:
+        - list: List of file paths
+    """
+    # per_sample_dir = os.path.join(base_dir, "PerSample")
+    per_sample_dir = os.path.join(base_dir, "tpmcalculator_output")
+
+    # Try multiple patterns for TPMCalculator output files
+    patterns = [
+        os.path.join(per_sample_dir, "*", "*Collated.out_genes"),
+        os.path.join(per_sample_dir, "*", "*_genes.out"),
+        os.path.join(per_sample_dir, "*", "*NoSingletonCollated.out_genes.out"),
+        os.path.join(per_sample_dir, "*", "*dedup*.out_genes.out"),
+    ]
+
+    print(patterns)
+
+    output_files = []
+    for pattern in patterns:
+        files = glob.glob(pattern)
+        if files:
+            output_files = files
+            print(f"Found files with pattern: {pattern}")
+            break
+
+    return sorted(output_files)
+
+def combine_tpm_matrices(output_files, output_dir, req_paths):
+    """
+    Combine TPMCalculator results into TPM and Reads matrices.
+
+    Args:
+        - output_files (list): List of TPMCalculator output file paths
+        - output_dir (str): Directory to save the combined matrices
+        - req_paths (dict): Dictionary containing names of TPM and Reads CSVs
+    """
+    tpm_data = {}
+    reads_data = {}
+    gene_ids = None
+
+    print(f"Processing {len(output_files)} TPMCalculator output files...")
+
+    for i, filepath in enumerate(output_files):
+        try:
+            # Extract sample name
+            sample_name = extract_sample_name(filepath)
+            print(f"Processing {i+1}/{len(output_files)}: {sample_name}")
+
+            # Check file size first
+            file_size = os.path.getsize(filepath)
+            if file_size < 300:  # Less than 300 bytes is likely empty (just headers)
+                print(f"Warning: File {filepath} is too small ({file_size} bytes), likely empty")
+                continue
+
+            # Read the TPMCalculator output file
+            df = pd.read_csv(filepath, sep='\t')
+
+            # Identify duplicated Gene_Id
+            dup_mask = df['Gene_Id'].duplicated(keep=False)
+            if dup_mask.any():
+                skipped_file = os.path.join(output_dir, "Skipped_Duplicate_Gene_Ids.csv")
+
+                # Save skipped rows
+                df[dup_mask].assign(sample=sample_name).to_csv(
+                    skipped_file,
+                    sep=",",
+                    index=False,
+                    mode="a",
+                    header=not os.path.exists(skipped_file)
+                )
+                df = df[~dup_mask]
+
+            # Check if dataframe is empty
+            if df.empty:
+                print(f"Warning: File {filepath} contains no data")
+                continue
+
+            # Check if required columns exist
+            required_cols = ['Gene_Id', 'TPM', 'Reads']
+            missing_cols = [col for col in required_cols if col not in df.columns]
+
+            if missing_cols:
+                print(f"Warning: Missing columns {missing_cols} in {filepath}")
+                print(f"Available columns: {list(df.columns)}")
+                continue
+
+            # Check if there are any genes
+            if len(df) == 0:
+                print(f"Warning: No genes found in {filepath}")
+                continue
+
+            # Set gene IDs from first file
+            if gene_ids is None:
+                gene_ids = df['Gene_Id'].tolist()
+                print(f"Using gene list from {sample_name} with {len(gene_ids)} genes")
+
+            # Check if gene lists match
+            if gene_ids != df['Gene_Id'].tolist():
+                print(f"Warning: Gene list mismatch in {sample_name}")
+                # Try to align with existing gene_ids
+                df_aligned = df.set_index('Gene_Id').reindex(gene_ids)
+                tpm_data[sample_name] = df_aligned['TPM'].fillna(0).tolist()
+                reads_data[sample_name] = df_aligned['Reads'].fillna(0).tolist()
+            else:
+                # Extract TPM and Reads columns
+                tpm_data[sample_name] = df['TPM'].tolist()
+                reads_data[sample_name] = df['Reads'].tolist()
+
+        except Exception as e:
+            print(f"Error processing {filepath}: {e}")
+            continue
+
+    if not tpm_data:
+        print("No valid data found. Exiting.")
+        return
+
+    print(f"Successfully processed {len(tpm_data)} samples")
+    print(f"Found {len(gene_ids)} genes")
+
+    # Create TPM matrix
+    tpm_df = pd.DataFrame(tpm_data)
+    tpm_df.insert(0, 'Gene_Id', gene_ids)
+
+    # Create Reads matrix
+    reads_df = pd.DataFrame(reads_data)
+    reads_df.insert(0, 'Gene_Id', gene_ids)
+
+    # Create output directory if it doesn't exist
+    os.makedirs(output_dir, exist_ok=True)
+
+    # Save matrices
+    tpm_output = os.path.join(output_dir, req_paths["tpm_csv"])
+    reads_output = os.path.join(output_dir, req_paths["reads_csv"])
+
+    tpm_df.to_csv(tpm_output, index=False)
+    reads_df.to_csv(reads_output, index=False)
+
+    print(f"\nMatrices saved:")
+    print(f"TPM matrix: {tpm_output}")
+    print(f"Reads matrix: {reads_output}")
+    print(f"Matrix dimensions: {tpm_df.shape}")
+    print(f"Samples: {list(tpm_data.keys())}")
+
+def main():
+    """
+    This function combines the TPMCalculator results across all samples,
+    by gene, into merged TPM and Reads matrices. 
+
+    Expected Input Directory Structure
+    ----------------------------------
+    The script expects the following structure:
+
+        base_dir/
+        └── tpmcalculator_output/
+            ├── SAMPLE_1/
+            │   └── SAMPLE_1...out_genes.out
+            ├── SAMPLE_2/
+            │   └── SAMPLE_2...out_genes.out
+            └── ...
+
+    Required Paths
+    --------------------
+    req_paths contains user-defined paths and output file names:
+
+        base_dir
+            Root directory containing 'tpmcalculator_output/'. This is the output directory
+            from the previous script.
+
+        tpm_csv
+            File name for the merged TPM matrix.
+
+        reads_csv
+            File name for the merged Reads matrix.
+    """
+
+    #################
+    # Define paths
+    ##################
+    req_paths = {
+        "base_dir": "</path/to/base/dir>",
+        "tpm_csv": "STAR_TPMcalculator_TPMs_Matrix_Merged_rerun.csv",
+        "reads_csv": "STAR_TPMcalculator_Reads_Matrix_Merged_rerun.csv"
+    }
+    output_dir = req_paths["base_dir"] + "/combined_results"
+
+    # Find TPMCalculator output files
+    print("Searching for TPMCalculator output files...")
+    output_files = find_tpmcalculator_outputs(req_paths["base_dir"])
+
+    if not output_files:
+        print("No TPMCalculator output files found!")
+        print("Expected pattern: tpmcalculator/*/*Collated.out_sorted_genes")
+        return
+
+    print(f"Found {len(output_files)} TPMCalculator output files")
+
+    # Show first few files for verification
+    print("\nFirst few files found:")
+    for f in output_files[:5]:
+        sample_name = extract_sample_name(f)
+        print(f"  {sample_name}: {f}")
+
+    if len(output_files) > 5:
+        print(f"  ... and {len(output_files) - 5} more")
+
+    # Combine matrices
+    combine_tpm_matrices(output_files, output_dir, req_paths)
+    
+
+if __name__ == "__main__":
+    main()
diff --git a/post_processing/TPMCalculator/scripts/generate_tpmcalculator_jobs.py b/post_processing/TPMCalculator/scripts/generate_tpmcalculator_jobs.py
new file mode 100644
index 0000000..96f39e7
--- /dev/null
+++ b/post_processing/TPMCalculator/scripts/generate_tpmcalculator_jobs.py
@@ -0,0 +1,202 @@
+#!/usr/bin/env python3
+"""
+generate_tpmcalculator_jobs.py
+
+This module generates and submits SLURM job scripts for running TPMCalculator on
+multiple RNA-seq samples. It scans a PerSample directory containing sample-specific
+subdirectories, identifies the appropriate BAM file for each sample,
+and generates a SLURM job script that runs TPMCalculator using a
+provided GTF annotation file.
+
+Functions:
+    - create_slurm_job_script(sample_id, bam_file, gtf_file, tmp_output_dir)
+    - main()
+
+Usage:
+    python3 generate_tpmcalculator_slurm_jobs.py
+"""
+
+import os
+import glob
+import subprocess
+
+def create_slurm_job_script(sample_id, bam_file, gtf_file, tmp_output_dir):
+    """
+    Create a SLURM job script for TPMCalculator analysis.
+    
+    Args:
+        - sample_id (string): Sample identifier
+        - bam_file (string): Path to the BAM file
+        - gtf_file (string): Path to the GTF file
+        - tmp_output_dir (string): Directory to save TPMCalculator output
+
+    Returns:
+        - job_script_path (string): Path to SLURM job bash script
+    """
+
+    job_script_content = f"""#!/bin/bash
+#SBATCH --job-name=tpmcalc_{sample_id}
+#SBATCH --cpus-per-task=4
+#SBATCH --mem=16G
+#SBATCH --time=08:00:00
+#SBATCH --output={tmp_output_dir}/{sample_id}_tpmcalculator_%j.out
+#SBATCH --error={tmp_output_dir}/{sample_id}_tpmcalculator_%j.err
+
+# TPMCalculator is available via conda installation
+
+# Define variables
+BAM="{bam_file}"
+GTF="{gtf_file}"
+OUTPUT_DIR="{tmp_output_dir}"
+
+# Run TPMCalculator
+echo "Starting TPMCalculator analysis for {sample_id}..."
+mkdir -p $OUTPUT_DIR
+cd $OUTPUT_DIR
+TPMCalculator -g $GTF -b $BAM -o $OUTPUT_DIR/{sample_id}_tpmcalculator.txt
+
+echo "TPMCalculator analysis completed for {sample_id}"
+"""
+    
+    # Create the bash file containing the SLURM job
+    job_script_path = os.path.join(tmp_output_dir, f"{sample_id}_tpmcalculator_job.sh")
+    
+    with open(job_script_path, 'w') as f:
+        f.write(job_script_content)
+    
+    # Make the script executable
+    os.chmod(job_script_path, 0o755)
+    
+    return job_script_path
+
+def main():
+    """
+    This function scans a project results directory, identifies sample
+    BAM files produced by the STAR pipeline, generates a SLURM job script
+    for each sample, and submits those jobs to the cluster.
+
+    Expected Input Directory Structure
+    ----------------------------------
+    The script expects the following directory structure:
+
+        base_dir/
+        ├── PerSample/
+        │   ├── SAMPLE_1/
+        │   │   └── results_STAR_htseq/
+        │   │       └── *_dedup_NoSingletonCollated.out.bam
+        │   │
+        │   ├── SAMPLE_2/
+        │   │   └── results_STAR_htseq/
+        │   │       └── *_dedup_NoSingletonCollated.out.bam
+        │   │
+        │   └── ...
+
+    Where:
+        - Each directory inside `PerSample/` represents one RNA-seq sample.
+        - Each sample directory must contain a `results_STAR_htseq/` folder.
+        - The BAM file used for TPMCalculator must match the pattern:
+          `*dedup_NoSingletonCollated.out.bam`.
+
+    Required Paths
+    --------------
+    req_paths is a dictionary containing required input and output paths:
+
+        base_dir
+            Root directory of the RNA-seq project results. This directory
+            must contain the `PerSample/` directory with all sample folders.
+
+        gtf_file
+            Path to the reference GTF annotation file used by TPMCalculator
+            to assign reads to genes and transcripts.
+
+        output_dir
+            Directory where TPMCalculator outputs and generated SLURM job
+            scripts will be written. A subdirectory for each sample will
+            be created automatically if it does not already exist.
+    """
+
+    #################
+    # Define paths
+    ##################
+    req_paths ={
+        "base_dir": "</path/to/base/dir>",
+        "gtf_file": "</path/to/gtf/file>",
+        "output_dir": "</path/to/output/dir>"
+    }
+    per_sample_dir = os.path.join(req_paths["base_dir"], "PerSample")
+    
+    # Check if GTF file exists
+    if not os.path.exists(req_paths["gtf_file"]):
+        print(f"Error: GTF file not found at {req_paths["gtf_file"]}")
+        return
+    
+    # Check if PerSample directory exists
+    if not os.path.exists(per_sample_dir):
+        print(f"Error: PerSample directory not found at {per_sample_dir}")
+        return
+    
+    # Find all sample directories
+    sample_dirs = [d for d in os.listdir(per_sample_dir) 
+                   if os.path.isdir(os.path.join(per_sample_dir, d))]
+
+    print(f"Found {len(sample_dirs)} sample directories")
+
+    job_scripts_created = []
+    
+    for sample_dir in sample_dirs:
+        sample_path = os.path.join(per_sample_dir, sample_dir)
+        results_star_htseq_path = os.path.join(sample_path, "results_STAR_htseq")
+        
+        # Check if results_STAR_htseq directory exists
+        if not os.path.exists(results_star_htseq_path):
+            print(f"Warning: results_STAR_htseq directory not found for {sample_dir}")
+            continue
+        
+        # Find BAM file
+        bam_pattern = os.path.join(results_star_htseq_path, "*dedup_NoSingletonCollated.out.bam")
+        bam_files = glob.glob(bam_pattern)
+
+        if not bam_files:
+            print(f"Warning: No BAM file found for {sample_dir}")
+            continue
+        elif len(bam_files) > 1:
+            print(f"Warning: Multiple BAM files found for {sample_dir}, using the first one")
+        
+        bam_file = bam_files[0]
+        
+        # Create tpmcalculator directory
+        tpmcalculator_dir = req_paths["output_dir"] + sample_dir
+        os.makedirs(tpmcalculator_dir, exist_ok=True)
+        
+        # Extract sample ID from directory name
+        sample_id = sample_dir
+        
+        # Create SLURM job script
+        job_script_path = create_slurm_job_script(sample_id, bam_file, req_paths["gtf_file"], tpmcalculator_dir)
+        job_scripts_created.append(job_script_path)
+        
+        print(f"Created job script for {sample_id}: {job_script_path}")
+    
+    print(f"\nTotal job scripts created: {len(job_scripts_created)}")
+    
+    # Submit all jobs automatically
+    print("\nSubmitting jobs to SLURM cluster...")
+    submitted_jobs = []
+
+    for job_script in job_scripts_created:
+        try:
+            result = subprocess.run(['sbatch', job_script], stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)
+            if result.returncode == 0:
+                job_id = result.stdout.strip().split()[-1]
+                submitted_jobs.append(job_id)
+                print(f"Submitted job {job_id} for {job_script}")
+            else:
+                print(f"Failed to submit {job_script}: {result.stderr.strip()}")
+        except Exception as e:
+            print(f"Error submitting {job_script}: {e}")
+
+    print(f"\nSuccessfully submitted {len(submitted_jobs)} jobs out of {len(job_scripts_created)} total jobs.")
+
+
+if __name__ == "__main__":
+    main()
\ No newline at end of file
diff --git a/post_processing/TPMCalculator/scripts/gff3_to_gtf.py b/post_processing/TPMCalculator/scripts/gff3_to_gtf.py
new file mode 100644
index 0000000..9ac6a09
--- /dev/null
+++ b/post_processing/TPMCalculator/scripts/gff3_to_gtf.py
@@ -0,0 +1,162 @@
+#!/usr/bin/env python3
+"""
+gff3_to_gtf.py
+
+This script parses a GFF3 file and converts supported feature types
+(gene, mRNA/transcript, exon, CDS) into GTF format while maintaining
+the correct gene_id and transcript_id relationships.
+
+Functions:
+    - parse_attributes(attr_string)
+    - format_gtf_attributes(attributes, feature_type, gene_id=None, transcript_id=None)
+    - convert_gff3_to_gtf(input_file, output_file)
+    - main()
+
+Usage:
+    python3 gff3_to_gtf.py <input_gff3> <output_gtf>
+"""
+
+import sys
+import re
+from pathlib import Path
+
+def parse_attributes(attr_string):
+    """
+    GFF3 attributes are formatted as key-value pairs separated by
+    semicolons. This function converts the attribute string into a
+    Python dictionary for easier access.
+
+    Args:
+        - attr_string (str): Attribute column from a GFF3 file.
+
+    Returns:
+        - attributes (dict): Dictionary mapping attribute keys to values.
+    """
+    attributes = {}
+    if not attr_string or attr_string == '.':
+        return attributes
+    
+    for item in attr_string.split(';'):
+        if '=' in item:
+            key, value = item.split('=', 1)
+            attributes[key] = value
+    return attributes
+
+def format_gtf_attributes(attributes, feature_type, gene_id=None, transcript_id=None):
+    """
+    Constructs the GTF attribute string using gene_id and transcript_id
+    along with optional gene_name or transcript_name fields if present.
+
+    Args:
+        - attributes (dict): Parsed GFF3 attributes.
+        - feature_type (str): Feature type (gene, transcript, exon, CDS).
+        - gene_id (str, optional): Gene identifier.
+        - transcript_id (str, optional): Transcript identifier.
+
+    Returns:
+        - str: Properly formatted GTF attribute string.
+    """
+    gtf_attrs = []
+    
+    if gene_id:
+        gtf_attrs.append(f'gene_id "{gene_id}"')
+    
+    if transcript_id:
+        gtf_attrs.append(f'transcript_id "{transcript_id}"')
+    
+    if 'Name' in attributes:
+        if feature_type == 'gene':
+            gtf_attrs.append(f'gene_name "{attributes["Name"]}"')
+        elif feature_type in ['transcript', 'mRNA']:
+            gtf_attrs.append(f'transcript_name "{attributes["Name"]}"')
+    
+    return '; '.join(gtf_attrs) + ';'
+
+def convert_gff3_to_gtf(input_file, output_file):
+    """
+    Reads a GFF3 file line by line, parses feature attributes, and writes
+    supported features (gene, transcript/mRNA, exon, CDS) to a GTF file.
+    Transcript-to-gene relationships are tracked to correctly assign
+    gene_id and transcript_id values.
+
+    Args:
+        - input_file (str): Path to the input GFF3 file.
+        - output_file (str): Path to the output GTF file.
+
+    Returns:
+        - N/A
+    """
+    transcript_to_gene = {}
+    
+    with open(input_file, 'r') as infile, open(output_file, 'w') as outfile:
+        for line_num, line in enumerate(infile, 1):
+            if line_num % 100000 == 0:
+                print(f"Processed {line_num:,} lines...")
+            
+            line = line.strip()
+            
+            if line.startswith('#') or not line:
+                continue
+            
+            fields = line.split('\t')
+            if len(fields) != 9:
+                continue
+            
+            seqname, source, feature, start, end, score, strand, frame, attributes = fields
+            attr_dict = parse_attributes(attributes)
+            
+            if feature == 'gene':
+                gene_id = attr_dict.get('ID', '')
+                
+                gtf_attributes = format_gtf_attributes(attr_dict, feature, gene_id=gene_id)
+                gtf_line = f"{seqname}\t{source}\tgene\t{start}\t{end}\t{score}\t{strand}\t{frame}\t{gtf_attributes}"
+                outfile.write(gtf_line + '\n')
+            
+            elif feature in ['mRNA', 'transcript']:
+                transcript_id = attr_dict.get('ID', '')
+                parent_gene = attr_dict.get('Parent', '')
+                
+                transcript_to_gene[transcript_id] = parent_gene
+                
+                gtf_attributes = format_gtf_attributes(attr_dict, feature, gene_id=parent_gene, transcript_id=transcript_id)
+                gtf_line = f"{seqname}\t{source}\ttranscript\t{start}\t{end}\t{score}\t{strand}\t{frame}\t{gtf_attributes}"
+                outfile.write(gtf_line + '\n')
+            
+            elif feature in ['exon', 'CDS']:
+                parent_id = attr_dict.get('Parent', '')
+                
+                # Check if parent is a transcript/mRNA or directly a gene
+                if parent_id in transcript_to_gene:
+                    # Parent is a transcript
+                    parent_gene = transcript_to_gene[parent_id]
+                    transcript_id = parent_id
+                else:
+                    # Parent might be a gene (some exons directly reference genes)
+                    parent_gene = parent_id
+                    transcript_id = parent_id  # Use gene ID as transcript ID for these cases
+                
+                gtf_attributes = format_gtf_attributes(attr_dict, feature, gene_id=parent_gene, transcript_id=transcript_id)
+                gtf_line = f"{seqname}\t{source}\t{feature}\t{start}\t{end}\t{score}\t{strand}\t{frame}\t{gtf_attributes}"
+                outfile.write(gtf_line + '\n')
+
+def main():
+    if len(sys.argv) != 3:
+        print("Usage: python3 gff3_to_gtf.py <input_gff3> <output_gtf>")
+        sys.exit(1)
+    
+    input_file = sys.argv[1]
+    output_file = sys.argv[2]
+    
+    if not Path(input_file).exists():
+        print(f"Error: Input file '{input_file}' not found.")
+        sys.exit(1)
+    
+    try:
+        convert_gff3_to_gtf(input_file, output_file)
+        print(f"Successfully converted {input_file} to {output_file}")
+    except Exception as e:
+        print(f"Error during conversion: {e}")
+        sys.exit(1)
+
+if __name__ == "__main__":
+    main()
\ No newline at end of file
diff --git a/post_processing/TPMCalculator/scripts/gff3_to_gtf_fixes/README.md b/post_processing/TPMCalculator/scripts/gff3_to_gtf_fixes/README.md
new file mode 100644
index 0000000..f0323e5
--- /dev/null
+++ b/post_processing/TPMCalculator/scripts/gff3_to_gtf_fixes/README.md
@@ -0,0 +1,11 @@
+# GFF3 to GTF Fixes
+
+This directory contains small helper scripts used to make manual fixes to GTF files after converting annotations from **GFF3 → GTF**.
+
+While the main conversion script performs the bulk of the format conversion, some reference annotations required small adjustments to ensure compatibility with downstream tools such as **TPMCalculator**. These fixes were typically related to feature naming or attribute formatting.
+
+Examples of fixes performed:
+- Converting CDS to exon label
+- Fixing GTF chromosome names to match BAM headers
+
+Each script documents the specific transformation it performs. These scripts were run **after the initial GFF3 → GTF conversion step** and before running downstream expression quantification.
\ No newline at end of file
diff --git a/post_processing/TPMCalculator/scripts/gff3_to_gtf_fixes/cds_to_exon_gtf.sh b/post_processing/TPMCalculator/scripts/gff3_to_gtf_fixes/cds_to_exon_gtf.sh
new file mode 100755
index 0000000..a165fe3
--- /dev/null
+++ b/post_processing/TPMCalculator/scripts/gff3_to_gtf_fixes/cds_to_exon_gtf.sh
@@ -0,0 +1,29 @@
+#!/bin/bash
+# Convert CDS to exon for host so TPMCalculator can count them
+
+GTF="Valid_Plob_Cgor_concat.gtf"
+FIXED_GTF="Valid_Plob_Cgor_concat_fixed.gtf"
+
+awk 'BEGIN{OFS="\t"}
+{
+    # Split line into first 8 fields + rest as attributes
+    attrs = ""
+    if(NF>8) {
+        attrs = $9
+        for(i=10;i<=NF;i++) attrs = attrs " " $i
+    }
+
+    if($3=="CDS") {
+        $3="exon"
+        # Extract transcript_id
+        match(attrs,/transcript_id "[^"]+"/)
+        tid=substr(attrs,RSTART+15,RLENGTH-16)  # get only the value inside quotes
+        # Prepend gene_id
+        attrs="gene_id \"" tid "\"; " attrs
+    }
+
+    # Rebuild the line: columns 1-8 + fixed attrs
+    print $1,$2,$3,$4,$5,$6,$7,$8,attrs
+}' "$GTF" > "$FIXED_GTF"
+
+echo "Fixed GTF created: $FIXED_GTF"
\ No newline at end of file
diff --git a/post_processing/TPMCalculator/scripts/gff3_to_gtf_fixes/fix_gtf_scaffold_names.sh b/post_processing/TPMCalculator/scripts/gff3_to_gtf_fixes/fix_gtf_scaffold_names.sh
new file mode 100755
index 0000000..9e86284
--- /dev/null
+++ b/post_processing/TPMCalculator/scripts/gff3_to_gtf_fixes/fix_gtf_scaffold_names.sh
@@ -0,0 +1,19 @@
+#!/bin/bash
+# Fix GTF chromosome names to match BAM headers for TPMCalculator
+# Original issue: BAM and GTF scaffold names did not match, causing TPMCalculator to run without producing gene output.
+
+INPUT_GTF="Valid_Phar_Cgor_Dtre_Smic_concat.gtf"
+OUTPUT_GTF="Valid_Phar_Cgor_Dtre_Smic_concat_fixed.gtf"
+
+awk 'BEGIN{FS=OFS="\t"} 
+{
+  match($1, /JBDLLT010000[0-9]+/)
+  if (RSTART) {
+    id=substr($1, RSTART, RLENGTH)
+    sub(/JBDLLT010000/, "", id)
+    $1="Host_Phar_gnl|WGS:JBDLLT|Phar_scaff_" id "|gb|JBDLLT010000" id
+  }
+  print
+}' "$INPUT_GTF" > "$OUTPUT_GTF"
+
+echo "Fixed GTF written to $OUTPUT_GTF"
\ No newline at end of file
diff --git a/post_processing/TPMCalculator/scripts/gff3_to_gtf_fixes/underscore_to_pipe.sh b/post_processing/TPMCalculator/scripts/gff3_to_gtf_fixes/underscore_to_pipe.sh
new file mode 100755
index 0000000..3ee6a94
--- /dev/null
+++ b/post_processing/TPMCalculator/scripts/gff3_to_gtf_fixes/underscore_to_pipe.sh
@@ -0,0 +1,26 @@
+#!/bin/bash
+# Convert underscore naming convention to piped for Host
+
+GTF="Valid_Pdae_Cgor_Dtre_Smic_concat.gtf"
+FIXED_GTF="Valid_Pdae_Cgor_Dtre_Smic_concat_fixed.gtf"
+
+# Replace '_size' to '|size'
+awk -F'\t' -v OFS='\t' '
+    /^#/ {print; next} 
+    {
+        # fix first column
+        sub(/_size/, "|size", $1)
+
+        # combine everything after column 8 into column 9
+        if (NF > 9) {
+            $9 = "";
+            for(i=9; i<=NF; i++) {
+                $9 = $9 $i
+                if(i<NF) $9 = $9 ";"
+            }
+        }
+
+        print
+    }' "$GTF" > "$FIXED_GTF"
+
+echo "Fixed GTF written to $FIXED_GTF"
\ No newline at end of file