RNA Secondary Structure

Prediction of RNA secondary structure using the Nussinov dynamic programming algorithm — implemented in C++.

A from-scratch implementation of the Nussinov algorithm for predicting RNA secondary structure. Given an RNA sequence, it finds the folding with the maximum number of Watson-Crick and wobble base pairs using dynamic programming, then recovers the structure via recursive traceback. Built during the CS361-UCSP Artificial Intelligence course.

Quickstart

git clone https://github.com/RayverAimar/secondary-structure-RNA.git
cd secondary-structure-RNA

g++ -O2 -o rna main.cpp
./rna

Output

Running on the default sequence ACUCGAUUCCGAG:

Sequence  : ACUCGAUUCCGAG
Structure : .(((((.).))))
Base pairs: 5

DP matrix (cell value = negative base pairs in that subrange):
       A   C   U   C   G   A   U   U   C   C   G   A   G
   A   0   0  -1  -1  -2  -2  -3  -3  -3  -3  -4  -5  -5
   C   0   0   0   0  -1  -2  -2  -2  -2  -2  -3  -4  -5
   U   0   0   0   0  -1  -2  -2  -2  -2  -2  -3  -4  -5
   C   0   0   0   0  -1  -1  -2  -2  -2  -2  -3  -4  -4
   G   0   0   0   0   0   0  -1  -2  -2  -2  -3  -3  -4
   A   0   0   0   0   0   0  -1  -1  -1  -1  -2  -3  -3
   U   0   0   0   0   0   0   0   0   0   0  -1  -2  -3
   U   0   0   0   0   0   0   0   0   0   0  -1  -2  -2
   C   0   0   0   0   0   0   0   0   0   0  -1  -1  -2
   C   0   0   0   0   0   0   0   0   0   0  -1  -1  -1
   G   0   0   0   0   0   0   0   0   0   0   0   0   0
   A   0   0   0   0   0   0   0   0   0   0   0   0   0
   G   0   0   0   0   0   0   0   0   0   0   0   0   0

The structure in dot-bracket notation — . means unpaired, ( and ) mark the two ends of a base pair:

A C U C G A U U C C G  A  G
. ( ( ( ( ( . ) . ) )  )  )
0 1 2 3 4 5 6 7 8 9 10 11 12

Recovered pairs: C(1)–G(12), U(2)–A(11), C(3)–G(10), G(4)–C(9), A(5)–U(7) — all Watson-Crick valid.

Theory

RNA base pairing rules

Pair	Type
A – U	Watson-Crick
U – A	Watson-Crick
C – G	Watson-Crick
G – C	Watson-Crick
G – U	Wobble
U – G	Wobble

DP formulation (Nussinov)

dp[i][j] stores the minimum cost for the subsequence i..j, where cost is defined as the negative number of base pairs (minimizing cost = maximizing pairs).

Base case:

$$dp[i][i] = 0 \qquad dp[i][j] = 0 \quad \text{if } j \leq i$$

Recurrence:

$$dp[i][j] = \min \begin{cases} dp[i+1][j-1] - 1 & \text{if bases } i \text{ and } j \text{ can pair} \ \displaystyle\min_{i \leq k < j}; dp[i][k] + dp[k+1][j] & \text{bifurcation (always evaluated)} \end{cases}$$

The bifurcation case is always evaluated — even when $i$ and $j$ can pair — because splitting the interval at some $k$ may yield more total pairs than a single $i$–$j$ pairing.

The answer is $-,dp[0][n-1]$.

Complexity: $O(n^3)$ time, $O(n^2)$ space.

Traceback

The optimal structure is recovered recursively from dp[0][n-1]:

If canPair(i, j) and dp[i][j] == dp[i+1][j-1] - 1 → pair $(i, j)$, recurse on $(i{+}1,; j{-}1)$
Else if dp[i][j] == dp[i+1][j] → $i$ is unpaired, recurse on $(i{+}1,; j)$
Else if dp[i][j] == dp[i][j-1] → $j$ is unpaired, recurse on $(i,; j{-}1)$
Else find $k$ such that dp[i][j] == dp[i][k] + dp[k+1][j] → recurse on both halves

The result is written in dot-bracket notation: . for unpaired, ( and ) for paired positions.

Bugs fixed

The original implementation had three correctness issues:

#	Bug	Fix
1	DP `if-else`: bifurcation was only considered when $i$–$j$ could NOT pair, missing cases where a split yields more pairs than a single pairing	Removed `else` — bifurcation loop always runs after the pairing check
2	Traceback false positives: the pair check `dp[i][j] == dp[i+1][j-1] - 1` passed for non-pairable bases due to coincidental DP values	Added `canPair(seq[i], seq[j])` guard to the traceback condition
3	Traceback incompleteness: the `while (i < mid && j >= mid)` loop stopped at the midpoint, silently missing all nested pairs contained entirely within one half	Replaced iterative loop with a fully recursive traceback

Project structure

secondary-structure-RNA/
├── main.cpp    # Nussinov DP + recursive traceback
├── LICENSE
└── README.md

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
main.cpp		main.cpp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RNA Secondary Structure

Quickstart

Output

Theory

RNA base pairing rules

DP formulation (Nussinov)

Traceback

Bugs fixed

Project structure

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

RNA Secondary Structure

Quickstart

Output

Theory

RNA base pairing rules

DP formulation (Nussinov)

Traceback

Bugs fixed

Project structure

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages