Prediction of RNA secondary structure using the Nussinov dynamic programming algorithm — implemented in C++.
A from-scratch implementation of the Nussinov algorithm for predicting RNA secondary structure. Given an RNA sequence, it finds the folding with the maximum number of Watson-Crick and wobble base pairs using dynamic programming, then recovers the structure via recursive traceback. Built during the CS361-UCSP Artificial Intelligence course.
git clone https://github.com/RayverAimar/secondary-structure-RNA.git
cd secondary-structure-RNA
g++ -O2 -o rna main.cpp
./rnaRunning on the default sequence ACUCGAUUCCGAG:
Sequence : ACUCGAUUCCGAG
Structure : .(((((.).))))
Base pairs: 5
DP matrix (cell value = negative base pairs in that subrange):
A C U C G A U U C C G A G
A 0 0 -1 -1 -2 -2 -3 -3 -3 -3 -4 -5 -5
C 0 0 0 0 -1 -2 -2 -2 -2 -2 -3 -4 -5
U 0 0 0 0 -1 -2 -2 -2 -2 -2 -3 -4 -5
C 0 0 0 0 -1 -1 -2 -2 -2 -2 -3 -4 -4
G 0 0 0 0 0 0 -1 -2 -2 -2 -3 -3 -4
A 0 0 0 0 0 0 -1 -1 -1 -1 -2 -3 -3
U 0 0 0 0 0 0 0 0 0 0 -1 -2 -3
U 0 0 0 0 0 0 0 0 0 0 -1 -2 -2
C 0 0 0 0 0 0 0 0 0 0 -1 -1 -2
C 0 0 0 0 0 0 0 0 0 0 -1 -1 -1
G 0 0 0 0 0 0 0 0 0 0 0 0 0
A 0 0 0 0 0 0 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0 0 0 0 0 0 0
The structure in dot-bracket notation — . means unpaired, ( and ) mark the two ends of a base pair:
A C U C G A U U C C G A G
. ( ( ( ( ( . ) . ) ) ) )
0 1 2 3 4 5 6 7 8 9 10 11 12
Recovered pairs: C(1)–G(12), U(2)–A(11), C(3)–G(10), G(4)–C(9), A(5)–U(7) — all Watson-Crick valid.
| Pair | Type |
|---|---|
| A – U | Watson-Crick |
| U – A | Watson-Crick |
| C – G | Watson-Crick |
| G – C | Watson-Crick |
| G – U | Wobble |
| U – G | Wobble |
dp[i][j] stores the minimum cost for the subsequence i..j, where cost is defined as the negative number of base pairs (minimizing cost = maximizing pairs).
Base case:
Recurrence:
The bifurcation case is always evaluated — even when
The answer is
Complexity:
The optimal structure is recovered recursively from dp[0][n-1]:
- If
canPair(i, j)anddp[i][j] == dp[i+1][j-1] - 1→ pair$(i, j)$ , recurse on$(i{+}1,; j{-}1)$ - Else if
dp[i][j] == dp[i+1][j]→$i$ is unpaired, recurse on$(i{+}1,; j)$ - Else if
dp[i][j] == dp[i][j-1]→$j$ is unpaired, recurse on$(i,; j{-}1)$ - Else find
$k$ such thatdp[i][j] == dp[i][k] + dp[k+1][j]→ recurse on both halves
The result is written in dot-bracket notation: . for unpaired, ( and ) for paired positions.
The original implementation had three correctness issues:
| # | Bug | Fix |
|---|---|---|
| 1 |
DP if-else: bifurcation was only considered when |
Removed else — bifurcation loop always runs after the pairing check |
| 2 |
Traceback false positives: the pair check dp[i][j] == dp[i+1][j-1] - 1 passed for non-pairable bases due to coincidental DP values |
Added canPair(seq[i], seq[j]) guard to the traceback condition |
| 3 |
Traceback incompleteness: the while (i < mid && j >= mid) loop stopped at the midpoint, silently missing all nested pairs contained entirely within one half |
Replaced iterative loop with a fully recursive traceback |
secondary-structure-RNA/
├── main.cpp # Nussinov DP + recursive traceback
├── LICENSE
└── README.md