feat: add support for per-level alphabet in NFT by tmokenc · Pull Request #618 · VeriFIT/mata

tmokenc · 2026-04-15T23:42:02Z

This PR adds support for per-level alphabets in mata::nft::Nft.

The main addition is:

std::vector<Alphabet*> level_alphabets

with the following semantics:

If the vector is empty, Nft falls back to the inherited nfa::Nfa::alphabet.
If different levels use different alphabets, Nft::alphabet is set to nullptr, and the per-level alphabets are stored in level_alphabets.
When all levels share the same alphabet, Nft can still expose a shared alphabet through the inherited nfa::Nfa::alphabet field (not sure if this is necessary).

The PR also adds two helper functions:

alphabet_of_level returns the alphabet for a given level and falls back to the inherited Nfa alphabet when no level-specific alphabet is available.
set_level_alphabets assigns or updates the per-level alphabets.
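
For illustration, the semantics described above could be sketched roughly as follows. This is not the code from the PR: `Alphabet` and `NftSketch` are hypothetical, reduced stand-ins for the Mata classes, and the exact signatures of `alphabet_of_level` and `set_level_alphabets` are assumptions.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for mata::Alphabet; only a name is kept.
struct Alphabet {
    std::string name;
};

// Hypothetical, reduced stand-in for mata::nft::Nft.
struct NftSketch {
    Alphabet* alphabet = nullptr;            // shared alphabet (inherited from Nfa in Mata)
    std::vector<Alphabet*> level_alphabets;  // optional per-level alphabets

    // Return the alphabet for `level`, falling back to the shared alphabet
    // when no level-specific alphabet is available.
    Alphabet* alphabet_of_level(std::size_t level) const {
        if (level < level_alphabets.size() && level_alphabets[level] != nullptr) {
            return level_alphabets[level];
        }
        return alphabet;
    }

    // Assign per-level alphabets. If every level uses the same alphabet, it
    // stays exposed through `alphabet`; otherwise `alphabet` becomes nullptr.
    // An empty vector leaves `alphabet` untouched.
    void set_level_alphabets(std::vector<Alphabet*> alphabets) {
        level_alphabets = std::move(alphabets);
        if (level_alphabets.empty()) { return; }
        bool all_same = true;
        for (Alphabet* current : level_alphabets) {
            if (current != level_alphabets.front()) { all_same = false; break; }
        }
        alphabet = all_same ? level_alphabets.front() : nullptr;
    }
};
```

With two distinct alphabets assigned, `alphabet` ends up as nullptr and `alphabet_of_level(1)` returns the second one; with the same alphabet on both levels, `alphabet` keeps pointing at it.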

This PR does not yet introduce any changes to NFT operations or to the .mata format for NFTs. The .mata format part likely deserves a separate discussion.

Adda0 · 2026-04-16T09:46:00Z

I think it would be cleaner if we kept a single Alphabet* alphabet argument, as we have had until now, where one Alphabet class implementation could be called LevelAlphabet and would contain the vector of alphabets.

The abstract alphabet would probably have to allow for translating optionally on a specific level, so Alphabet::translate_symb(Symbol symbol, std::optional<Level> level = std::nullopt), and all alphabets except the level one would ignore the level (and translate the symbol as normal).

Most, if not all, of the additional functions here would then be unnecessary.
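
A minimal sketch of this suggestion, under stated assumptions: the interface below is reduced to a single method, it translates a symbol name (a `std::string`) rather than a `Symbol`, and `MapAlphabet` is a hypothetical toy implementation, not a Mata class.

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Symbol = unsigned;
using Level = std::size_t;

// Reduced, hypothetical version of the abstract alphabet interface.
class Alphabet {
public:
    virtual ~Alphabet() = default;
    virtual Symbol translate_symb(const std::string& name,
                                  std::optional<Level> level = std::nullopt) = 0;
};

// A regular alphabet: ignores the level and translates as usual.
class MapAlphabet : public Alphabet {
public:
    Symbol translate_symb(const std::string& name,
                          std::optional<Level> /*level*/ = std::nullopt) override {
        auto [it, inserted] = symbols_.try_emplace(name, next_);
        if (inserted) { ++next_; }
        return it->second;
    }
private:
    std::unordered_map<std::string, Symbol> symbols_;
    Symbol next_ = 0;
};

// The level alphabet: holds one alphabet per level and requires a level.
class LevelAlphabet : public Alphabet {
public:
    explicit LevelAlphabet(std::vector<Alphabet*> levels) : levels_(std::move(levels)) {}

    Symbol translate_symb(const std::string& name,
                          std::optional<Level> level = std::nullopt) override {
        if (!level.has_value() || *level >= levels_.size()) {
            throw std::invalid_argument("LevelAlphabet needs a valid level");
        }
        return levels_[*level]->translate_symb(name);
    }
private:
    std::vector<Alphabet*> levels_;
};
```

Callers holding an `Alphabet*` can then pass a level unconditionally; only `LevelAlphabet` reacts to it.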

tmokenc · 2026-04-16T13:38:54Z

I agree that it would be cleaner, reduce a lot of the confusion and potentially be better for the serialisation/deserialisation of the .mata format later on. I will work on it as soon as possible.

tmokenc · 2026-04-17T09:28:13Z

@Adda0 Please take a look. I refactored the code to use LevelAlphabet and moved the per-level alphabet logic into it. I also added level-aware functions to Alphabet. For normal alphabets, these simply ignore the level and use their non-level counterparts.
This makes it possible for NFTs to use a regular alphabet as though all levels shared the same one. When different levels need different alphabets, they can use LevelAlphabet.

This is my first C++ project, so I may have missed some conventions or done something in a way that is uncommon in C++.

codecov · 2026-04-20T16:29:33Z

Codecov Report

❌ Patch coverage is 35.39823% with 73 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.65%. Comparing base (010ee52) to head (854aeb5).
⚠️ Report is 22 commits behind head on devel.

Files with missing lines	Patch %	Lines
src/nft/builder.cc	0.00%	41 Missing ⚠️
src/nft/nft.cc	20.83%	17 Missing and 2 partials ⚠️
src/alphabet.cc	58.06%	10 Missing and 3 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##            devel     #618      +/-   ##
==========================================
- Coverage   72.91%   72.65%   -0.26%     
==========================================
  Files          45       45              
  Lines        6796     7289     +493     
  Branches     1538     1646     +108     
==========================================
+ Hits         4955     5296     +341     
- Misses       1227     1347     +120     
- Partials      614      646      +32

Adda0 · 2026-04-20T16:25:55Z

+    virtual std::string reverse_translate_symbol(Symbol symbol, size_t level) const {
+        (void)level;
+        return reverse_translate_symbol(symbol);
+    }


Side note: We definitely need to rename these monstrosities when we are refactoring alphabets. It is awful to work with this in a function call, e.g., something like minimize(nfa, nfa->alphabet.reverse_translate_symbol(symbol), true)...

Adda0 · 2026-04-20T16:43:29Z

It looks like you can ignore the failing CI actions. Not related.

The approach and the interface look good to me overall. It is a shame we cannot hide the invalid operations (with/without levels) for specific alphabet types without something like std::variant. Maybe exploring std::variant would even be worth it... I have not thought about it yet, though.

tmokenc · 2026-04-23T15:45:47Z

One idea crossed my mind about handling the levels: what if we introduce another abstract class, LevelAlphabet, deriving from Alphabet, and provide default implementations such that anything implementing Alphabet is also automatically a valid LevelAlphabet?

That way, NFA would keep using Alphabet as usual, while NFT would use LevelAlphabet, which could either be a specialized implementation, or just fall back to a plain Alphabet behaving as if all levels share the same alphabet. This would let us leave the current Alphabet implementation untouched.

Does this sound like a reasonable direction in C++, or is there a downside I'm missing?

As for the reverse_translate_symbol function, I also found it confusing when I first started working with Mata. I would name it something like name_of(symbol) or symbol_name(symbol).

Also, sorry for my late response; I was focusing on other projects.

Adda0 · 2026-04-24T06:13:13Z

This sounds great, but how would you want to implement this? LevelAlphabet is just a vector of alphabets. Maybe the wrong approach is making it derive from Alphabet; instead, it could be only a vector of alphabets that is a member of NFTs (or their deriving classes). If the vector is empty, it is as if no alphabet is set. If it contains only one alphabet, that alphabet is used for all tapes. If it contains num_of_levels alphabets, it is a level alphabet with one alphabet per level. This way, simple alphabets still exist and do not have to bother with levels, and specialised level alphabets exist as well.
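
The three cases in this comment (empty, one shared alphabet, one alphabet per level) could be sketched as a small lookup. `pick_alphabet` is a hypothetical helper name used only for illustration, and throwing on a size mismatch is an assumption, not something the discussion settles.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for mata::Alphabet.
struct Alphabet {};

// Pick the alphabet for `level` from a vector that is either empty,
// holds one shared alphabet, or holds exactly one alphabet per level.
inline Alphabet* pick_alphabet(const std::vector<Alphabet*>& alphabets,
                               std::size_t level, std::size_t num_of_levels) {
    if (alphabets.empty()) { return nullptr; }           // no alphabet set
    if (alphabets.size() == 1) { return alphabets[0]; }  // shared by all tapes
    if (alphabets.size() != num_of_levels) {
        throw std::invalid_argument("expected one alphabet per level");
    }
    if (level >= num_of_levels) {
        throw std::out_of_range("level out of range");
    }
    return alphabets[level];                             // one alphabet per level
}
```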

tmokenc · 2026-04-24T08:53:00Z

My idea was something like this:

class LevelAlphabet : public Alphabet {
public:
    ~LevelAlphabet() override = default;

    // --- Level-aware interface ------------------------------------------
    // Defaults: ignore `level`, defer to the flat Alphabet API.

    virtual Symbol translate_symb_at_level(const std::string& symb,
                                           Level /*level*/) {
        return this->translate_symb(symb);
    }

    virtual std::string reverse_translate_symbol_at_level(Symbol symbol,
                                                          Level /*level*/) const {
        return this->reverse_translate_symbol(symbol);
    }

    virtual utils::OrdVector<Symbol>
    get_alphabet_symbols_at_level(Level /*level*/) const {
        return this->get_alphabet_symbols();
    }

    virtual utils::OrdVector<Symbol>
    get_complement_at_level(const utils::OrdVector<Symbol>& symbols,
                            Level /*level*/) const {
        return this->get_complement(symbols);
    }

    virtual bool empty_at_level(Level /*level*/) const {
        return this->empty();
    }
};

Then we provide an adapter over a regular Alphabet, something like SharedLevelAlphabet, that keeps the default "all levels share one alphabet" behavior. On top of that, we can add genuinely level-aware implementations under their own names (maybe VectorAlphabet for a per-level std::vector<Alphabet*>).

When constructing an NFT, we allow passing in either an Alphabet (which we then wrap into the adapter inside the constructor) or a LevelAlphabet.

The advantage of this over storing a std::vector<Alphabet*> directly inside NFT is that callers always go through a single LevelAlphabet*, so we avoid the "is this transducer single-alphabet or multi-alphabet?" if/else branches at every query, and we leave the door open for different backing representations chosen for different optimizations (possibly vectorization, matrix-based, etc.). Or maybe it is just my imagination/overthinking...

The only real concern I see is that SharedLevelAlphabet introduces one extra level of indirection on each query compared to calling the underlying Alphabet directly. I suspect a modern compiler can devirtualize and inline it away in most cases, since they are pretty smart nowadays, but that is a guess; it would need to be measured.

edit: of course, in the case of a single Alphabet, the NFT constructor can create a vector in which every entry is the same pointer to that one alphabet. This would eliminate the if/else branch I mentioned above. So maybe the real question is whether we will have more leveled alphabet representations in the future, or whether we simply leave that for when that future comes and go for simplicity first.
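
The adapter idea from this comment could look roughly like the sketch below. To stay short, `LevelAlphabet` is an independent interface here instead of deriving from `Alphabet` as in the snippet above, only one method is shown, and `MapAlphabet` and `NftSketch` are hypothetical stand-ins rather than Mata classes.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using Symbol = unsigned;
using Level = std::size_t;

// Reduced, hypothetical flat alphabet interface.
class Alphabet {
public:
    virtual ~Alphabet() = default;
    virtual Symbol translate_symb(const std::string& name) = 0;
};

// Toy flat alphabet that numbers symbols in order of first use.
class MapAlphabet final : public Alphabet {
public:
    Symbol translate_symb(const std::string& name) override {
        auto [it, inserted] = symbols_.try_emplace(name, next_);
        if (inserted) { ++next_; }
        return it->second;
    }
private:
    std::unordered_map<std::string, Symbol> symbols_;
    Symbol next_ = 0;
};

// Level-aware interface used by the transducer.
class LevelAlphabet {
public:
    virtual ~LevelAlphabet() = default;
    virtual Symbol translate_symb_at_level(const std::string& name, Level level) = 0;
};

// Adapter: every level shares one flat alphabet.
class SharedLevelAlphabet final : public LevelAlphabet {
public:
    explicit SharedLevelAlphabet(Alphabet* shared) : shared_(shared) {}
    Symbol translate_symb_at_level(const std::string& name, Level /*level*/) override {
        return shared_->translate_symb(name);
    }
private:
    Alphabet* shared_;
};

// Genuinely level-aware implementation: one flat alphabet per level.
class VectorAlphabet final : public LevelAlphabet {
public:
    explicit VectorAlphabet(std::vector<Alphabet*> levels) : levels_(std::move(levels)) {}
    Symbol translate_symb_at_level(const std::string& name, Level level) override {
        return levels_.at(level)->translate_symb(name);  // throws if level is out of range
    }
private:
    std::vector<Alphabet*> levels_;
};

// The transducer accepts either kind and always queries a LevelAlphabet.
struct NftSketch {
    std::unique_ptr<LevelAlphabet> alphabet;
    explicit NftSketch(Alphabet* flat)
        : alphabet(std::make_unique<SharedLevelAlphabet>(flat)) {}
    explicit NftSketch(std::unique_ptr<LevelAlphabet> leveled)
        : alphabet(std::move(leveled)) {}
};
```

Whether the extra virtual call in `SharedLevelAlphabet` is optimized away is not shown by this sketch; as the comment says, that would need to be measured.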

Adda0 · 2026-04-24T10:27:34Z

Just to clarify, I meant the std::vector<Alphabet*> to be inside a class LevelAlphabet, which is a member of an NFT as Alphabet*, not directly as std::vector<Alphabet*>. We definitely need to share the LevelAlphabet configuration between multiple NFTs.

It is an intriguing idea indeed. However, this approach has issues with inheritance. NFTs derive from NFAs, so each NFT has an Alphabet* member. This means we would like to force this level alphabet to also inherit from Alphabet so that we can use said member. However, using this member means we will not see the level interface unless we explicitly typecast it to LevelAlphabet* at every use. The inheritance between NFTs and NFAs in general has been a mistake, I think. It does not add much (since we are not using design patterns such as a strategy pattern etc. with templates which are what makes the inheritance actually useful here), and only gives us problems such as this.

The idea that we modify the original Alphabet is what would make this access to the level alphabet(s) seamless. For now, I see two possible approaches:

Stop deriving NFTs from NFAs (not in this PR), and use here a new AlphabetLevels* Nft member which contains the vector of Alphabet*. Simple, straightforward, and fixes many pain points of the inheritance.
Derive AlphabetLevels from Alphabet and modify the Alphabet to also reason about levels (including getters, setters, etc.; that means implementing more functions in Alphabet such as operator[](Level level) and so on). This is a more involved approach that allows the existing inheritance between NFTs and NFAs to remain, but is quite complex and maybe even a bit stifling for future use, such as AlphabetRegisters, AlphabetCounters, ...
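
The typecast issue described in this comment can be illustrated with a reduced sketch. All types below are hypothetical stand-ins, and `alphabet_levels` is an invented accessor used only to show where the downcast appears.

```cpp
#include <cstddef>

// Hypothetical stand-ins, reduced to what the cast needs.
struct Alphabet {
    virtual ~Alphabet() = default;
};

struct LevelAlphabet : Alphabet {
    std::size_t num_of_levels = 0;
};

struct Nfa {
    Alphabet* alphabet = nullptr;  // knows nothing about levels
};

struct Nft : Nfa {
    // Every level-aware access has to downcast the inherited member first.
    std::size_t alphabet_levels() const {
        auto* leveled = dynamic_cast<LevelAlphabet*>(alphabet);
        return leveled != nullptr ? leveled->num_of_levels : 0;
    }
};
```

A dedicated level-alphabet member in the NFT would make the cast unnecessary, which is the motivation for the first approach.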

tmokenc · 2026-04-24T10:52:09Z

Oh my bad, sorry for the confusion.

Personally, I prefer the first approach; it is straightforward and hard to get wrong.

My only concern with the second one is that since AlphabetLevels derives from Alphabet, nothing stops someone from nesting a level alphabet inside another level alphabet. It will throw an error as soon as someone tries to create and run it, but it can be completely avoided by design (this was the main reason I proposed keeping LevelAlphabet separate from the normal Alphabet).

Adda0 · 2026-04-24T11:07:07Z

Mata has always gone with the assumption that the user knows what they are doing, and any mistakes such as this are their issue. However, if we can make it so that the issue cannot even be created, even better.

Adda0 · 2026-04-24T11:42:53Z

I discussed this with @koniksedy, and we agreed that the way forward seems to be the following (approach 1 from above):

NFTs do not inherit from NFAs
NFAs have Alphabet* abstract class as a member, which knows nothing about levels.
NFTs have AlphabetLevels* non-abstract class as a member (named alphabets instead of alphabet, maybe). This AlphabetLevels class will still work as a single alphabet if, for example, the vector contains only one element, that is, only one Alphabet* for the whole NFT.
Operations which are (mostly) shared between both implementations with minimal differences (such as intersection, minimize, ...) are classes that can be templated (and optionally use the strategy design pattern) and therefore allow for easy modification of the algorithm, with the parts of the algorithm that differ being extracted as automaton type-specific lambdas or member functions (using C++ features that are designed directly for this, i.e., concepts, templates, etc.). These operation classes are then called through the currently existing mata::<automaton_model>::<operation_name> such as mata::nft::minimize (Nft& nft, ...) { return Minimization<Nft>(nft, ...).run(); } or something similar.
Adding another automaton model (like automata with registers, counters, alternating automata, ...) is much easier, and there will not be confusion about which operations are supported by which automaton models.
All alphabet members will replace the raw pointer with a smart pointer, std::shared_ptr precisely.
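
A rough sketch of the templated-operation idea, under stated assumptions: the automaton types are reduced stand-ins, `CountReachable` is a toy operation chosen only because it is short (it is not a Mata operation), and the model-specific part is expressed through overloads of a hypothetical `counts` hook. In C++20 the requirements on `Automaton` could additionally be spelled out as a concept.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

using State = std::size_t;

// Hypothetical, reduced automaton models (not the Mata classes).
struct Nfa {
    std::vector<std::vector<State>> post;  // post[s] = successors of s
    State initial = 0;
};

struct Nft {
    std::vector<std::vector<State>> post;  // post[s] = successors of s
    std::vector<std::size_t> levels;       // levels[s] = level of state s
    State initial = 0;
};

// Model-specific hook: which reachable states are counted.
inline bool counts(const Nfa&, State) { return true; }
inline bool counts(const Nft& nft, State state) { return nft.levels[state] == 0; }

// Shared skeleton: breadth-first search from the initial state.
template <typename Automaton>
class CountReachable {
public:
    explicit CountReachable(const Automaton& aut) : aut_(aut) {}

    std::size_t run() const {
        if (aut_.post.empty()) { return 0; }
        std::vector<bool> seen(aut_.post.size(), false);
        std::queue<State> worklist;
        seen[aut_.initial] = true;
        worklist.push(aut_.initial);
        std::size_t result = 0;
        while (!worklist.empty()) {
            const State state = worklist.front();
            worklist.pop();
            if (counts(aut_, state)) { ++result; }  // the only model-specific step
            for (State target : aut_.post[state]) {
                if (!seen[target]) { seen[target] = true; worklist.push(target); }
            }
        }
        return result;
    }
private:
    const Automaton& aut_;
};

// Thin per-model entry points that forward to the shared operation class.
namespace sketch::nfa {
inline std::size_t count_reachable(const Nfa& aut) { return CountReachable<Nfa>(aut).run(); }
}
namespace sketch::nft {
inline std::size_t count_reachable(const Nft& aut) { return CountReachable<Nft>(aut).run(); }
}
```

Adding another automaton model would then mean providing its own `counts` overload and entry point, without touching the shared search.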

What does it mean for this PR? Probably that we should create AlphabetLevels* alphabets as a member of NFTs, and temporarily have both Alphabet* and AlphabetLevels* members until all this is resolved. Maybe. Not sure about this yet.

tmokenc · 2026-04-24T12:23:26Z

That sounds clean, I will work on this ASAP.

tmokenc · 2026-04-29T03:42:47Z

@Adda0 I've just pushed an update. Could you please check it again? There is now a standalone class AlphabetLevels that is not derived from Alphabet. The NFT should now completely ignore the underlying alphabet of the NFA and use AlphabetLevels instead.

Adda0

I am really happy with the changes. Thank you. It looks good. Only a few nits remaining, and we can merge.

Adda0 · 2026-05-13T07:55:38Z

Could you rebase onto master? I want to see whether that fixes the CI.

tmokenc · 2026-05-13T16:58:20Z

I hope you mean the devel branch, as there is no master branch here.

Adda0 · 2026-05-13T18:43:41Z

Oh, yeah. Of course. My bad.

I see this as ready for merge. I might first disable auto-tagging on new merges in devel, so it might take a bit, but otherwise I think we are done here for the time being. Next for alphabets is switching to using std::shared_ptr for alphabets.

tmokenc requested a review from Adda0 as a code owner April 15, 2026 23:42

Adda0 reviewed Apr 20, 2026

Adda0 reviewed Apr 24, 2026

Adda0 reviewed May 4, 2026

Adda0 reviewed May 4, 2026

Adda0 reviewed May 12, 2026

tmokenc and others added 9 commits May 13, 2026 18:48

feat: support for alphabet per tape/level

5d3e190

This is a very naive implementation, not sure if it is actually correct or I missed something. (cherry picked from commit 705db38)

refactors: move function bodies to .cc file

b83f51b

refactors: level-awar alphabet handling into LevelAlphabet instead of…

26112a7

… a field in Nft

refactor: simplify the Alphabet

8e03bb3

refactor: rename the resolve_alphabet into get_alphabet_for_level

a6463d2

fix: not deriving Alphabet for AlphabetLevels

b943a11

fix: move internal alphabets of AlphabetLevels to public

72eca68

Update include/mata/nft/builder.hh

47074cb

Co-authored-by: David Chocholatý <chocholaty.david@protonmail.com>

fix: make empty and clear take std::optional<Level>

b25d350

tmokenc added 4 commits May 13, 2026 18:48

refactor: rename alphabet_of_level to for_level and add operator[…

5d6abcf

…](Level)

feat: Add Mode for AlphabetLevel and make methods take optional<Level>

b6d1df3

fix: semantic of clear/empty on MultiLevel mode of the AlphabetLevels

00c4322

feat: add non-const function for for_level

854aeb5

tmokenc force-pushed the nft-per-level-alphabet branch from afac0cf to 854aeb5 Compare May 13, 2026 16:56
