feat: add support for per-level alphabet in NFT #618

Open
tmokenc wants to merge 13 commits into VeriFIT:devel from tmokenc:nft-per-level-alphabet

Conversation

@tmokenc tmokenc commented Apr 15, 2026

This PR adds support for per-level alphabets in mata::nft::Nft.

The main addition is:

  • std::vector<Alphabet*> level_alphabets

with the following semantics:

  • If the vector is empty, Nft falls back to the inherited nfa::Nfa::alphabet.
  • If different levels use different alphabets, Nft::alphabet is set to nullptr, and the per-level alphabets are stored in level_alphabets.
  • When all levels share the same alphabet, Nft can still expose a shared alphabet through the inherited nfa::Nfa::alphabet field (not sure if this is necessary).

The PR also adds two helper functions:

  • alphabet_of_level returns the alphabet for a given level and falls back to the inherited Nfa alphabet when no level-specific alphabet is available
  • set_level_alphabets to assign or update alphabets
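
The semantics listed above can be illustrated with a minimal, self-contained sketch. It is not the PR's code: the Alphabet, Nfa, and Nft types below are reduced stand-ins, and the exact signatures of alphabet_of_level and set_level_alphabets are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for mata's Alphabet; only pointer identity matters here.
struct Alphabet {};

// Stand-in for the NFA base class with its single inherited alphabet.
struct Nfa {
    Alphabet* alphabet = nullptr;
};

// Hypothetical reduction of Nft to the members described in the PR text.
struct Nft : Nfa {
    std::vector<Alphabet*> level_alphabets;

    // Per-level alphabet if one is stored, otherwise the inherited alphabet.
    Alphabet* alphabet_of_level(std::size_t level) const {
        if (level < level_alphabets.size()) {
            return level_alphabets[level];
        }
        return alphabet;
    }

    // Assign per-level alphabets (assumed signature). The inherited field is
    // kept only when every level shares one alphabet; otherwise it is nulled.
    void set_level_alphabets(std::vector<Alphabet*> alphabets) {
        assert(std::none_of(alphabets.begin(), alphabets.end(),
                            [](const Alphabet* a) { return a == nullptr; }));
        level_alphabets = std::move(alphabets);
        if (level_alphabets.empty()) {
            return; // Nothing level-specific: keep the inherited alphabet.
        }
        const bool shared = std::all_of(
            level_alphabets.begin(), level_alphabets.end(),
            [this](const Alphabet* a) { return a == level_alphabets.front(); });
        alphabet = shared ? level_alphabets.front() : nullptr;
    }
};
```

With two distinct alphabets, the inherited pointer becomes nullptr and lookups go through level_alphabets; with an empty vector, every lookup returns the inherited alphabet.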

This PR does not yet introduce any changes to NFT operations or to the .mata format for NFTs. The .mata format part likely deserves a separate discussion.

@tmokenc tmokenc requested a review from Adda0 as a code owner April 15, 2026 23:42
@Adda0
Collaborator

Adda0 commented Apr 16, 2026

I think it would be cleaner to keep a single Alphabet* alphabet argument, as we have had until now, and to add one Alphabet implementation, which could be called LevelAlphabet, that contains the vector of alphabets.

The abstract alphabet would probably have to allow for translating optionally on a specific level, so Alphabet::translate_symb(Symbol symbol, std::optional<Level> level = std::nullopt), and all alphabets except the level one would ignore the level (and translate the symbol as normal).

Most, if not all, of the additional functions here would then be unnecessary.
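
A rough sketch of this suggestion follows, under stated assumptions: the Symbol and Level aliases, the SimpleAlphabet class, and the string-based translate_symb signature are stand-ins chosen for illustration and are not taken from Mata's headers.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using Symbol = unsigned; // Assumed alias; stand-in for mata's Symbol.
using Level = unsigned;  // Assumed alias; stand-in for mata's Level.

// Abstract alphabet whose translation may optionally be given a level.
class Alphabet {
public:
    virtual ~Alphabet() = default;
    virtual Symbol translate_symb(const std::string& symb,
                                  std::optional<Level> level = std::nullopt) = 0;
};

// Hypothetical ordinary alphabet: numbers names on demand, ignores the level.
class SimpleAlphabet : public Alphabet {
public:
    Symbol translate_symb(const std::string& symb,
                          std::optional<Level> /*level*/ = std::nullopt) override {
        const auto next = static_cast<Symbol>(symbols_.size());
        return symbols_.try_emplace(symb, next).first->second;
    }

private:
    std::map<std::string, Symbol> symbols_;
};

// Level alphabet: holds one (non-owned) alphabet per level and dispatches.
class LevelAlphabet : public Alphabet {
public:
    explicit LevelAlphabet(std::vector<Alphabet*> levels)
        : levels_(std::move(levels)) {}

    Symbol translate_symb(const std::string& symb,
                          std::optional<Level> level = std::nullopt) override {
        // A level alphabet cannot translate without knowing the level.
        assert(level.has_value() && *level < levels_.size());
        return levels_[*level]->translate_symb(symb);
    }

private:
    std::vector<Alphabet*> levels_;
};
```

One caveat of this shape: default arguments on virtual functions are resolved from the static type at the call site, so every override should repeat the same default.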

@tmokenc
Author

tmokenc commented Apr 16, 2026

I agree that it would be cleaner, reduce a lot of the confusion and potentially be better for the serialisation/deserialisation of the .mata format later on. I will work on it as soon as possible.

@tmokenc
Author

tmokenc commented Apr 17, 2026

@Adda0 Please take a look. I refactored the code to use LevelAlphabet and moved the per-level alphabet logic into it. I also added level-aware functions to Alphabet. For normal alphabets, these simply ignore the level and use their non-level counterparts.
This makes it possible for NFTs to use a regular alphabet as though all levels shared the same one. When different levels need different alphabets, they can use LevelAlphabet.

This is my first C++ project, so I may have missed some conventions or done something in a way that is uncommon in C++.

@codecov

codecov Bot commented Apr 20, 2026

Codecov Report

❌ Patch coverage is 35.39823% with 73 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.65%. Comparing base (010ee52) to head (854aeb5).
⚠️ Report is 22 commits behind head on devel.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/nft/builder.cc | 0.00% | 41 Missing ⚠️ |
| src/nft/nft.cc | 20.83% | 17 Missing and 2 partials ⚠️ |
| src/alphabet.cc | 58.06% | 10 Missing and 3 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##            devel     #618      +/-   ##
==========================================
- Coverage   72.91%   72.65%   -0.26%     
==========================================
  Files          45       45              
  Lines        6796     7289     +493     
  Branches     1538     1646     +108     
==========================================
+ Hits         4955     5296     +341     
- Misses       1227     1347     +120     
- Partials      614      646      +32     

Comment thread include/mata/alphabet.hh Outdated
Comment on lines +70 to +73
virtual std::string reverse_translate_symbol(Symbol symbol, size_t level) const {
    (void)level;
    return reverse_translate_symbol(symbol);
}
Collaborator

Side note: We definitely need to rename these monstrosities when we are refactoring alphabets. It is awful to work with this in a function call, e.g., something like minimize(nfa, nfa->alphabet.reverse_translate_symbol(symbol), true)...

Comment thread include/mata/alphabet.hh Outdated
Comment thread include/mata/alphabet.hh Outdated
Comment thread include/mata/alphabet.hh Outdated
@Adda0
Collaborator

Adda0 commented Apr 20, 2026

It looks like you can ignore the failing CI actions. Not related.

The approach and the interface look good to me overall. It is a shame we cannot hide the invalid operations (with/without levels) for specific alphabet types without something like std::variant. Exploring std::variant might even be worth it... I have not thought about it yet, though.

@tmokenc
Author

tmokenc commented Apr 23, 2026

One idea crossed my mind about handling the levels: what if we introduce another abstract class, LevelAlphabet, deriving from Alphabet, and provide default implementations such that anything implementing Alphabet is also automatically a valid LevelAlphabet?

That way, NFA would keep using Alphabet as usual, while NFT would use LevelAlphabet, which could either be a specialized implementation, or just fall back to a plain Alphabet behaving as if all levels share the same alphabet. This would let us leave the current Alphabet implementation untouched.

Does this sound like a reasonable direction in C++, or is there a downside I'm missing?

As for the reverse_translate_symbol function, I also found it confusing when I first started working with Mata. I would name it something like name_of(symbol) or symbol_name(symbol).

Also, sorry for my late response; I was focusing on other projects.

Comment thread include/mata/nft/types.hh
Comment thread include/mata/alphabet.hh
Comment thread include/mata/alphabet.hh Outdated
@Adda0
Collaborator

Adda0 commented Apr 24, 2026

> One idea crossed my mind about handling the levels: what if we introduce another abstract class, LevelAlphabet, deriving from Alphabet, and provide default implementations such that anything implementing Alphabet is also automatically a valid LevelAlphabet?
>
> That way, NFA would keep using Alphabet as usual, while NFT would use LevelAlphabet, which could either be a specialized implementation, or just fall back to a plain Alphabet behaving as if all levels share the same alphabet. This would let us leave the current Alphabet implementation untouched.
>
> Does this sound like a reasonable direction in C++, or is there a downside I'm missing?
>
> As for the reverse_translate_symbol function, I also found it confusing when I first started working with Mata. I would name it something like name_of(symbol) or symbol_name(symbol).
>
> Also, sorry for my late response; I was focusing on other projects.

This sounds great, but how would you want to implement this? LevelAlphabet is just a vector of alphabets. Maybe the wrong approach is making it derive from Alphabet, instead of only having it as a vector of alphabets that is a member of NFTs (or their deriving classes). If the vector is empty, it is as if no alphabet is set. If it contains only one alphabet, that alphabet is used for all tapes. If it contains num_of_levels alphabets, then it is a level alphabet with one alphabet per level. This way, simple alphabets still exist and do not have to bother with levels, and specialised level alphabets exist as well.

@tmokenc
Author

tmokenc commented Apr 24, 2026

My idea was something like this

class LevelAlphabet : public Alphabet {
public:
    ~LevelAlphabet() override = default;

    // --- Level-aware interface ------------------------------------------
    // Defaults: ignore `level`, defer to the flat Alphabet API.

    virtual Symbol translate_symb_at_level(const std::string& symb,
                                           Level /*level*/) {
        return this->translate_symb(symb);
    }

    virtual std::string reverse_translate_symbol_at_level(Symbol symbol,
                                                          Level /*level*/) const {
        return this->reverse_translate_symbol(symbol);
    }

    virtual utils::OrdVector<Symbol>
    get_alphabet_symbols_at_level(Level /*level*/) const {
        return this->get_alphabet_symbols();
    }

    virtual utils::OrdVector<Symbol>
    get_complement_at_level(const utils::OrdVector<Symbol>& symbols,
                            Level /*level*/) const {
        return this->get_complement(symbols);
    }

    virtual bool empty_at_level(Level /*level*/) const {
        return this->empty();
    }
};

Then we provide an adapter over a regular Alphabet, something like SharedLevelAlphabet, that keeps the default "all levels share one alphabet" behavior. On top of that, we can add genuinely level-aware implementations under their own names (maybe VectorAlphabet for a per-level std::vector<Alphabet*>).

When constructing an NFT, we allow it to put in either Alphabet (then we wrap it into the adapter inside the constructor) or LevelAlphabet.

The advantage of this over storing a std::vector<Alphabet*> directly inside NFT is that callers always go through a single LevelAlphabet*, so we avoid the "is this transducer single-alphabet or multi-alphabet?" if/else branches at every query, and we leave the door open for different backing representations chosen for different optimizations (possibly vectorized, matrix-based, etc.). Or maybe it is just my imagination/overthinking...

The only real concern I see is that SharedLevelAlphabet introduces one extra level of indirection on each query compared to calling the underlying Alphabet directly. I suspect a modern compiler can devirtualize and inline it away in most cases, since compilers are pretty smart nowadays, but that is a guess; it would need to be measured.

Edit: of course, when given a single Alphabet, the NFT constructor can create a vector in which every entry points to that one alphabet, which would eliminate the if/else branch I mentioned above. So maybe the real question is whether we will have more leveled alphabet representations in the future, or whether we simply leave that for when that future comes and go for simplicity first.
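
The adapter idea described in this comment might look roughly as follows. This is only a sketch under assumptions: the interfaces are cut down to a single method, CountingAlphabet is a made-up flat alphabet used to exercise the adapter, and the two Nft constructors are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

using Symbol = unsigned; // Assumed alias.
using Level = unsigned;  // Assumed alias.

// Reduced flat alphabet interface.
class Alphabet {
public:
    virtual ~Alphabet() = default;
    virtual Symbol translate_symb(const std::string& symb) = 0;
};

// Level-aware interface whose default ignores the level.
class LevelAlphabet : public Alphabet {
public:
    virtual Symbol translate_symb_at_level(const std::string& symb, Level /*level*/) {
        return this->translate_symb(symb);
    }
};

// Adapter: every level forwards to one wrapped (non-owned) flat alphabet.
class SharedLevelAlphabet : public LevelAlphabet {
public:
    explicit SharedLevelAlphabet(Alphabet* shared) : shared_(shared) {
        assert(shared_ != nullptr);
    }
    Symbol translate_symb(const std::string& symb) override {
        return shared_->translate_symb(symb);
    }

private:
    Alphabet* shared_;
};

// Made-up flat alphabet that numbers symbol names on demand.
class CountingAlphabet : public Alphabet {
public:
    Symbol translate_symb(const std::string& symb) override {
        const auto next = static_cast<Symbol>(symbols_.size());
        return symbols_.try_emplace(symb, next).first->second;
    }

private:
    std::map<std::string, Symbol> symbols_;
};

// Hypothetical NFT front end: accepts either kind of alphabet and always
// queries through a single LevelAlphabet*.
class Nft {
public:
    explicit Nft(LevelAlphabet* alphabet) : alphabet_(alphabet) {}
    explicit Nft(Alphabet* alphabet)
        : owned_(std::make_unique<SharedLevelAlphabet>(alphabet)),
          alphabet_(owned_.get()) {}

    Symbol translate(const std::string& symb, Level level) {
        return alphabet_->translate_symb_at_level(symb, level);
    }

private:
    std::unique_ptr<LevelAlphabet> owned_; // Set only when wrapping a flat alphabet.
    LevelAlphabet* alphabet_;
};
```

Every query made through the adapter costs two virtual calls (translate_symb_at_level, then the wrapped translate_symb), which is the extra indirection mentioned above; whether a compiler removes it would have to be measured.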

@Adda0
Collaborator

Adda0 commented Apr 24, 2026

Just to clarify, I meant the std::vector<Alphabet*> to be inside a class LevelAlphabet, which is a member of an NFT as Alphabet*, not directly as std::vector<Alphabet*>. We definitely need to share the LevelAlphabet configuration between multiple NFTs.

It is an intriguing idea indeed. However, this approach has issues with inheritance. NFTs derive from NFAs, so each NFT has an Alphabet* member. This means we would like to force this level alphabet to also inherit from Alphabet so that we can use said member. However, using this member means we will not see the level interface unless we explicitly typecast it to LevelAlphabet* at every use. The inheritance between NFTs and NFAs in general has been a mistake, I think. It does not add much (since we are not using design patterns such as a strategy pattern etc. with templates which are what makes the inheritance actually useful here), and only gives us problems such as this.

The idea that we modify the original Alphabet is what would make this access to the level alphabet(s) seamless. For now, I see two possible approaches:

  1. Stop deriving NFTs from NFAs (not in this PR), and use here a new AlphabetLevels* Nft member which contains the vector of Alphabet*. Simple, straightforward, and fixes many pain points of the inheritance.
  2. Derive AlphabetLevels from Alphabet and modify the Alphabet to also reason about levels (including getters, setters, etc.; that means implementing more functions in Alphabet such as operator[](Level level) and so on). This is a more involved approach that allows the existing inheritance between NFTs and NFAs to remain, but is quite complex and maybe even a bit stifling for future use, such as AlphabetRegisters, AlphabetCounters, ...
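
The typecast problem mentioned above can be made concrete with a small sketch. The classes are reduced stand-ins rather than Mata's real ones, and symbol_name_at_level is a hypothetical helper: with only an inherited Alphabet* member, the level-aware call is reachable solely through a downcast at each use.

```cpp
#include <cassert>
#include <string>

using Symbol = unsigned; // Assumed alias.
using Level = unsigned;  // Assumed alias.

// Reduced flat alphabet: names a symbol by its number.
class Alphabet {
public:
    virtual ~Alphabet() = default;
    virtual std::string reverse_translate_symbol(Symbol symbol) const {
        return std::to_string(symbol);
    }
};

// Level-aware subclass adding a method the base interface does not expose.
class LevelAlphabet : public Alphabet {
public:
    virtual std::string reverse_translate_symbol_at_level(Symbol symbol,
                                                          Level level) const {
        return "L" + std::to_string(level) + ":" + std::to_string(symbol);
    }
};

struct Nfa {
    Alphabet* alphabet = nullptr;
};

// The NFT inherits the flat Alphabet* member, as in the current hierarchy.
struct Nft : Nfa {};

// Hypothetical helper: the downcast is needed wherever the level matters.
inline std::string symbol_name_at_level(const Nft& nft, Symbol symbol, Level level) {
    assert(nft.alphabet != nullptr);
    const auto* levels = dynamic_cast<const LevelAlphabet*>(nft.alphabet);
    if (levels == nullptr) {
        return nft.alphabet->reverse_translate_symbol(symbol); // Flat alphabet.
    }
    return levels->reverse_translate_symbol_at_level(symbol, level);
}
```

Approach 2 removes the cast by moving the level-aware methods into Alphabet itself; approach 1 removes it by giving the NFT a member whose static type already is the level-aware class.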

@tmokenc
Author

tmokenc commented Apr 24, 2026

Oh my bad, sorry for the confusion.

Personally, I prefer the first approach; it is straightforward and hard to get wrong.

My only concern with the second one is that, since AlphabetLevels derives from Alphabet, nothing stops someone from nesting a level alphabet inside another level alphabet. It will throw an error as soon as someone tries to create and run it, but this can be avoided completely by design (this was the main reason I proposed keeping LevelAlphabet separate from the normal Alphabet).

@Adda0
Collaborator

Adda0 commented Apr 24, 2026

Mata has always gone with the assumption that the user knows what they are doing, and any mistakes such as this are their issue. However, if we can make it so that the issue cannot even be created, even better.

@Adda0
Collaborator

Adda0 commented Apr 24, 2026

I discussed this with @koniksedy, and we agreed that the way forward seems to be the following (approach 1 from above):

  • NFTs do not inherit from NFAs
  • NFAs have an Alphabet* member (an abstract class), which knows nothing about levels.
  • NFTs have an AlphabetLevels* member (a non-abstract class), maybe named alphabets instead of alphabet. This AlphabetLevels class will still work as a single alphabet if, for example, the vector contains only one element, that is, only one Alphabet* for the whole NFT.
  • Operations that are (mostly) shared between both implementations with minimal differences (such as intersection, minimize, ...) become classes that can be templated (and optionally use the strategy design pattern), which allows easy modification of the algorithm. The parts of the algorithm that differ are extracted as automaton-type-specific lambdas or member functions (using C++ features designed directly for this, i.e., concepts, templates, etc.). These operation classes are then called through the currently existing mata::<automaton_model>::<operation_name>, such as mata::nft::minimize(Nft& nft, ...) { return Minimization<Nft>(nft, ...).run(); } or something similar.
  • Adding another automaton model (like automata with registers, counters, alternating automata, ...) is much easier, and there will not be confusion about which operations are supported by which automaton models.
  • All alphabet members will replace the raw pointer with a smart pointer, std::shared_ptr to be precise.
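
The templated-operation bullet above can be sketched with a deliberately simple operation. Reachability stands in for something like Minimization here; the Nfa and Nft structs, the successors member, and the reachable_states wrappers are all hypothetical. The sketch uses plain C++17 templates; a C++20 concept could state the required interface explicitly.

```cpp
#include <cassert>
#include <set>
#include <vector>

using State = unsigned; // Assumed alias.

namespace nfa {
// Reduced automaton: post[s] lists the successor states of s.
struct Nfa {
    std::vector<std::vector<State>> post;
    std::vector<State> initial;
    const std::vector<State>& successors(State s) const {
        assert(s < post.size());
        return post[s];
    }
};
} // namespace nfa

namespace nft {
// Reduced transducer, deliberately not derived from Nfa.
struct Nft {
    struct Move {
        State target;
        unsigned level;
    };
    std::vector<std::vector<Move>> post;
    std::vector<State> initial;
    // Type-specific part: targets are extracted from level-annotated moves.
    std::vector<State> successors(State s) const {
        assert(s < post.size());
        std::vector<State> targets;
        for (const Move& move : post[s]) { targets.push_back(move.target); }
        return targets;
    }
};
} // namespace nft

// Shared algorithm: works for any type offering `initial` and `successors`.
template <typename Automaton>
class Reachability {
public:
    explicit Reachability(const Automaton& aut) : aut_(aut) {}

    std::set<State> run() const {
        std::set<State> seen(aut_.initial.begin(), aut_.initial.end());
        std::vector<State> worklist(aut_.initial.begin(), aut_.initial.end());
        while (!worklist.empty()) {
            const State state = worklist.back();
            worklist.pop_back();
            for (const State target : aut_.successors(state)) {
                if (seen.insert(target).second) { worklist.push_back(target); }
            }
        }
        return seen;
    }

private:
    const Automaton& aut_;
};

// Thin per-model entry points that delegate to the shared operation class.
namespace nfa {
inline std::set<State> reachable_states(const Nfa& aut) {
    return Reachability<Nfa>(aut).run();
}
} // namespace nfa

namespace nft {
inline std::set<State> reachable_states(const Nft& aut) {
    return Reachability<Nft>(aut).run();
}
} // namespace nft
```

The two automaton types share no base class; the only coupling is the small interface the template uses, which is what would let another automaton model reuse the operation.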

What does this mean for this PR? Probably that we should create AlphabetLevels* alphabets as a member of NFTs, and temporarily keep both the Alphabet* and AlphabetLevels* members until all of this is resolved. Maybe. Not sure about this yet.

@tmokenc
Author

tmokenc commented Apr 24, 2026

That sounds clean, I will work on this ASAP.

@tmokenc
Author

tmokenc commented Apr 29, 2026

@Adda0 I've just pushed an update. Could you please check it again? There is now a standalone class AlphabetLevels that is not derived from Alphabet. The NFT should now completely ignore the underlying alphabet of the NFA and use AlphabetLevels instead.
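
For orientation, a standalone class of this kind could have roughly the following shape. This is a guess for illustration, not the code pushed in the PR: the at accessor, the raw non-owning pointers, and the size-based rules (taken from the earlier discussion in this thread) are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

using Level = unsigned; // Assumed alias.

// Stand-in for mata's abstract Alphabet.
struct Alphabet {
    virtual ~Alphabet() = default;
};

// Hypothetical standalone container of per-level alphabets.
class AlphabetLevels {
public:
    AlphabetLevels() = default;
    explicit AlphabetLevels(std::vector<Alphabet*> alphabets)
        : alphabets_(std::move(alphabets)) {}

    // Empty: no alphabet set. One entry: shared by all levels.
    // Otherwise: one entry per level.
    Alphabet* at(Level level) const {
        if (alphabets_.empty()) { return nullptr; }
        if (alphabets_.size() == 1) { return alphabets_.front(); }
        assert(level < alphabets_.size());
        return alphabets_[level];
    }

    std::size_t size() const { return alphabets_.size(); }

private:
    std::vector<Alphabet*> alphabets_; // Not owned in this sketch.
};

// Because AlphabetLevels is not an Alphabet, it cannot be stored in its own
// vector, so nesting a level alphabet inside another is a compile-time error.
static_assert(!std::is_convertible<AlphabetLevels*, Alphabet*>::value,
              "AlphabetLevels must not be usable as a plain Alphabet");
```

The static_assert records the design point raised earlier in the thread: keeping the class outside the Alphabet hierarchy rules out nested level alphabets by construction.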

Comment thread include/mata/nft/builder.hh Outdated
Comment thread include/mata/alphabet.hh Outdated
Comment thread include/mata/alphabet.hh Outdated
Comment thread src/nft/nft.cc
Comment thread src/alphabet.cc Outdated
Comment thread include/mata/alphabet.hh
Collaborator

@Adda0 Adda0 left a comment

I am really happy with the changes. Thank you. It looks good. Only a few nits remaining, and we can merge.

Comment thread include/mata/alphabet.hh
@Adda0
Collaborator

Adda0 commented May 13, 2026

Could you rebase onto master? I want to see whether that fixes the CI.

@tmokenc tmokenc force-pushed the nft-per-level-alphabet branch from afac0cf to 854aeb5 Compare May 13, 2026 16:56
@tmokenc
Author

tmokenc commented May 13, 2026

I hope you mean the devel branch, as there is no master branch here.

@Adda0
Collaborator

Adda0 commented May 13, 2026

> I hope you mean the devel branch, as there is no master branch here.

Oh, yeah. Of course. My bad.

I see this as ready for merge. I might first disable auto-tagging on new merges in devel, so it might take a bit, but otherwise I think we are done here for the time being. Next for alphabets is switching to using std::shared_ptr for alphabets.
