TODOs

Make sure the compiler optimizes the serialization of datatype members to the vector of bytes.
1. Make sure everything is optimized at compile time as far as possible (maybe rewrite functions, e.g. make the mod calculations constexpr).
2. Is the array of displacements in the Pattern class resolved at compile-time? Maybe another construct would be more fitting. In theory all the information is there at compile-time.
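
A minimal sketch of what "resolved at compile-time" could look like for the two items above. The names `align_up` and `displacements` are hypothetical and are not taken from the Pattern class; the layout rule (natural alignment, members in declaration order) is an assumption made for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: round `offset` up to the next multiple of `alignment`.
// Being constexpr, the modulo arithmetic can be folded by the compiler.
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment) {
    const std::size_t rem = offset % alignment;
    return rem == 0 ? offset : offset + (alignment - rem);
}

// Hypothetical stand-in for a displacement table: byte offsets of the given
// member types when laid out in order with their natural alignment.
template <typename... Members>
constexpr std::array<std::size_t, sizeof...(Members)> displacements() {
    const std::array<std::size_t, sizeof...(Members)> sizes{sizeof(Members)...};
    const std::array<std::size_t, sizeof...(Members)> aligns{alignof(Members)...};
    std::array<std::size_t, sizeof...(Members)> result{};
    std::size_t offset = 0;
    for (std::size_t i = 0; i < sizeof...(Members); ++i) {
        offset = align_up(offset, aligns[i]);  // pad up to the member's alignment
        result[i] = offset;
        offset += sizes[i];
    }
    return result;
}

// Storing the table in a constexpr variable forces compile-time evaluation;
// the static_assert fails to compile if the table were only known at run time.
constexpr auto example_table = displacements<char, std::uint16_t, std::uint32_t>();
static_assert(example_table.size() == 3, "one displacement per member");
```

If a table like this compiles as a `constexpr` variable, the displacements are guaranteed to be compile-time constants rather than something the optimizer may or may not fold.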
Consider overlapping the sending and splitting of large messages.
1. Find out Open MPI's thresholds for that.
2. A change of interface is probably necessary (currently data is packed completely before the communicator does anything with it).
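
A rough sketch of the splitting half of this idea, assuming a message is cut into fixed-size slices that could each be handed to a nonblocking send while the next slice is still being packed. `Chunk` and `split_message` are hypothetical names, and `chunk_bytes` is a placeholder parameter, not an Open MPI threshold. The actual send call is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One contiguous slice of a larger message.
struct Chunk {
    std::size_t offset;  // start of the slice within the message, in bytes
    std::size_t size;    // length of the slice, in bytes
};

// Hypothetical helper: split a message of `total_bytes` into slices of at
// most `chunk_bytes`. Only the last slice may be shorter than the others.
inline std::vector<Chunk> split_message(std::size_t total_bytes,
                                        std::size_t chunk_bytes) {
    assert(chunk_bytes > 0 && "chunk size must be positive");
    std::vector<Chunk> chunks;
    for (std::size_t offset = 0; offset < total_bytes; offset += chunk_bytes) {
        const std::size_t remaining = total_bytes - offset;
        chunks.push_back({offset, remaining < chunk_bytes ? remaining : chunk_bytes});
    }
    return chunks;
}
```

With an interface like this, the communicator could start sending slice `i` as soon as it is packed instead of waiting for the whole buffer, which is the interface change item 2 refers to.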
Look at other serialization libraries
1. Boost
2. MADNESS
3. cereal
Make custom benchmarks "nicer".
Think about more restrictive concepts and specializations (e.g. for containers with contiguous and non-contiguous memory).
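
A possible shape for such a specialization, sketched with a C++17 detection trait so it compiles on older toolchains; in C++20 the same predicate could be expressed as a concept, for example via `std::ranges::contiguous_range`. The names `has_contiguous_storage` and `pack` are hypothetical, and using the presence of `data()` as the test for contiguity is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <list>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical trait: true for containers that expose data(), used here as
// a stand-in for "elements live in one contiguous block".
template <typename T, typename = void>
struct has_contiguous_storage : std::false_type {};

template <typename T>
struct has_contiguous_storage<
    T, std::void_t<decltype(std::declval<const T&>().data())>> : std::true_type {};

// Hypothetical packing routine: one block copy for contiguous containers,
// element-wise copies otherwise.
template <typename Container>
std::vector<unsigned char> pack(const Container& c) {
    using value_type = typename Container::value_type;
    static_assert(std::is_trivially_copyable<value_type>::value,
                  "this sketch only handles trivially copyable elements");
    std::vector<unsigned char> bytes(c.size() * sizeof(value_type));
    if constexpr (has_contiguous_storage<Container>::value) {
        if (!bytes.empty()) std::memcpy(bytes.data(), c.data(), bytes.size());
    } else {
        std::size_t offset = 0;
        for (const value_type& v : c) {
            std::memcpy(bytes.data() + offset, &v, sizeof(value_type));
            offset += sizeof(value_type);
        }
        assert(offset == bytes.size());  // every element was copied
    }
    return bytes;
}
```

Both branches produce the same bytes; the point of the specialization is that the contiguous case needs a single copy.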
Add an option to pass the communicator a buffer that it can use for its memory pool.
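
One way this option might look, sketched with the standard `std::pmr` facilities. The `Communicator` class below is a hypothetical stand-in, not the library's communicator, and the choice of a monotonic pool with no fallback allocation is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical communicator stand-in that allocates its send buffers from
// memory supplied by the caller.
class Communicator {
public:
    // `buffer` stays owned by the caller and must outlive the communicator.
    // null_memory_resource() means: throw std::bad_alloc instead of falling
    // back to the heap once the buffer is exhausted.
    Communicator(std::byte* buffer, std::size_t size)
        : pool_(buffer, size, std::pmr::null_memory_resource()) {
        assert(buffer != nullptr);
    }

    // Returns a byte buffer whose storage comes from the user-provided memory.
    std::pmr::vector<std::byte> make_send_buffer(std::size_t bytes) {
        return std::pmr::vector<std::byte>(bytes, &pool_);
    }

private:
    std::pmr::monotonic_buffer_resource pool_;
};
```

A caller would create a suitably aligned `std::byte` array, pass it to the constructor, and request send buffers from the communicator. A monotonic resource never reuses freed memory, so a real pool would probably need a different resource type.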
Look at Rust type interface stuff?
Link-time optimization?
Look at other C++ MPI implementations: EnhancedMPI, ChronosMPI.
Read Joseph's paper from last year about derived datatypes.
Change the "include" directory name to mppi, to be more expressive.
MPI_CHAR vs MPI_BYTE
Check the maximum bandwidth of one core (with STREAM?). How much of that do we achieve?
Compile-time send size feature?
(CUDA/HIP/)OpenMP device support?
Implement reflection pattern with experimental clang compiler.
