- Make sure the compiler optimizes the serialization of datatype members to the vector of bytes.
- Make sure everything is optimized at compile time as far as possible (maybe rewrite some functions, e.g. make the mod calculations constexpr).
- Is the array of displacements in the Pattern class resolved at compile-time? Maybe another construct would be more fitting. In theory all the information is there at compile-time.
- Consider overlapping the sending and splitting of large messages.
  - Find out Open MPI's thresholds for this.
  - A change of interface is probably necessary (currently the data is packed completely before the communicator does anything with it).
- Look at other serialization libraries:
  - Boost
  - MADNESS
  - cereal
- Make custom benchmarks "nicer".
- Think about more restrictive concepts and specializations (e.g. for contiguous and non-contiguous memory containers).
- Add an option to pass the communicator a buffer that it can use for its memory pool.
- Look at Rust type interface stuff?
- Link-time optimization?
- Look at other C++ MPI implementations: EnhancedMPI, ChronosMPI.
- Read Joseph's paper from last year about derived datatypes.
- Change the "include" directory name to mppi, to be more expressive.
- MPI_CHAR vs MPI_BYTE
- Check the maximum memory bandwidth of one core (with STREAM?). How much of that do we get?
- Compile-time send size feature?
- (CUDA/HIP/)OpenMP device support?
- Implement reflection pattern with experimental clang compiler.