A GPU cluster manager for high-performance AI model serving (vLLM, SGLang) and on-demand SSH-accessible GPU instances.
Updated Jul 1, 2026 - Python
Krasis is a hybrid LLM runtime that focuses on running larger models efficiently on consumer-grade, VRAM-limited hardware.
Optimize TensorFlow (TF) Models For Deployment with NVIDIA TensorRT.