An open reinforcement-learning (RL) environment that trains LLM agents to use the current fact rather than a stale one. It provides a verifiable reward for temporal fact-currency and is built on verifiers / prime-rl (GRPO, LoRA).
nlp benchmark machine-learning reinforcement-learning memory rl lora fine-tuning rl-environment llm rlhf vllm long-horizon qwen llm-evaluation llm-agents agentic-ai grpo verifiers prime-rl
Updated Jun 24, 2026 - Python