[Feature]: OpenGauss 2.0 — Tree-Search BFS Agent Loop, Distributed GRPO Infrastructure & TRACE Reward Masking #450

### Problem or Use Case

Currently, the OpenGauss agent framework runs a strictly linear, sequential loop (`G=1`). In high-friction or ambiguous environments (e.g., repository-level software debugging), sequential reasoning is prone to "Linear Deadlocks": if the agent hits an unexpected tool error or hallucinates early, it can become trapped in a self-reinforcing feedback loop until it exhausts the maximum turn limit. The result is wasted API token spend and inconsistent convergence on a final state.
Furthermore, the infrastructure currently lacks native support for Group Relative advantage estimation and for robust trajectory sanitization before backpropagation, both of which are needed for modern Reinforcement Learning (RL) training paradigms.

### Proposed Solution

Upgrade the core infrastructure so that execution is no longer bound to a single linear loop:
1. **Tree-Search BFS Architecture (`environments/agent_loop.py`):** Replace the linear loop with a configurable Breadth-First Search (BFS) generation system that lets the agent fork into `G` parallel exploratory branches.
2. **Low-Overhead Sandbox Isolation (`tools/environments/docker.py`):** Implement a high-speed Unix tar-pipe cloning system to provision isolated branch sandboxes instantly (guest boot times `< 1.0s`).
3. **Best-of-N Reward Alignment (`environments/gauss_base_env.py`):** Add standard multi-branch evaluation logic that automatically audits and selects the cleanest, most turn-efficient trajectory for commit.
4. **Mathematical GRPO Loss Engine (`tools/rl_training_tool.py`):** Deliver a standalone `GaussGRPOEngine` to compute Group Relative advantages ($A_i = \frac{R_i - \mu}{\sigma + \epsilon}$) and reference-aligned clipped surrogate losses natively over distributed Ray clusters.
5. **TRACE Masking Sanitizer (`agent/trace_masking.py`):** Restrict deterministic reward assignment to user/assistant tool blocks, and prune bulky terminal stdout noise before trajectories are sequenced into memory buffers.
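
Items 3 and 4 both operate on the same group of `G` finished branches, so a minimal sketch of that group-level scoring may help reviewers. Everything below is illustrative: the `Branch` record, the function names, the tie-break rule (fewer turns wins), the use of the population standard deviation, and the toy reward values are assumptions, not the proposed `GaussGRPOEngine` API. The reference-policy term and the Ray distribution layer are left out.

```python
import math
from dataclasses import dataclass
from statistics import fmean, pstdev


@dataclass
class Branch:
    """Hypothetical record for one finished exploratory branch."""
    branch_id: int
    reward: float  # scalar task reward R_i
    turns: int     # turns consumed before the branch finished


def select_best(branches):
    """Best-of-N: highest reward wins; ties go to the branch with fewer turns."""
    return max(branches, key=lambda b: (b.reward, -b.turns))


def group_relative_advantages(rewards, eps=1e-8):
    """A_i = (R_i - mu) / (sigma + eps), with mu and sigma taken over the group."""
    mu = fmean(rewards)
    sigma = pstdev(rewards)  # population std (assumption; sample std is also common)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO-style clipped surrogate, averaged over samples and negated for minimization."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


# Toy group of G=4 branches; rewards and turn counts are made up for illustration.
branches = [Branch(0, 0.0, 40), Branch(1, 1.0, 12), Branch(2, 1.0, 20), Branch(3, 0.0, 40)]
best = select_best(branches)
advantages = group_relative_advantages([b.reward for b in branches])
```

Here `best` is the branch that solved the task in the fewest turns, and `advantages` is positive for the two successful branches and negative for the two failures. When every branch in a group earns the same reward, `sigma` is zero and all advantages collapse to zero, which is why the `eps` term in the formula matters.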

#### Empirical Pilot Validation
A head-to-head benchmark on the `TBLite` cohort showed a **12.74% overall turn reduction** and **70% API budget savings** on complex diagnostic tasks (reaching a solution in 18 turns versus hitting the 60-turn timeout).
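
As a sanity check on the parenthetical figures, and assuming API spend scales roughly linearly with turn count (an assumption, not something stated by the benchmark itself), the per-task saving works out as:

$$
1 - \frac{18}{60} = 0.70
$$

This is consistent with the quoted 70% figure. The 12.74% overall number is an aggregate across the cohort and cannot be derived from these two turn counts alone.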

### Alternatives Considered

_No response_

### Feature Type

Performance / reliability

### Scope

Large (new module or significant refactor)

### Contribution

- [x] I'd like to implement this myself and submit a PR
