README.md
# Consent-First Meeting Ops Agent RL Prototype

This repository contains a hackathon-ready prototype for a **hireEZ-aligned Customer Success Engineer solution** built around an **RL environment + agent loop** rather than a pure prompt demo.

## Why this concept fits the role

The target role is closer to **implementation, integration, debugging, and production-minded automation** than to speculative "emotion AI". The proposed solution is a:

> **Consent-First Meeting Ops Agent**

An agent that learns to turn meeting signals into approved CRM + Slack actions while optimizing for:

- CRM hygiene
- low-risk automation
- human approval compliance
- minimal rep disruption
- observable business outcomes

This mirrors how a Customer Success Engineer would think about customer deployments:

- integration adapters
- validation and error handling
- auditability
- human-in-the-loop workflows
- measurable operational outcomes

## Hackathon framing

This project addresses the Sundai theme:

- **RL environments + agents**
- **agentic harnesses**
- **self-improving tool use**
- **custom reward function for a realistic workflow**

Instead of training an agent to "detect emotions," this repo defines a small environment where the agent learns the right actions across a realistic revenue-ops workflow.

## Core idea

A meeting arrives with structured but imperfect signals:

- missing next step
- unclear stakeholder coverage
- weak confidence in extracted fields
- consent on/off
- risk of unnecessary automation

The agent must choose actions such as:

- extract structured fields
- request clarification
- ask a human for approval
- write the CRM update
- post a Slack summary
- show a quiet in-call nudge
- wait without acting

The reward function encourages behavior that is:

- execution-oriented
- safe under uncertainty
- human-supervised when confidence is low
- useful for downstream systems

## Environment design

### State

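Each episode samples one meeting/workflow scenario with these features (all booleans except `confidence_bucket`, an integer from 0 to 2 where 0 is the lowest confidence):

- `has_next_step`
- `has_stakeholder`
- `high_value_opportunity`
- `consent_granted`
- `confidence_bucket`
- `manager_requested_update`
- `rep_is_busy`
- `customer_risk_flag`

The state also carries progress flags that the agent's actions set: `extracted_fields`, `clarification_requested`, `approval_requested`, `crm_written`, `slack_posted`, `quiet_nudge_shown`, and `done`.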
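Because `reset()` samples seven boolean features plus a three-valued `confidence_bucket`, the space of starting scenarios is small enough to enumerate outright. The sketch below counts it with `itertools.product`; the feature order is copied from `MeetingState`, while the helper name `enumerate_start_states` is hypothetical.

```python
from itertools import product
from typing import Dict, Iterator, Union

# Scenario features sampled by MeetingOpsEnv.reset(), in MeetingState field order.
BOOL_FEATURES = (
    "has_next_step",
    "has_stakeholder",
    "high_value_opportunity",
    "consent_granted",
    "manager_requested_update",
    "rep_is_busy",
    "customer_risk_flag",
)
CONFIDENCE_BUCKETS = (0, 1, 2)  # reset() uses randint(0, 2), inclusive on both ends


def enumerate_start_states() -> Iterator[Dict[str, Union[bool, int]]]:
    """Yield every starting scenario reset() can produce (hypothetical helper)."""
    for flags in product((False, True), repeat=len(BOOL_FEATURES)):
        for bucket in CONFIDENCE_BUCKETS:
            scenario: Dict[str, Union[bool, int]] = dict(zip(BOOL_FEATURES, flags))
            scenario["confidence_bucket"] = bucket
            yield scenario


scenarios = list(enumerate_start_states())
print(len(scenarios))  # 2**7 * 3 = 384
```

A tabular learner keyed on `MeetingState.as_key()` would see more keys than these 384, because that key also includes the progress flags that change during an episode.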

### Actions

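At each step the agent chooses one of the following actions (the `Action` enum numbers them from 0 for `wait` to 6 for `show_quiet_nudge`). `wait` and `write_crm` end the episode; every other action sets a progress flag and the episode continues: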

1. `wait`
2. `extract_fields`
3. `request_clarification`
4. `ask_human_approval`
5. `write_crm`
6. `post_slack_update`
7. `show_quiet_nudge`
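As a worked example, take a scenario where the next step and stakeholder are already known, consent is granted, `confidence_bucket` is 1, and the manager has asked for an update. Following the reward values in `MeetingOpsEnv.step()`, including the -0.1 cost charged on every step, the baseline `heuristic_policy` would collect roughly the following (the values below are hand-computed from the reward branches, not logged output):

```math
\begin{aligned}
r_1 &= -0.1 + 1.0 = 0.9 && \text{extract fields} \\
r_2 &= -0.1 + 1.2 = 1.1 && \text{ask human approval (confidence bucket below 2)} \\
r_3 &= -0.1 + 1.5 = 1.4 && \text{post Slack update (manager requested one)} \\
r_4 &= -0.1 + 3.0 + 1.0 + 0.5 = 4.4 && \text{write CRM (approved, manager requested; episode ends)} \\
G &= 0.9 + 1.1 + 1.4 + 4.4 = 7.8
\end{aligned}
```

The same scenario without the approval step totals $0.9 + 1.4 + 3.4 = 5.7$, which is how the reward steers the agent toward human sign-off even when a direct write would be allowed.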

### Reward intuition

The reward function favors:

- requesting clarification before automation when fields are incomplete
- asking for human approval before writing low-confidence updates
- writing to CRM only when consent and confidence make it appropriate
- Slack updates when a manager requested one or the customer carries a risk flag (with smaller credit on high-value opportunities)
- quiet nudges only when the rep is busy and a key item is missing

The reward function penalizes:

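- writing to CRM without consent (the largest penalty)
- writing to CRM while the next step or stakeholder is still missing, or at the lowest confidence without prior approval
- distracting the rep with unnecessary nudges
- posting weak or noisy updates
- ending a high-value episode without a CRM write
- every step, through a small per-step cost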
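To make the penalty ordering concrete, here is a sketch that restates the CRM-write branch of `MeetingOpsEnv.step()` as a standalone function. The function name `write_crm_step_reward` is hypothetical; the constants are copied from `_reward_write_crm` plus the -0.1 per-step cost.

```python
STEP_COST = -0.1  # charged by MeetingOpsEnv.step() on every action


def write_crm_step_reward(
    consent_granted: bool,
    has_next_step: bool,
    has_stakeholder: bool,
    confidence_bucket: int,
    approval_requested: bool,
    manager_requested_update: bool,
) -> float:
    """Hypothetical helper: total reward for one WRITE_CRM step."""
    if not consent_granted:
        bonus = -4.0  # consent is checked first and dominates everything else
    elif not has_next_step or not has_stakeholder:
        bonus = -2.5  # incomplete record
    elif confidence_bucket == 0 and not approval_requested:
        bonus = -2.0  # lowest confidence with no human sign-off
    else:
        bonus = 3.0
        if approval_requested:
            bonus += 1.0  # approval-gated writes earn extra credit
        if manager_requested_update:
            bonus += 0.5
    return STEP_COST + bonus


no_consent = write_crm_step_reward(False, True, True, 2, False, False)
low_conf_unapproved = write_crm_step_reward(True, True, True, 0, False, False)
low_conf_approved = write_crm_step_reward(True, True, True, 0, True, True)
print(round(no_consent, 2), round(low_conf_unapproved, 2), round(low_conf_approved, 2))  # -4.1 -2.1 4.4
```

Because the consent check runs first, a write without consent scores -4.1 no matter how complete or confident the record is, while the same low-confidence write swings from -2.1 to 4.4 once a human has approved it.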

## Why this is stronger than an "emotion detection" pitch

This is intentionally aligned with enterprise reality:

- **safer governance posture**
- **clearer reward definition**
- **closer to deployment work**
- **better fit for hireEZ's semi-autonomous, human-supervised agent model**

It also maps more directly to a CSE interview narrative:

- API adapters
- validation logic
- human approval gates
- logs and retries
- measurable workflow gains

## Files

- `meeting_ops_env.py`: RL environment, rule-based heuristic policy, and evaluation loop
- `tests/test_meeting_ops_env.py`: lightweight correctness tests for environment transitions and reward behavior

## How to run

### Evaluate the baseline heuristic policy

```bash
python3 meeting_ops_env.py
```
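The script calls `evaluate_policy(episodes=200)`, which plays `heuristic_policy` from fresh `reset()` scenarios (at most 8 steps each) and prints the episode count plus three per-episode averages. Reading the definitions off `evaluate_policy()`, with $N$ episodes, $T_e$ steps in episode $e$, and reward $r_{e,t}$ at step $t$:

```math
\mathrm{avg\_reward} = \frac{1}{N}\sum_{e=1}^{N}\sum_{t=1}^{T_e} r_{e,t}, \qquad
\mathrm{crm\_write\_rate} = \frac{1}{N}\sum_{e=1}^{N}\mathbf{1}[\mathrm{crm\_written}_e], \qquad
\mathrm{approval\_request\_rate} = \frac{1}{N}\sum_{e=1}^{N}\mathbf{1}[\mathrm{approval\_requested}_e]
```

The return is undiscounted, and both rates read flags from each episode's final state, so `crm_write_rate` is the fraction of episodes that ended in a CRM write.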

### Run tests

```bash
python3 -m unittest discover -s tests -p 'test_*.py'
```

## Demo talking points for presentation

### 1. Problem statement

"I built an RL environment for a Meeting Ops Agent that learns when to extract fields, ask for clarification, escalate to a human for approval, and update CRM and Slack."

### 2. Why RL here

"This is a sequential decision problem with competing objectives: speed, accuracy, consent, and rep experience. RL is a better fit than a single prompt because the agent must learn trade-offs across multiple steps."

### 3. Why this fits hireEZ

"hireEZ's product model is semi-autonomous and workflow-native. My environment trains an agent to behave that way: human-supervised, tool-using, and integration-aware."

### 4. Why it fits a CSE role

"I biased the design toward implementation realism: approval gates, low-confidence handling, data quality checks, and execution into operational systems."

## Next steps if you extend this during the hackathon

1. Add a `SalesforceAdapter` mock with idempotency keys.
2. Add a `SlackAdapter` mock with routing rules.
3. Log every transition as JSON lines for debugging.
4. Add offline replay from real or synthetic transcripts.
5. Train with Q-learning or PPO against a larger synthetic episode generator.
6. Add a front-end dashboard showing state → action → reward → human approval.
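For step 3, a minimal sketch of JSON-lines transition logging, assuming one JSON object per `step()` call. The helper name `log_transition` and the record layout are hypothetical; inside `meeting_ops_env.py` the state dict could come from `dataclasses.asdict(state)` and the action name from `action.name.lower()`. The abbreviated states and rewards below are illustrative values.

```python
import io
import json
from typing import Any, Dict, TextIO


def log_transition(
    sink: TextIO,
    episode: int,
    step: int,
    state: Dict[str, Any],
    action: str,
    reward: float,
    done: bool,
) -> None:
    """Hypothetical helper: append one transition as a single JSON line."""
    record = {
        "episode": episode,
        "step": step,
        "state": state,
        "action": action,
        "reward": round(reward, 4),
        "done": done,
    }
    # sort_keys keeps key order stable so logs diff cleanly across runs.
    sink.write(json.dumps(record, sort_keys=True) + "\n")


buffer = io.StringIO()  # stands in for an open .jsonl file
log_transition(
    buffer, 0, 0, {"consent_granted": True, "extracted_fields": False},
    "extract_fields", 0.9, False,
)
log_transition(
    buffer, 0, 1, {"consent_granted": True, "extracted_fields": True},
    "write_crm", 2.9, True,
)

for line in buffer.getvalue().splitlines():
    print(json.loads(line)["action"])
```

Because each line is a complete JSON object, the log can be filtered with standard line-oriented tools or loaded for offline replay (step 4) without a custom parser.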

## Suggested final demo line

> "I did not build a surveillance demo. I built a governed execution agent that learns how to improve revenue operations safely under uncertainty."
meeting_ops_env.py
from __future__ import annotations

from dataclasses import dataclass, replace
from enum import IntEnum
import random
from typing import Dict, Tuple


class Action(IntEnum):
    WAIT = 0
    EXTRACT_FIELDS = 1
    REQUEST_CLARIFICATION = 2
    ASK_HUMAN_APPROVAL = 3
    WRITE_CRM = 4
    POST_SLACK_UPDATE = 5
    SHOW_QUIET_NUDGE = 6


@dataclass(frozen=True)
class MeetingState:
    # Scenario features, sampled once per episode by MeetingOpsEnv.reset().
    has_next_step: bool
    has_stakeholder: bool
    high_value_opportunity: bool
    consent_granted: bool
    confidence_bucket: int  # 0 (lowest) to 2 (highest)
    manager_requested_update: bool
    rep_is_busy: bool
    customer_risk_flag: bool
    # Progress flags, set by MeetingOpsEnv.step() as the agent acts.
    extracted_fields: bool = False
    clarification_requested: bool = False
    approval_requested: bool = False
    crm_written: bool = False
    slack_posted: bool = False
    quiet_nudge_shown: bool = False
    done: bool = False

    def as_key(self) -> Tuple[int, ...]:
        """Flatten the state into an integer tuple (usable as a tabular lookup key)."""
        return (
            int(self.has_next_step),
            int(self.has_stakeholder),
            int(self.high_value_opportunity),
            int(self.consent_granted),
            int(self.confidence_bucket),
            int(self.manager_requested_update),
            int(self.rep_is_busy),
            int(self.customer_risk_flag),
            int(self.extracted_fields),
            int(self.clarification_requested),
            int(self.approval_requested),
            int(self.crm_written),
            int(self.slack_posted),
            int(self.quiet_nudge_shown),
            int(self.done),
        )


class MeetingOpsEnv:
    """Small RL-style environment for a consent-first meeting ops agent."""

    def __init__(self, seed: int = 7):
        self._rng = random.Random(seed)
        self.state = self.reset()

    def reset(self) -> MeetingState:
        self.state = MeetingState(
            has_next_step=self._rng.choice([True, False]),
            has_stakeholder=self._rng.choice([True, False]),
            high_value_opportunity=self._rng.choice([True, False]),
            consent_granted=self._rng.choice([True, False]),
            confidence_bucket=self._rng.randint(0, 2),
            manager_requested_update=self._rng.choice([True, False]),
            rep_is_busy=self._rng.choice([True, False]),
            customer_risk_flag=self._rng.choice([True, False]),
        )
        return self.state

    def step(self, action: Action) -> Tuple[MeetingState, float, bool, Dict[str, str]]:
        """Apply one action and return (next_state, reward, done, info)."""
        state = self.state
        if state.done:
            return state, 0.0, True, {"message": "Episode already completed."}

        reward = -0.1  # per-step cost, so shorter episodes score higher

        if action == Action.WAIT:
            reward += self._reward_wait(state)
            next_state = replace(state, done=True)
            message = "Agent waited."

        elif action == Action.EXTRACT_FIELDS:
            reward += 1.0 if not state.extracted_fields else -0.5
            next_state = replace(state, extracted_fields=True)
            message = "Structured extraction completed."

        elif action == Action.REQUEST_CLARIFICATION:
            # Rewarded only when a field is actually missing; the simulation
            # assumes the clarification fills in both fields.
            base = 1.5 if (not state.has_next_step or not state.has_stakeholder) else -0.4
            next_state = replace(
                state,
                clarification_requested=True,
                has_next_step=True,
                has_stakeholder=True,
            )
            reward += base
            message = "Clarification requested to complete missing fields."

        elif action == Action.ASK_HUMAN_APPROVAL:
            if state.approval_requested:
                base = -0.5  # repeat requests add friction without new information
            elif state.confidence_bucket < 2 or state.customer_risk_flag:
                base = 1.2
            else:
                base = 0.4
            next_state = replace(state, approval_requested=True)
            reward += base
            message = "Human approval requested."

        elif action == Action.WRITE_CRM:
            reward += self._reward_write_crm(state)
            next_state = replace(state, crm_written=True, done=True)
            message = "CRM write attempted."

        elif action == Action.POST_SLACK_UPDATE:
            reward += self._reward_post_slack(state)
            next_state = replace(state, slack_posted=True)
            message = "Slack update posted."

        elif action == Action.SHOW_QUIET_NUDGE:
            reward += self._reward_quiet_nudge(state)
            next_state = replace(state, quiet_nudge_shown=True)
            message = "Quiet in-call nudge shown."

        else:
            raise ValueError(f"Unknown action: {action}")

        self.state = next_state
        return next_state, reward, next_state.done, {"message": message}

    def _reward_wait(self, state: MeetingState) -> float:
        if state.high_value_opportunity and not state.crm_written:
            return -1.0
        if state.rep_is_busy and state.has_next_step and state.has_stakeholder:
            return 0.3
        return -0.2

    def _reward_write_crm(self, state: MeetingState) -> float:
        if not state.consent_granted:
            return -4.0
        if not state.has_next_step or not state.has_stakeholder:
            return -2.5
        if state.confidence_bucket == 0 and not state.approval_requested:
            return -2.0
        score = 3.0
        if state.approval_requested:
            score += 1.0
        if state.manager_requested_update:
            score += 0.5
        return score

    def _reward_post_slack(self, state: MeetingState) -> float:
        if state.slack_posted:
            return -0.5  # duplicate posts are channel noise
        if state.manager_requested_update or state.customer_risk_flag:
            return 1.5
        if state.high_value_opportunity:
            return 0.8
        return -0.3

    def _reward_quiet_nudge(self, state: MeetingState) -> float:
        if state.quiet_nudge_shown:
            return -0.8  # a repeated nudge only distracts the rep
        if state.rep_is_busy and (not state.has_next_step or not state.has_stakeholder):
            return 1.3
        if state.rep_is_busy:
            return 0.3
        return -0.8


def heuristic_policy(state: MeetingState) -> Action:
    """Rule-based baseline: clarify, extract, gate on approval, notify, then write."""
    missing_fields = not state.has_next_step or not state.has_stakeholder
    if missing_fields and not state.clarification_requested:
        # Nudge a busy rep once, then fall back to an explicit clarification request.
        # The nudge does not fill the missing fields, so repeating it would loop.
        if state.rep_is_busy and not state.quiet_nudge_shown:
            return Action.SHOW_QUIET_NUDGE
        return Action.REQUEST_CLARIFICATION
    if not state.extracted_fields:
        return Action.EXTRACT_FIELDS
    if (state.confidence_bucket < 2 or state.customer_risk_flag) and not state.approval_requested:
        return Action.ASK_HUMAN_APPROVAL
    if state.manager_requested_update and not state.slack_posted:
        return Action.POST_SLACK_UPDATE
    if state.consent_granted and not state.crm_written:
        return Action.WRITE_CRM
    return Action.WAIT


def evaluate_policy(episodes: int = 100, seed: int = 7, max_steps: int = 8) -> Dict[str, float]:
    """Run heuristic_policy for `episodes` episodes and report per-episode averages."""
    env = MeetingOpsEnv(seed=seed)
    total_reward = 0.0
    crm_writes = 0
    approval_requests = 0

    for _ in range(episodes):
        state = env.reset()
        done = False
        steps = 0
        while not done and steps < max_steps:
            action = heuristic_policy(state)
            state, reward, done, _ = env.step(action)
            total_reward += reward
            steps += 1
        # Rates are read from the final state of each episode.
        crm_writes += int(state.crm_written)
        approval_requests += int(state.approval_requested)

    return {
        "episodes": float(episodes),
        "avg_reward": total_reward / episodes,
        "crm_write_rate": crm_writes / episodes,
        "approval_request_rate": approval_requests / episodes,
    }


if __name__ == "__main__":
    metrics = evaluate_policy(episodes=200)
    print("Consent-First Meeting Ops Agent evaluation")
    for key, value in metrics.items():
        print(f"- {key}: {value:.3f}")