[ENHANCEMENT]: Add Output Validation for LLM-generated Question Responses #656

@piyush06singhal

Feature and its Use Cases

Problem

LLM-based endpoints (/get_mcq_llm, /get_boolq_llm, /get_shortq_llm, /get_problems_llm) return generated questions without validating their structure or content.

As a result, the API may return HTTP 200 responses containing:

  • Missing required fields (e.g., options, answer)
  • Empty or malformed values
  • Incomplete question sets
  • Structurally inconsistent data

These issues are not detected before sending the response, leading to silent failures and unreliable outputs.


Why This Matters

  • Frontend may break due to undefined fields
  • Users receive incomplete or unusable quizzes
  • Debugging becomes difficult because malformed output is never flagged where it originates
  • Reduces overall reliability of LLM-based features

Proposed Solution

Introduce a lightweight output validation layer for LLM-generated responses.

This layer should validate each generated question before it is returned by the API.


Validation Rules

MCQ

  • question: non-empty string
  • options: non-empty list
  • correct_answer: non-empty

Short Questions

  • question: non-empty string
  • answer: non-empty string

Boolean Questions

  • question: non-empty string
  • answer: must be True or False
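
A minimal sketch of what these rules could look like as per-type checks. The function names are hypothetical, and two interpretations are assumptions rather than part of the proposal: "non-empty" is taken to exclude None, whitespace-only strings, and empty containers, and a boolean answer is taken to mean a real bool (the string "True" is rejected).

```python
from typing import Any


def _non_empty_str(value: Any) -> bool:
    """True only for a string with at least one non-whitespace character."""
    return isinstance(value, str) and bool(value.strip())


def _non_empty(value: Any) -> bool:
    """Assumed meaning of 'non-empty' for a field whose type is not specified."""
    if value is None:
        return False
    if isinstance(value, str):
        return bool(value.strip())
    if isinstance(value, (list, tuple, dict)):
        return len(value) > 0
    return True  # e.g. an integer index into the options list


def is_valid_mcq(q: Any) -> bool:
    """question: non-empty string; options: non-empty list; correct_answer: non-empty."""
    if not isinstance(q, dict):
        return False
    options = q.get("options")
    return (
        _non_empty_str(q.get("question"))
        and isinstance(options, list)
        and len(options) > 0
        and _non_empty(q.get("correct_answer"))
    )


def is_valid_shortq(q: Any) -> bool:
    """question and answer: both non-empty strings."""
    return (
        isinstance(q, dict)
        and _non_empty_str(q.get("question"))
        and _non_empty_str(q.get("answer"))
    )


def is_valid_boolq(q: Any) -> bool:
    """question: non-empty string; answer: a real bool (assumption)."""
    return (
        isinstance(q, dict)
        and _non_empty_str(q.get("question"))
        and isinstance(q.get("answer"), bool)
    )
```

If the LLM prompt actually produces string answers such as "true" for boolean questions, the last check would need to be relaxed accordingly.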

Expected Behavior

  • Invalid questions are filtered out
  • API responses contain only valid data
  • No malformed or incomplete structures are returned
  • Existing API behavior remains unchanged for responses that are already valid
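
The filtering behavior described above could be sketched roughly as follows. `filter_valid` and `has_question_and_answer` are hypothetical names used only for illustration; the predicate here is a stand-in for whichever per-type check ends up being used.

```python
from typing import Any, Callable, List


def filter_valid(questions: Any, is_valid: Callable[[Any], bool]) -> List[Any]:
    """Keep only the items that pass is_valid, preserving their order.

    A payload that is not a list at all is treated as containing no
    valid questions, so the caller always gets a list back.
    """
    if not isinstance(questions, list):
        return []
    return [q for q in questions if is_valid(q)]


def has_question_and_answer(q: Any) -> bool:
    """Stand-in predicate: both fields present as non-blank strings."""
    if not isinstance(q, dict):
        return False
    question, answer = q.get("question"), q.get("answer")
    return (
        isinstance(question, str) and bool(question.strip())
        and isinstance(answer, str) and bool(answer.strip())
    )


generated = [
    {"question": "What is a token?", "answer": "A unit of text."},
    {"question": "Missing answer?"},       # dropped: no answer field
    {"question": "", "answer": "Orphan"},  # dropped: empty question
]
valid_only = filter_valid(generated, has_question_and_answer)
```

Valid items pass through untouched and in their original order, so a response that was already well-formed is returned exactly as before.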

Scope

In Scope

  • Validation of LLM-generated outputs
  • Integration after the LLM output is parsed and before the API response is sent

Out of Scope

  • Retry logic
  • Pipeline/architecture changes
  • Frontend changes
  • Advanced schema validation systems

Suggested Implementation

  • Add a new module: backend/Generator/output_validator.py
  • Implement simple validation functions per question type
  • Integrate validation inside llm_generator.py before returning responses
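
One possible shape for the integration point, as a sketch only. Nothing here reflects the real contents of llm_generator.py: `VALIDATORS`, `validate_questions`, and `generate_short_questions` are hypothetical names, and the assumption that the LLM output is parsed from a JSON array is made purely so the example runs on its own. The warning log is an optional extra so that dropped questions are no longer silent.

```python
import json
import logging
from typing import Any, Callable, Dict, List

logger = logging.getLogger("output_validator")


def _is_valid_shortq(q: Any) -> bool:
    """Short question: 'question' and 'answer' are non-blank strings."""
    return isinstance(q, dict) and all(
        isinstance(q.get(key), str) and q[key].strip()
        for key in ("question", "answer")
    )


# Hypothetical registry: one validator per question type.
VALIDATORS: Dict[str, Callable[[Any], bool]] = {"shortq": _is_valid_shortq}


def validate_questions(kind: str, questions: Any) -> List[Any]:
    """Entry point the generator would call just before returning."""
    is_valid = VALIDATORS[kind]
    items = questions if isinstance(questions, list) else []
    valid = [q for q in items if is_valid(q)]
    dropped = len(items) - len(valid)
    if dropped:
        logger.warning("Dropped %d invalid %s question(s)", dropped, kind)
    return valid


def generate_short_questions(raw_llm_text: str) -> List[Any]:
    """Stand-in for a generator method: parse, validate, then return."""
    try:
        parsed = json.loads(raw_llm_text)
    except json.JSONDecodeError:
        parsed = []  # unparseable output yields no questions
    return validate_questions("shortq", parsed)
```

Keeping the registry in the validator module means adding a question type only touches one file, while each generator method gains a single call.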

Impact

  • Improves reliability of LLM endpoints
  • Prevents silent failures
  • Ensures consistent API contracts
  • Enhances user experience

Additional Context

Input/request validation exists in parts of the system, but LLM output validation is currently missing, creating a gap in response correctness.

Code of Conduct

  • I have joined the Discord server and will post updates there
  • I have searched existing issues to avoid duplicates

Labels

enhancement (New feature or request)