Feature Request: Validate required input files before starting the run #47

@renegatux

Description

Summary

AutoRecLab currently fails mid-run with FileNotFoundError when required
dataset files are missing or have unexpected names. The failure only
surfaces after several LLM iterations have already been executed, which
wastes time and API costs.

Current Behavior

When a required file (e.g. u.data, VideoGames.csv) is not found,
the run crashes with a FileNotFoundError deep inside the generated code,
often only after multiple tree-search iterations.

Expected Behavior

Before any LLM calls are made, AutoRecLab should:

  1. Parse the user prompt for expected file names/datasets
  2. Check whether those files exist in the working directory
  3. Print a clear summary of which files are present and which are missing
  4. Exit immediately with a helpful error message if required files are absent
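
Steps 1 to 3 could look roughly like the sketch below. It is only an illustration: the function names `extract_file_names` and `check_files`, the regular expression, and the list of file extensions are assumptions of mine, not part of AutoRecLab.

```python
import re
from pathlib import Path

# Hypothetical pattern: any token that ends in a data-like extension.
# The extension list is an assumption and would need tuning.
FILE_PATTERN = re.compile(
    r"\b[\w\-./]+\.(?:csv|data|tsv|json|parquet|txt)\b", re.IGNORECASE
)


def extract_file_names(prompt: str) -> list[str]:
    """Return file-like names mentioned in the prompt, in order, without duplicates."""
    names: list[str] = []
    for match in FILE_PATTERN.findall(prompt):
        if match not in names:
            names.append(match)
    return names


def check_files(names: list[str], workspace: str) -> tuple[list[str], list[str]]:
    """Split the names into (present, missing) relative to the workspace directory."""
    root = Path(workspace)
    present = [n for n in names if (root / n).is_file()]
    missing = [n for n in names if n not in present]
    return present, missing


def print_summary(present: list[str], missing: list[str]) -> None:
    """Print one line per file so the user sees the state at a glance."""
    for name in present:
        print(f"[ok]      {name}")
    for name in missing:
        print(f"[missing] {name}")
```

A regex over the prompt will produce false positives and negatives, so an explicit list of required files in the config would probably be the more reliable source when one is available.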

Why This Matters

This improvement would:

  • Save API costs (no wasted LLM calls before the inevitable crash)
  • Make debugging much faster for users
  • Require only a few hours to implement (a pre-run check function)

Suggested Implementation

Add a simple pre-flight check function that is called before TreeSearch
is initialized. It scans the workspace directory and compares its
contents against the files mentioned in the prompt or config.
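
A minimal sketch of such a fail-fast check, assuming the list of required files is already known (for example from the config). The names `preflight_check` and `run_or_exit` are hypothetical, and how TreeSearch is actually constructed is not shown because I have not checked its signature. The "did you mean" hint uses `difflib` from the standard library to catch misnamed files.

```python
import difflib
import sys
from pathlib import Path


def preflight_check(required: list[str], workspace: str) -> list[str]:
    """Return one message per missing file; an empty list means all files exist."""
    root = Path(workspace)
    # File names actually present in the workspace, used for rename hints.
    available = [p.name for p in root.iterdir() if p.is_file()] if root.is_dir() else []
    problems: list[str] = []
    for name in required:
        if (root / name).is_file():
            continue
        # Suggest the closest existing file name, if any is similar enough.
        hints = difflib.get_close_matches(Path(name).name, available, n=1, cutoff=0.6)
        hint = f" (did you mean '{hints[0]}'?)" if hints else ""
        problems.append(f"missing: {name}{hint}")
    return problems


def run_or_exit(required: list[str], workspace: str) -> None:
    """Abort with exit code 1 before any LLM call if a required file is absent."""
    problems = preflight_check(required, workspace)
    if problems:
        print("Pre-flight check failed:", file=sys.stderr)
        for problem in problems:
            print(f"  {problem}", file=sys.stderr)
        sys.exit(1)
    # Assumption: TreeSearch would be initialized only after this point.
```

Returning the problems as a list, rather than exiting inside `preflight_check`, keeps the check easy to unit-test and lets the caller decide how to report it.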

Context

We encountered this issue repeatedly while replicating the case study
from the AutoRecLab preprint. Runs failed consistently due to missing or misnamed files.
