
Add test-time scaling strategies like resampling and ensembling #200

Description

@liana313

Problem Statement

When running operations like sem_filter or sem_map, I would like to be able to specify an n_sample parameter and an ensembling strategy.

Proposed Solution

Allow users to specify a test-time scaling strategy and an ensembling strategy per operator. For example:

df.sem_filter("the {abstract} is relevant to vector databases", n_sample=3, ensemble='majority_vote', temperature=1.0)
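To make the proposed behavior concrete, here is a minimal sketch of what `n_sample=3, ensemble='majority_vote'` could mean for a single row. It is not the library's implementation: `majority_vote`, `resample_and_ensemble`, and the `predict` callable (standing in for one model call on one row) are hypothetical names used only for illustration.

```python
from collections import Counter
from typing import Any, Callable, Hashable, Sequence


def majority_vote(samples: Sequence[Hashable]) -> Hashable:
    """Return the most common sample; ties go to the value seen first."""
    if not samples:
        raise ValueError("need at least one sample")
    return Counter(samples).most_common(1)[0][0]


def resample_and_ensemble(
    predict: Callable[[Any], Hashable],  # hypothetical: one model call per row
    row: Any,
    n_sample: int = 3,
    ensemble: Callable[[Sequence[Hashable]], Hashable] = majority_vote,
) -> Hashable:
    """Call `predict` on the same row `n_sample` times and combine the outputs."""
    if n_sample < 1:
        raise ValueError("n_sample must be at least 1")
    samples = [predict(row) for _ in range(n_sample)]
    return ensemble(samples)
```

With an odd `n_sample` and a boolean filter, ties cannot occur, which may be a reasonable default. Resampling only helps if the repeated calls can differ, which is presumably why the example above sets `temperature=1.0`.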

Use Cases

When I perform a sem_filter (e.g. "is this row relevant to XX"), the results often change across repeated trials, so it would help to have built-in functionality for resampling and ensembling.

Checklist

  • [x] I have searched existing issues to avoid duplicates
  • [x] I have provided a clear problem statement
  • [x] I have considered alternative solutions
  • [x] I have assessed the impact and priority
  • [x] I am willing to contribute to implementation (if applicable)
