better support for large result lists #23

Description

@seanmacavaney

ir-measures was designed around iterators so that systems can avoid storing a large amount of data in memory. In practice, though, systems that want to use this ability cannot, for the following reasons:

  • Some providers end up reading the entire iterable and converting it into some other format (e.g., pytrec_eval converts it into a dict-of-dict structure).
  • Even for providers that progressively evaluate a run iterator (e.g., cwleval), this only really works when it is the only provider used AND the iterator is not invoked multiple times. In both of these cases the providers/invocations are executed in sequence, so the iterator de facto needs to be fully stored (even though the itertools.tee function makes it look like it's not).
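
The second point can be checked with a small, self-contained sketch. It uses only the standard library; `run_iter` and the `pulled` log are hypothetical stand-ins for a large run, not part of ir-measures. When the first consumer of a `tee` finishes before the second one starts, every item has already been drawn from the source and is held in the tee buffer:

```python
import itertools

pulled = []  # records every item drawn from the underlying run iterator


def run_iter(n):
    """Hypothetical stand-in for a large run; logs each item as it is produced."""
    for i in range(n):
        pulled.append(i)
        yield i


first, second = itertools.tee(run_iter(5))

# "Provider A" runs to completion before "provider B" starts.
consumed_by_first = list(first)
items_pulled_before_second_starts = len(pulled)

# "Provider B" now replays everything from tee's internal buffer.
consumed_by_second = list(second)

print(items_pulled_before_second_starts, consumed_by_second)
```

Because the consumers run one after the other, the whole run sits in memory between the two passes, which is exactly the storage cost the iterator design was meant to avoid.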

This motivates a restructuring of ir-measures to use the generator send method to push values into provider generators, one query at a time. The existing APIs can remain unchanged and simply use this alternative structure under the hood.
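
For readers less familiar with `send`, here is a minimal illustration of the mechanism using a toy generator (`running_total` is purely illustrative and unrelated to ir-measures). The caller pushes a value in, and the generator hands a result back at its next `yield`:

```python
def running_total():
    """Toy generator: receives numbers via send() and yields the running sum."""
    total = 0
    value = yield  # the priming next() pauses here, waiting for a value
    while value is not None:
        total += value
        value = yield total  # hand back a result, then wait for the next value


gen = running_total()
next(gen)  # prime the generator so that it can accept send()
totals = [gen.send(3), gen.send(4), gen.send(5)]
gen.close()

print(totals)
```

Note the priming `next(gen)` call: a freshly created generator cannot receive a non-None value until it has advanced to its first `yield`.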

The public, low-level API could look like this:

results = []
results.append(scorer.score_query(all_results_for_query_1))
results.append(scorer.score_query(all_results_for_query_2))
results.append(scorer.score_query(all_results_for_query_2)) # would throw error if same query encountered again
results.append(scorer.finish()) # gives scores for queries that appear in qrels but had no run submitted

Under the hood, it could look like this:

# outer loop
class Scorer:
    def score_query(self, run):
        results = []
        for scorer in self.provider_scorers:
            results.append(scorer.send(run))
        return results
    def finish(self):
        # return default results for queries not encountered
        raise NotImplementedError

# implementation for calc_iter
def calc_iter(measures, qrels, run):
    scorer = Scorer(measures, qrels)
    # coerce run into iterable over queries from e.g., dataframes, etc.
    for query in run:
        yield from scorer.score_query(query)
    yield from scorer.finish()

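# in provider (a generator; it must be primed with next() before the first send())
def scorer(measures, qrels):
    run = yield
    while run is not None:  # an empty run for a query should still be scored
        # do whatever is needed to score this query
        metrics_for_this_query = []  # placeholder
        run = yield metrics_for_this_query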

This has the added side benefit that all measures for a single query are yielded together. It also means that each provider would know exactly which format the run arrives in (what should this format be?) and would only need to convert from that one format to whatever it needs.
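
To make the proposal concrete, here is a minimal runnable sketch of how the pieces could fit together. Several things in it are assumptions for illustration only, not decisions: the per-query run format (a `(query_id, [(doc_id, score), ...])` tuple), the local `Metric` tuple, the toy `precision_provider`, the simplified `calc_iter(qrels, ks, run)` signature, and the choice of 0.0 as the default for queries with no run.

```python
from collections import namedtuple

# Local stand-in for a result record (hypothetical, for this sketch only).
Metric = namedtuple("Metric", ["query_id", "measure", "value"])


def precision_provider(qrels, k):
    """Hypothetical provider: computes P@k for one query per send()."""
    measure = f"P@{k}"
    query = yield  # wait for the first query
    while query is not None:
        query_id, ranking = query
        rels = qrels.get(query_id, {})
        top = sorted(ranking, key=lambda pair: pair[1], reverse=True)[:k]
        hits = sum(1 for doc_id, _ in top if rels.get(doc_id, 0) > 0)
        query = yield [Metric(query_id, measure, hits / k)]


class Scorer:
    def __init__(self, qrels, ks):
        self.qrels = qrels
        self.measures = [f"P@{k}" for k in ks]
        self.seen = set()
        self.provider_scorers = [precision_provider(qrels, k) for k in ks]
        for gen in self.provider_scorers:
            next(gen)  # prime each generator so it can accept send()

    def score_query(self, query):
        query_id = query[0]
        if query_id in self.seen:
            raise ValueError(f"query {query_id!r} was already scored")
        self.seen.add(query_id)
        results = []
        for gen in self.provider_scorers:
            results.extend(gen.send(query))  # flatten per-provider metric lists
        return results

    def finish(self):
        # Assumed default: 0.0 for queries in qrels that had no run submitted.
        return [Metric(qid, m, 0.0)
                for qid in self.qrels if qid not in self.seen
                for m in self.measures]


def calc_iter(qrels, ks, run):
    scorer = Scorer(qrels, ks)
    for query in run:
        yield from scorer.score_query(query)
    yield from scorer.finish()


qrels = {"q1": {"d1": 1, "d2": 0}, "q2": {"d3": 1}}
run = [("q1", [("d1", 0.9), ("d2", 0.5)])]
scores = list(calc_iter(qrels, [1, 2], run))
for metric in scores:
    print(metric)
```

Only one query's results are held at a time, all measures for that query come out together, and `finish` covers `q2`, which appears in the qrels but not in the run.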

Note that some providers (e.g., pytrec_eval) will need to be updated such that they submit only one query at a time. I do not expect this to be very challenging.
