better support for large result lists #23

ir-measures was designed around iterators so that systems can avoid storing a ton of data in memory. In practice, though, systems that want to use this ability cannot, for the following reasons:
 - Some providers end up reading the entire iterable and converting it into some other format (e.g., `pytrec_eval` converts it into a dict-of-dict structure).
 - Even for providers that progressively evaluate a run iterator (e.g., `cwleval`), this only really works when it is the only provider used AND the iterator is not invoked multiple times. With multiple providers or multiple invocations, the providers/invocations are executed in sequence, so the iterator de facto needs to be fully stored (even though the iterator `tee` function, `itertools.tee`, makes it look like it's not).
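
The buffering effect in the second point can be observed with a small, self-contained sketch. The `run_iterator` generator and the two stand-in "providers" (plain counting loops) are hypothetical and only for illustration; `itertools.tee` is the only real API used.

```python
import itertools

produced = [0]  # number of records pulled from the source so far

def run_iterator(n):
    # hypothetical stand-in for a large run: yields (query_id, doc_id, score)
    for i in range(n):
        produced[0] += 1
        yield ("q1", f"d{i}", float(n - i))

it_a, it_b = itertools.tee(run_iterator(1000))

# "provider A" runs to completion before "provider B" starts
count_a = sum(1 for _ in it_a)
produced_before_b = produced[0]  # the source is already fully consumed here

# "provider B" is then served entirely from tee's internal buffer
count_b = sum(1 for _ in it_b)
```

By the time the second consumer starts, every record has already been pulled from the source while the second consumer has read none of them, so `tee` has to hold the whole run in memory.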

This motivates restructuring ir-measures to use the generator `send` method to push values to the providers' iterators. The existing APIs can remain unchanged and simply use this alternative structure under the hood.
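
For readers less familiar with `send`, here is a minimal sketch of the mechanism on its own, unrelated to any real provider. The `running_count` generator is a hypothetical example: it is primed once with `next()`, after which each `send()` pushes a value in and gets a result back.

```python
def running_count():
    # generator-based coroutine: receives values via send(), yields a result back
    total = 0
    item = yield              # paused here until the first send()
    while item is not None:
        total += 1
        item = yield total    # hand back a result, then wait for the next value

counter = running_count()
next(counter)                 # prime: advance to the first `yield`
first = counter.send("a")
second = counter.send("b")
counter.close()
```

Because the caller decides when each value is pushed, several such generators can be fed the same value one after another without anything being buffered between them.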

The public, low-level API could look like this:
```python
results = []
results.append(scorer.score_query(all_results_for_query_1))
results.append(scorer.score_query(all_results_for_query_2))
results.append(scorer.score_query(all_results_for_query_2)) # would throw error if same query encountered again
results.append(scorer.finish()) # gives scores for queries that appear in qrels but had no run submitted
```

Under the hood, it could look like this:
```python
# outer loop
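class Scorer:
    def __init__(self, measures, qrels):
        # one generator per provider, each already advanced to its first
        # `yield` (via next()) so that send() can be called on it;
        # constructing the per-provider generators is omitted in this sketch
        self.provider_scorers = []

    def score_query(self, run):
        # push this query's results to every provider and collect what they return
        results = []
        for provider_scorer in self.provider_scorers:
            results.append(provider_scorer.send(run))
        return results

    def finish(self):
        # return default results for queries not encountered
        return []  # placeholder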

# implementation for calc_iter
def calc_iter(measures, qrels, run):
    scorer = Scorer(measures, qrels)
    # coerce run into iterable over queries from e.g., dataframes, etc.
    for query in run:
        yield from scorer.score_query(query)
    yield from scorer.finish()

# in provider
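def scorer(measures, qrels):  # arguments are illustrative
    run = yield  # wait for the first query's results to be sent
    while run is not None:  # an empty result list is still a valid query
        # do whatever is needed to score this query
        metrics_for_this_query = ...
        run = yield metrics_for_this_query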
```
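
To make the idea concrete, below is a runnable toy version of the same structure. Everything in it is an assumption made for illustration, not the proposed ir-measures API: the `precision_provider` generator, the `(query_id, docs)` tuple pushed through `send`, the dict-based qrels and run formats, and the choice to score unseen queries in `finish()` by sending an empty result set.

```python
def precision_provider(qrels, k=2):
    # hypothetical provider: computes P@k for each query pushed via send()
    received = yield  # wait for the first (query_id, docs) pair
    while received is not None:
        query_id, docs = received
        rels = qrels.get(query_id, {})
        top = sorted(docs, key=docs.get, reverse=True)[:k]
        score = sum(1 for d in top if rels.get(d, 0) > 0) / k
        received = yield (query_id, f"P@{k}", score)

class Scorer:
    def __init__(self, qrels, provider_factories):
        self.qrels = qrels
        self.seen = set()
        self.provider_scorers = []
        for factory in provider_factories:
            gen = factory(qrels)
            next(gen)  # prime the generator so it can accept send()
            self.provider_scorers.append(gen)

    def score_query(self, query_id, docs):
        if query_id in self.seen:
            raise ValueError(f"query {query_id!r} was already scored")
        self.seen.add(query_id)
        return [gen.send((query_id, docs)) for gen in self.provider_scorers]

    def finish(self):
        # score queries that appear in qrels but had no run submitted
        results = []
        for query_id in self.qrels:
            if query_id not in self.seen:
                results.extend(self.score_query(query_id, {}))
        for gen in self.provider_scorers:
            gen.close()
        return results

qrels = {"q1": {"d1": 1, "d2": 0}, "q2": {"d3": 1}}
scorer = Scorer(qrels, [precision_provider])
out_q1 = scorer.score_query("q1", {"d1": 0.9, "d2": 0.4})
out_rest = scorer.finish()
```

Only one query's results are alive at any time, regardless of how many providers are registered, because each provider sees the query before the next one is read.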

This has the side benefit that all measures for a single query are yielded together. It also means that each provider knows exactly which format the run arrives in (what should this format be?) and only needs to convert from that one format to whatever it needs.
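
One possible answer to the format question, sketched under assumptions: a flat iterator of records is grouped lazily into `(query_id, {doc_id: score})` pairs. The local `ScoredDoc` namedtuple and the `iter_queries` helper are hypothetical names for this example, and the sketch assumes that all records for a query are contiguous in the run.

```python
import itertools
from collections import namedtuple

# hypothetical record type for one scored document in a run
ScoredDoc = namedtuple("ScoredDoc", ["query_id", "doc_id", "score"])

def iter_queries(run):
    # lazily yield (query_id, {doc_id: score}); only one query is held at a time
    for query_id, group in itertools.groupby(run, key=lambda r: r.query_id):
        yield query_id, {r.doc_id: r.score for r in group}

flat_run = iter([
    ScoredDoc("q1", "d1", 0.9),
    ScoredDoc("q1", "d2", 0.4),
    ScoredDoc("q2", "d3", 0.7),
])
per_query = list(iter_queries(flat_run))
```

If the run is not grouped by query, `groupby` would emit the same query more than once, which is exactly the case the low-level API above would reject with an error.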

Note that some providers (e.g., `pytrec_eval`) will need to be updated such that they submit only one query at a time. I do not expect this to be very challenging.
