Thank you for building CocoIndex — we're using it as the foundation for an enterprise code search platform and the incremental processing model is excellent.
Problem
We index ~18 source code repositories (2.4M chunks total) using SentenceTransformerEmbed. Our tracking tables have grown to 17 GB — larger than the actual vector data table (16 GB). About 90% of the tracking table size is cached embedding vectors stored as JSON in memoization_info.cache.
We're scaling toward 200 sources. At that scale, the tracking tables alone would consume roughly 190 GB, essentially doubling our total database size for data that already exists in the target table.
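For context on where the 90% figure comes from, here is a back-of-envelope estimate (our own arithmetic, not a CocoIndex measurement): all-MiniLM-L6-v2 produces 384-dimensional vectors, and a float serialized as JSON text takes roughly 18 bytes including the separator (the per-float byte count is an assumption).

```python
# Rough estimate of JSON-cached embedding overhead for our workload.
DIMS = 384                 # all-MiniLM-L6-v2 output dimensionality
BYTES_PER_FLOAT_JSON = 18  # assumed avg chars per float in JSON, incl. comma
CHUNKS = 2_400_000         # chunks we currently index

per_chunk = DIMS * BYTES_PER_FLOAT_JSON      # ~6.9 KB of JSON per chunk
total_gb = per_chunk * CHUNKS / 1e9          # ~16.6 GB across all chunks
print(f"{per_chunk / 1024:.1f} KiB/chunk, {total_gb:.1f} GB total")
```

That lands within a gigabyte or two of the ~15 GB of cached vectors we observe, so the JSON cache plausibly accounts for nearly all of the tracking-table growth.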
Why we think disabling the cache would work fine for us
CocoIndex's source fingerprinting (processed_source_fp) already handles the common operations well — unchanged files are skipped entirely without consulting the memoization cache, and modified or added files need to be re-embedded regardless. The cache is only valuable when the processing-logic fingerprint changes (e.g., an embedding model upgrade), but we change models very rarely and would do a full re-index in that case anyway.
Proposal
Allow enable_cache to be configurable at the function spec level, defaulting to True for backward compatibility:
```python
# Current behavior (unchanged)
text.transform(
    cocoindex.functions.SentenceTransformerEmbed(model="all-MiniLM-L6-v2")
)

# Opt out of caching to save storage
text.transform(
    cocoindex.functions.SentenceTransformerEmbed(
        model="all-MiniLM-L6-v2",
        enable_cache=False,
    )
)
```
A flow-level or global setting would also work for our use case.