Research: convey discovered redirects from Watcher → Archiver (effective_url workflow) #157

@gregoryfoster

Context

Today Watcher stores effective_url on the watches row — the canonical URL after following redirects from the user-supplied URL. After Archiver v2 (see #156), the canonical URL lives on information.info_sources.url, computed at InfoSource creation time and treated as effectively immutable.

But Watcher is the first service to observe when a remote target starts redirecting somewhere new — it's the one actually making the HTTP requests on a schedule. Archiver has no fetch loop and will not learn this on its own.
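To make the observation step concrete, here is a minimal sketch of how Watcher might notice that a target's effective URL has changed between scheduled fetches. The names `normalize_url`, `RedirectObservation`, and `detect_redirect_change` are hypothetical and not taken from the Watcher codebase; the normalization rule is also an assumption, chosen only so that trivial differences (host case, missing trailing slash on the root path, fragments) do not count as a new redirect.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and netloc, default an empty path to "/", drop the fragment.

    Assumption: this level of normalization is enough for comparison.
    Note that lowercasing netloc would also lowercase any userinfo part.
    """
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


@dataclass(frozen=True)
class RedirectObservation:
    """Hypothetical record of a Watch whose target now resolves somewhere new."""
    watch_id: int
    previous_effective_url: str
    observed_effective_url: str


def detect_redirect_change(
    watch_id: int, stored_effective_url: str, observed_effective_url: str
) -> Optional[RedirectObservation]:
    """Return an observation if the fetched URL differs from the stored one, else None."""
    if normalize_url(stored_effective_url) == normalize_url(observed_effective_url):
        return None
    return RedirectObservation(watch_id, stored_effective_url, observed_effective_url)
```

A `None` result means the stored `effective_url` still holds; a `RedirectObservation` is the piece of information this issue asks how to convey to Archiver.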

Question

What workflow conveys discovered-redirect information back to Archiver, and what semantics does it carry?

  • Is a new redirect target a new InfoSource (operator-curated), or a URL revision of an existing InfoSource (Watcher-detected)?
  • If the latter, the design's "URL is immutable" stance breaks and needs either a relaxation or a new entity.
  • If the former, Watcher needs to surface "this Watch's target now redirects to a URL not yet curated" as an operator alert rather than auto-creating an InfoSource.
  • Hash-verify implications: the previously-recorded fingerprint is still valid for the previous URL; the new URL produces a new fingerprint. How are these linked for content-history continuity?
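The options above can be sketched side by side to make the trade-off easier to discuss. This is an illustration under stated assumptions, not a proposed design: `Policy`, `Fingerprint`, `fingerprint`, `url_history`, and `route_redirect` are all hypothetical names, SHA-256 over the response body stands in for whatever fingerprint hash-verify actually records, and the `previous` link is just one possible way to keep content-history continuity across a URL change.

```python
import hashlib
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Set


class Policy(Enum):
    NEW_INFOSOURCE = "new_infosource"  # redirect target must be operator-curated
    URL_REVISION = "url_revision"      # Watcher-detected revision of an existing InfoSource


@dataclass(frozen=True)
class Fingerprint:
    """Hypothetical fingerprint record; `previous` links back to the prior URL's record."""
    url: str
    sha256: str
    previous: Optional["Fingerprint"] = None


def fingerprint(url: str, body: bytes, previous: Optional[Fingerprint] = None) -> Fingerprint:
    # The old fingerprint is left untouched; the new one only points back to it.
    return Fingerprint(url=url, sha256=hashlib.sha256(body).hexdigest(), previous=previous)


def url_history(fp: Optional[Fingerprint]) -> List[str]:
    """Walk the `previous` chain and return URLs oldest first."""
    urls: List[str] = []
    while fp is not None:
        urls.append(fp.url)
        fp = fp.previous
    return list(reversed(urls))


def route_redirect(new_url: str, curated_urls: Set[str], policy: Policy) -> str:
    """Decide what a discovered redirect target should trigger under each policy."""
    if new_url in curated_urls:
        return "already_curated"
    if policy is Policy.NEW_INFOSOURCE:
        return "operator_alert"       # never auto-create an InfoSource
    return "append_url_revision"      # relaxes the "URL is immutable" stance


# Usage: the old fingerprint stays valid for the old URL; the new one links to it.
old = fingerprint("https://example.org/a", b"old body")
new = fingerprint("https://example.org/b", b"new body", previous=old)
```

Under the `NEW_INFOSOURCE` policy the link between `old` and `new` would have to live on a separate relationship between two InfoSources, whereas under `URL_REVISION` it could live inside one InfoSource's revision history. Which of these fits Archiver v2 is the open question of this issue.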

Scope

Research + design doc. No implementation in this issue.

Linked work
