Wayback availability API returns inconsistent results for URLs with HTTPS protocol #296

@SimbaE1

Bug Description

The /wayback/available API endpoint returns different results for the same page depending on whether the url parameter includes the https:// scheme prefix.

Steps to Reproduce

Query the availability API with these three URLs that point to the same content:

  1. http://archive.org/wayback/available?url=https://www.reddit.com/r/duckduckgo/
  2. http://archive.org/wayback/available?url=www.reddit.com/r/duckduckgo/
  3. http://archive.org/wayback/available?url=reddit.com/r/duckduckgo/
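The three queries above can be run from a short script. This is a minimal sketch using only the Python standard library; the helper names (build_query, has_snapshot, check) are illustrative and not part of any official client. It builds each query string exactly as written in the list, without percent-encoding, and assumes that an empty "archived_snapshots" object means no snapshot was found.

```python
import json
import urllib.request

API = "http://archive.org/wayback/available"

# The three forms of the same address from the steps above.
TARGETS = [
    "https://www.reddit.com/r/duckduckgo/",
    "www.reddit.com/r/duckduckgo/",
    "reddit.com/r/duckduckgo/",
]


def build_query(target: str) -> str:
    """Build the query URL verbatim, as in the report (no percent-encoding)."""
    return f"{API}?url={target}"


def has_snapshot(payload: dict) -> bool:
    """Treat an empty or missing "archived_snapshots" object as "not archived"."""
    return bool(payload.get("archived_snapshots"))


def check(target: str, timeout: float = 30.0) -> bool:
    """Query the availability endpoint for one target (needs network access)."""
    with urllib.request.urlopen(build_query(target), timeout=timeout) as resp:
        return has_snapshot(json.load(resp))


if __name__ == "__main__":
    for target in TARGETS:
        print(f"{target!r}: snapshot found = {check(target)}")
```

If the behavior described in this report still holds, the first target would print a different result from the other two.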

Expected Behavior

All three queries should return the same result since they reference the same web content.

Actual Behavior

  • Query 1 (with https://): Returns {"archived_snapshots": {}}
  • Query 2 (no scheme, with www): Returns archived snapshots correctly
  • Query 3 (no scheme, without www): Returns archived snapshots correctly

Impact

This inconsistency causes automated tools to incorrectly identify already-archived content as unarchived, leading to unnecessary re-archival attempts and potential rate limiting issues.

Environment

  • API endpoint: http://archive.org/wayback/available
  • Tested on: 2025-09-23
  • Affects: Programmatic archive checking tools

Suggested Fix

The API should normalize the url parameter internally, for example by ignoring the scheme, so that http://, https://, and scheme-less forms of the same address return the same result.
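One possible shape for that normalization, which could also serve as a client-side workaround until the API changes, is to strip any leading scheme before the lookup. This is only a sketch under that assumption: normalize_lookup_url is a hypothetical helper, not how the Wayback Machine canonicalizes URLs internally, and it leaves the www prefix alone because queries 2 and 3 already agree.

```python
import re

# Matches a leading URI scheme such as "http://" or "https://", in any letter case.
_SCHEME = re.compile(r"^[a-z][a-z0-9+.-]*://", re.IGNORECASE)


def normalize_lookup_url(url: str) -> str:
    """Return the URL without its scheme; everything else is left untouched.

    Hypothetical helper for illustration only.
    """
    return _SCHEME.sub("", url.strip(), count=1)


if __name__ == "__main__":
    for raw in (
        "https://www.reddit.com/r/duckduckgo/",
        "www.reddit.com/r/duckduckgo/",
        "reddit.com/r/duckduckgo/",
    ):
        print(raw, "->", normalize_lookup_url(raw))
```

With this in place, queries 1 and 2 from the reproduction steps reduce to the same lookup string.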
