
Add Azure Blob object storage backend #297

Open
MTG-Thomas wants to merge 15 commits into gobifrost:main from MTG-Thomas:codex/upstream-azure-blob


@MTG-Thomas

@MTG-Thomas MTG-Thomas commented May 24, 2026

Contributor

Summary

  • add an Azure Blob-backed object storage client alongside the existing S3 client
  • add runtime provider selection with Blob config via BIFROST_OBJECT_STORAGE_PROVIDER=azure_blob
  • preserve existing object-key/upload/download/signed-url semantics through shared storage contract coverage
  • document the Azure Blob opt-in environment variables in the README

Verification

  • git diff --check
  • ruff check api/tests/contract/test_object_storage_contract.py api/tests/unit/services/test_file_storage_backend_selection.py
  • PYTHONPATH=api BIFROST_SECRET_KEY=0123456789abcdef0123456789abcdef python -m pytest api/tests/unit/services/test_file_storage_backend_selection.py api/tests/contract/test_object_storage_contract.py -q
    • result: 6 passed, 1 skipped

Notes

  • api/tests/unit/routers/test_files_signed_url.py still does not collect in this workstation Python environment because aio_pika is missing.

@MTG-Thomas MTG-Thomas requested a review from jackmusick as a code owner May 24, 2026 23:38
9 comment threads on api/tests/contract/test_object_storage_contract.py, all marked Fixed
@MTG-Thomas MTG-Thomas force-pushed the codex/upstream-azure-blob branch from b92e884 to e8454a4 Compare May 26, 2026 02:28
@MTG-Thomas

MTG-Thomas commented May 27, 2026

Contributor Author

Follow-up pushed after live deployment validation: Blob mode now routes repo storage, startup/reindex listing, and /health/ready through the active Azure Blob backend instead of requiring a configured S3 endpoint. This avoids a hidden S3-compatible storage dependency when BIFROST_OBJECT_STORAGE_PROVIDER=azure_blob is selected.

Additional local verification:

  • python -m ruff check README.md api/src/services/repo_storage.py api/src/services/file_storage/azure_blob_client.py api/src/services/file_storage/s3_client.py api/src/routers/health.py api/src/services/file_storage/reindex.py api/tests/unit/test_repo_storage.py api/tests/unit/routers/test_health.py api/tests/unit/services/test_file_storage_backend_selection.py -> passed
  • python -m pytest api/tests/unit/routers/test_health.py api/tests/unit/test_repo_storage.py api/tests/unit/services/test_file_storage_backend_selection.py -q -> 34 passed

@MTG-Thomas MTG-Thomas force-pushed the codex/upstream-azure-blob branch from 42e69e6 to 740118d Compare May 27, 2026 12:07
@MTG-Thomas MTG-Thomas marked this pull request as draft May 27, 2026 12:45
@MTG-Thomas

Contributor Author

Having some issues with storage; will revise the upstream PR once they are smoked out and resolved.


if not account_url or not container:
logger.debug("Azure Blob not configured, skipping Blob fallback")
_blob_available = False
if auth == "account_key":
if not account_key:
logger.debug("Azure Blob account key missing, skipping Blob fallback")
_blob_available = False
credential = DefaultAzureCredential()
else:
logger.warning(f"Unsupported Azure Blob auth mode: {auth}")
_blob_available = False

service_client = BlobServiceClient(account_url, credential=credential)
_blob_container_client = service_client.get_container_client(container)
_blob_available = True
_blob_available = True
return _blob_container_client
except Exception as e:
_blob_available = False

def test_blob_not_found_logs_debug_not_warning(self, caplog):
    """BlobNotFound should be a cache miss, not a noisy warning."""
    import src.core.module_cache_sync as mod
@MTG-Thomas MTG-Thomas closed this May 28, 2026
@MTG-Thomas MTG-Thomas deleted the codex/upstream-azure-blob branch May 28, 2026 17:34
@jackmusick

Collaborator

Did you give up?

@MTG-Thomas MTG-Thomas restored the codex/upstream-azure-blob branch May 28, 2026 17:43
@MTG-Thomas

Contributor Author

No, agent misfire.

@MTG-Thomas MTG-Thomas reopened this May 28, 2026
@MTG-Thomas

Contributor Author

@jackmusick - I had something like 60 worktrees on my machine that I'm cleaning up today. Composer 2.5 took "clean up anything stale from our fork" a bit too literally once or twice, including this branch.

@jackmusick

jackmusick commented May 28, 2026 via email

Collaborator

@MTG-Thomas MTG-Thomas closed this May 28, 2026
@MTG-Thomas MTG-Thomas deleted the codex/upstream-azure-blob branch May 28, 2026 17:47
@MTG-Thomas MTG-Thomas restored the codex/upstream-azure-blob branch May 28, 2026 17:48
@MTG-Thomas MTG-Thomas reopened this May 28, 2026
@MTG-Thomas MTG-Thomas marked this pull request as ready for review May 28, 2026 17:49
@MTG-Thomas MTG-Thomas closed this May 28, 2026
@MTG-Thomas MTG-Thomas deleted the codex/upstream-azure-blob branch May 28, 2026 18:01
@MTG-Thomas MTG-Thomas restored the codex/upstream-azure-blob branch May 28, 2026 18:15
@MTG-Thomas MTG-Thomas reopened this May 28, 2026
@MTG-Thomas

Contributor Author

Cursor mega-struggling with my vague language today.

@MTG-Thomas

Contributor Author

This should now be legitimately ready for review, @jackmusick.

MTG-Thomas and others added 4 commits May 29, 2026 15:29
Resolve files router conflict by adopting FILE_LOCATION_DESCRIPTION
from main for request model field docs.

Co-authored-by: Cursor <cursoragent@cursor.com>
)

assert config == {
"api_key": "decrypted-api-key",
Worker subprocesses already lower-case BIFROST_OBJECT_STORAGE_PROVIDER
when choosing Azure vs S3, but the API compared the raw env value. Mixed-case
values like AZURE_BLOB could leave the API on S3 while workers used Azure.
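
One way to keep the API and the workers from drifting apart like this is to have both call a single normalization helper. The following is only a minimal sketch: `normalize_provider`, `active_provider`, `SUPPORTED_PROVIDERS`, and `DEFAULT_PROVIDER` are hypothetical names, not taken from this PR, and the `s3` default is an assumption based on Azure Blob being described as opt-in.

```python
import os

# Assumption: only these two provider values exist, and "s3" is the default
# because the PR describes Azure Blob as opt-in.
SUPPORTED_PROVIDERS = {"s3", "azure_blob"}
DEFAULT_PROVIDER = "s3"


def normalize_provider(raw):
    """Return the canonical provider name for a raw env value."""
    value = (raw or "").strip().lower()
    if not value:
        return DEFAULT_PROVIDER
    if value not in SUPPORTED_PROVIDERS:
        # Fail loudly rather than silently falling back to another backend.
        raise ValueError(f"Unsupported object storage provider: {raw!r}")
    return value


def active_provider(env=None):
    """Read BIFROST_OBJECT_STORAGE_PROVIDER and normalize it."""
    env = os.environ if env is None else env
    return normalize_provider(env.get("BIFROST_OBJECT_STORAGE_PROVIDER"))
```

If the API process and the worker subprocesses both resolve the provider through a helper like this, a value such as `AZURE_BLOB` resolves to the same backend in both places.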

@jackmusick jackmusick left a comment

Collaborator


Good foundation here and I want to land it, but a few things are blocking and one of them bugs me.

First, the thing I had to go check: this CI run is green but it isn't testing Azure, or S3. The contract suite in test_object_storage_contract.py only adds the S3 and Azure backends to the param list when BIFROST_STORAGE_CONTRACT_* env vars are set (the _storage_backends() helper drops them with return None otherwise). CI doesn't set them, so the whole suite runs against the in-memory fake. So "contract tests pass" currently means "a dict behaves like a dict." We need either Azurite wired into CI, or a real Azure container the suite runs against — otherwise the Azure path has no actual coverage.
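
To illustrate the "silently dropped backend" problem, here is a rough sketch of a selection helper that records why a backend is disabled and can refuse to proceed in a strict mode. Everything here is hypothetical: `BackendPlan`, `plan_backends`, and the `strict` flag are not from the PR, and the env var names are passed in by the caller because the thread only gives the `BIFROST_STORAGE_CONTRACT_*` prefix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendPlan:
    name: str
    enabled: bool
    reason: str


def plan_backends(env, required_vars, strict=False):
    """Decide which backends a contract suite would run against.

    required_vars maps a backend name to the env var names it needs.
    With strict=True (for example in CI), an unconfigured real backend
    raises instead of being dropped from the list.
    """
    # The in-memory fake is always part of the plan.
    plans = [BackendPlan("memory", True, "always available")]
    for name, names in required_vars.items():
        missing = [n for n in names if not env.get(n)]
        if not missing:
            plans.append(BackendPlan(name, True, "configured"))
        elif strict:
            raise RuntimeError(
                f"{name} contract backend not configured; missing {missing}"
            )
        else:
            # Keep the entry so the gap stays visible in reports.
            plans.append(BackendPlan(name, False, "missing " + ", ".join(missing)))
    return plans
```

Disabled entries could then be turned into skipped parameters (for instance `pytest.param(..., marks=pytest.mark.skip(reason=plan.reason))`), so a run that only exercised the in-memory fake shows skips rather than a clean pass.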

Second, even with real-backend tests, two subsystems would slip through because they never go through the abstraction:

  • app_storage.py is untouched — the whole _apps/ preview/publish path calls aiobotocore directly (create_client/copy_object/put_object/get_object/list_objects_v2). In Azure mode this still goes to S3.
  • app_bundler/__init__.py::_write_live() reaches into app_storage._get_client()/_bucket and does put_object itself — same problem for live bundles, and it's poking AppStorageService's privates.

Both need to route through the provider-selected client, and the publish path needs a test under the real backend (the contract suite structurally can't catch these since they bypass it).
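
As a rough illustration of what "route through the provider-selected client" could look like, the sketch below has a publish step depend on a small storage interface instead of calling aiobotocore directly. The names `ObjectStore`, `InMemoryStore`, and `publish_app`, and the `preview/` and `live/` sub-paths under `_apps/`, are assumptions made for the example and are not the project's actual API or key layout.

```python
import asyncio
from typing import Protocol


class ObjectStore(Protocol):
    """Hypothetical minimal interface an S3 or Azure Blob client would satisfy."""

    async def put(self, key: str, data: bytes) -> None: ...
    async def get(self, key: str) -> bytes: ...
    async def copy(self, src: str, dst: str) -> None: ...
    async def list_keys(self, prefix: str) -> list[str]: ...


class InMemoryStore:
    """Dict-backed stand-in used only to exercise the publish logic."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    async def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    async def get(self, key: str) -> bytes:
        return self._objects[key]

    async def copy(self, src: str, dst: str) -> None:
        self._objects[dst] = self._objects[src]

    async def list_keys(self, prefix: str) -> list[str]:
        return sorted(k for k in self._objects if k.startswith(prefix))


async def publish_app(store: ObjectStore, app_id: str) -> list[str]:
    """Copy every preview object to the live prefix via the injected store."""
    preview = f"_apps/{app_id}/preview/"  # assumed layout
    live = f"_apps/{app_id}/live/"  # assumed layout
    published = []
    for key in await store.list_keys(preview):
        dst = live + key[len(preview):]
        await store.copy(key, dst)
        published.append(dst)
    return published
```

Because `publish_app` only sees the interface, the same function could be run against whichever client the provider setting selects, and a publish-path test could pass in a real-backend client in place of `InMemoryStore`.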

Last, a scope question: git_repo_manager.py shells out to aws s3 sync — no Azure equivalent. Fine if repo-manager is out of scope for Azure, but let's say so in a comment rather than leave it as a landmine.

Needs a rebase too (conflicts against main).


4 participants