Auto oss vs cost efficient 50/50 A/B test#9355

Merged
IsaiahWitzke merged 4 commits into master from iw/oss-ab-test-client
Apr 29, 2026
@IsaiahWitzke IsaiahWitzke commented Apr 29, 2026

I want to run an A/B test in which 50% of free users default to auto (cost-efficient) and the other 50% default to auto (open-weights).

Tests

  • Opening the app with a different WARP_DATA_DIR flips the default between the two arms about 50% of the time, as expected
  • My default persists through signup

IsaiahWitzke and others added 2 commits April 28, 2026 22:12
Adds a new client-side experiment (FreeTierDefaultModel) that buckets
free-tier and logged-out users 50/50 into AutoEfficient (control) and
AutoOpen (experiment) arms. Users in the AutoOpen arm see auto
(open-weights) as the default model in the configure-oz onboarding
picker; control users see the existing auto (cost-efficient) default.

Bucketing happens entirely client-side off the user's anonymous_id
UUID — the same value that's already attached to every Rudder
telemetry event as anonymousId. This means:

- Pre-signup, post-signup, and signed-out users get the same arm
  (anonymous_id is stable across signup, no transfer logic needed).
- Enrollment is captured automatically via the framework's
  ExperimentTriggered telemetry event.
- Pre/post-signup events stitch automatically in the warehouse via
  Rudder identity stitching on anonymousId.

Server-side, the only change required is allowing AutoOpen on the
free tier (separate companion server PR).

Co-Authored-By: Oz <oz-agent@warp.dev>
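
The client-side bucketing described in the commit message above can be illustrated with a minimal sketch. This is not the experiment framework's actual code: the `bucket_for` helper, the salting by experiment name, and the choice of FNV-1a as the hash are all assumptions made for illustration. The only properties taken from the commit message are that the arm is derived from the stable anonymous_id and that the split is 50/50 between AutoEfficient (control) and AutoOpen (experiment).

```rust
/// The two arms named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FreeTierDefaultModelArm {
    /// Control: auto (cost-efficient) stays the default.
    AutoEfficient,
    /// Experiment: auto (open-weights) becomes the default.
    AutoOpen,
}

/// 64-bit FNV-1a (an assumption of this sketch, not the framework's hash).
/// Its output is fixed by its specification, unlike
/// `std::collections::hash_map::DefaultHasher`, whose algorithm is not
/// guaranteed to stay the same across Rust releases.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    bytes.iter().fold(OFFSET_BASIS, |hash, &byte| {
        (hash ^ u64::from(byte)).wrapping_mul(PRIME)
    })
}

/// Hypothetical helper: maps an anonymous ID to an arm.
/// The same ID always yields the same arm, so no state has to be
/// transferred across signup as long as the ID itself is stable.
pub fn bucket_for(anonymous_id: &str, experiment_name: &str) -> FreeTierDefaultModelArm {
    // Salting with the experiment name (an assumption) keeps this split
    // from lining up with other experiments bucketed on the same ID.
    let key = format!("{experiment_name}:{anonymous_id}");
    // Reduce the hash to a percentile in 0..100 and split at 50.
    let percentile = fnv1a_64(key.as_bytes()) % 100;
    if percentile < 50 {
        FreeTierDefaultModelArm::AutoEfficient
    } else {
        FreeTierDefaultModelArm::AutoOpen
    }
}
```

Because the arm is a pure function of the ID, launching with a fresh WARP_DATA_DIR (and therefore, presumably, a fresh anonymous_id) is what produces the roughly 50% flip noted in the Tests section.
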
The configure-oz onboarding picker renders before any Firebase user
exists, so most pre-signup traffic shows as OnboardingAuthState::LoggedOut
rather than FreeUser. Restricting to FreeUser only meant the override
basically never fired during the actual onboarding flow.

Allow both FreeUser and LoggedOut; still exclude PayingUser so we don't
override the paid-tier default (AutoGenius).

Co-Authored-By: Oz <oz-agent@warp.dev>
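
The eligibility gate this commit describes could look roughly like the sketch below. The three variant names come from the commit message; the shape of the enum (the real `OnboardingAuthState` may carry data or have more variants) and the `is_eligible_for_override` helper name are assumptions.

```rust
/// Assumed shape of the auth state; only the variant names are taken
/// from the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OnboardingAuthState {
    /// No Firebase user exists yet (most pre-signup onboarding traffic).
    LoggedOut,
    FreeUser,
    PayingUser,
}

/// Hypothetical helper: whether the experiment may override the default.
pub fn is_eligible_for_override(auth_state: OnboardingAuthState) -> bool {
    // An exhaustive match, so adding a variant later forces an explicit
    // decision here instead of silently falling through.
    match auth_state {
        OnboardingAuthState::LoggedOut | OnboardingAuthState::FreeUser => true,
        // Paying users keep the paid-tier default (AutoGenius).
        OnboardingAuthState::PayingUser => false,
    }
}
```
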
oz-for-oss Bot commented Apr 29, 2026

@IsaiahWitzke

I'm starting a first review of this pull request.

You can follow along in the session on Warp.

I completed the review and posted feedback on this pull request.

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

@oz-for-oss oz-for-oss Bot left a comment

Overview

This PR adds a client-side 50/50 experiment that can default eligible onboarding users from the server-provided auto cost-efficient model to the auto open-weights model when that model is available. It wires the new experiment layer into the existing experiment framework and applies the override when constructing and refreshing onboarding models.

Concerns

  • No blocking correctness or security concerns found in the inlined diff.

Verdict

Found: 0 critical, 0 important, 0 suggestions

Approve

@oz-for-oss oz-for-oss Bot left a comment

Overview

This PR adds a client-side FreeTierDefaultModel experiment that splits eligible onboarding users 50/50 between the existing server default model and the auto-open onboarding model, then applies that override when onboarding model choices are created or refreshed.

Concerns

  • No blocking correctness or security concerns found in the changed diff lines.

Verdict

Found: 0 critical, 0 important, 0 suggestions

Approve

@IsaiahWitzke IsaiahWitzke changed the title from "Auto - oss 50/50 A/B test" to "Auto oss vs cost efficient 50/50 A/B test" Apr 29, 2026
IsaiahWitzke and others added 2 commits April 28, 2026 22:48
Triggers a re-evaluation of build_onboarding_models +
apply_free_tier_default_model_override inside the existing
UserWorkspacesEvent::TeamsChanged handler, so when a user upgrades
free → paid mid-onboarding the picker promptly drops the AutoOpen
'Recommended' pill (the override gate flips to PayingUser and the
server's paid-tier default takes over).

Without this, after upgrade the only triggers for re-evaluating the
override were the initial render and LLMPreferencesEvent::UpdatedAvailableLLMs,
so a stale UserWorkspaces.billing_metadata could leave AutoOpen marked
as the recommended default well after the user had upgraded.

Co-Authored-By: Oz <oz-agent@warp.dev>
Restructure apply_free_tier_default_model_override to bail out unless
the server itself is currently recommending auto-efficient (the free-
tier default). For any other recommendation (auto-genius for paid
users, the Codex referral default, etc.) we respect what the server
says.

This makes the server's recommendation the single source of truth for
when the experiment applies, so:

- Post-upgrade, the moment LLMPreferences refreshes from the server
  (auto-genius default), the AutoOpen 'Recommended' pill goes away.
  No dependence on locally-stale UserWorkspaces.billing_metadata.
- Drop the redundant auth-state gate in should_default_to_auto_open;
  the server already encodes the eligibility check.
- Drop the TeamsChanged re-run hook in root_view.rs; the
  LLMPreferencesEvent::UpdatedAvailableLLMs path is sufficient.

Co-Authored-By: Oz <oz-agent@warp.dev>
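
The restructured override, with the server's recommendation as the single source of truth, can be sketched as follows. This is a stand-in for `apply_free_tier_default_model_override`, not its real code: the `ModelId` and `OnboardingModel` types, the `is_recommended` flag, and the function signatures are assumptions. The availability check follows the review's note that the override applies only when the auto open-weights model is available.

```rust
/// Assumed model identifiers; the names mirror those in the commit messages.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelId {
    AutoEfficient,
    AutoOpen,
    AutoGenius,
}

/// Assumed shape of one entry in the onboarding picker.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OnboardingModel {
    pub id: ModelId,
    /// Whether this entry carries the 'Recommended' pill.
    pub is_recommended: bool,
}

/// Hypothetical helper: the currently recommended model, if any.
pub fn recommended(models: &[OnboardingModel]) -> Option<ModelId> {
    models.iter().find(|m| m.is_recommended).map(|m| m.id)
}

/// Sketch of the override: moves the recommendation to AutoOpen only when
/// the user is in the AutoOpen arm AND the server itself is currently
/// recommending AutoEfficient (the free-tier default).
pub fn apply_override_sketch(models: &mut [OnboardingModel], in_auto_open_arm: bool) {
    if !in_auto_open_arm {
        return; // Control arm: keep whatever the server recommends.
    }
    if recommended(models) != Some(ModelId::AutoEfficient) {
        return; // Paid default, referral default, etc.: respect the server.
    }
    if !models.iter().any(|m| m.id == ModelId::AutoOpen) {
        return; // AutoOpen is not offered, so there is nothing to switch to.
    }
    for model in models.iter_mut() {
        model.is_recommended = model.id == ModelId::AutoOpen;
    }
}
```

Under this shape no separate auth-state gate is needed: once the server starts recommending a different model after an upgrade, the second early return leaves the list untouched on the next refresh.
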
@IsaiahWitzke IsaiahWitzke merged commit d0f045c into master Apr 29, 2026
24 checks passed
@IsaiahWitzke IsaiahWitzke deleted the iw/oss-ab-test-client branch April 29, 2026 03:33
wolverine2k pushed a commit to wolverine2k/warp that referenced this pull request May 5, 2026