Releases: brightdata/cli
v0.3.0
Scraper Studio gets a major workflow upgrade: multi-URL runs, a stable
machine-readable output envelope, automatic backoff on the concurrent-job
cap, real CSV/HTML/Markdown file output, clearer error hints, and an
Examples: section in every command's --help.
All changes are backward compatible — existing bdata scraper run <id> <url>
and scraper create invocations behave exactly as before.
Features
scraper run — multi-URL input (#8)
Run a scraper against many URLs in a single API call. The CLI was the only
client without this; it now mirrors the reference SDKs' batch path.
- --urls "u1,u2,u3": comma-separated list.
- --input-file <path>: one URL per line (# comments and blanks skipped),
  or a JSON array of strings, or a JSON array of {"url": "..."} objects.
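The accepted --input-file shapes can be sketched as a small parser. This is an illustration in Python, not the CLI's own code; the function name parse_url_input and its error handling are assumptions.

```python
import json


def parse_url_input(text: str) -> list[str]:
    """Return the URLs found in an input file's contents (hypothetical helper).

    Accepts a JSON array of strings, a JSON array of {"url": "..."} objects,
    or plain text with one URL per line.
    """
    stripped = text.strip()
    if stripped.startswith("["):
        urls = []
        for item in json.loads(stripped):
            if isinstance(item, str):
                urls.append(item)
            elif isinstance(item, dict) and isinstance(item.get("url"), str):
                urls.append(item["url"])
            else:
                raise ValueError(f"unsupported entry: {item!r}")
        return urls
    # Plain-text form: skip blank lines and lines starting with '#'.
    lines = (line.strip() for line in stripped.splitlines())
    return [line for line in lines if line and not line.startswith("#")]
```

Treating a leading "[" as the signal for JSON is a design assumption; it keeps the plain-text form free of any required header.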
A list of URLs becomes one POST /dca/trigger, one snapshot, one merged
result array. The positional <url> is now optional; exactly one input
source may be given. --sync is rejected for multi-URL (the sync endpoint is
single-URL only).
bdata scraper run c_xxx --urls "https://a.test/1,https://a.test/2" -o out.json
bdata scraper run c_xxx --input-file urls.txt -o out.json
scraper create — stable output envelope (#6)
-o / --json now writes a consistent envelope on every termination path
(success and failure), so the documented jq -r '.collector_id' recipe always
finds an id — even when a build half-fails:
{
  "collector_id": "c_...",
  "name": "...",
  "status": "done | failed | ai_trigger_failed | poll_failed",
  "completed_steps": ["..."],
  "view_url": "https://brightdata.com/cp/scrapers/c_...",
  "created_at": "...",
  "error": "..."
}
Failure paths that previously wrote nothing now write the envelope (with a
recoverable collector_id). --legacy-output keeps the old bare-progress
shape for one minor version while you migrate.
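A script that consumes the envelope might branch on status while always keeping the collector_id. The sketch below is in Python and relies only on the fields listed above; the function name summarize_envelope and the returned tuple shape are assumptions.

```python
import json

# Status values taken from the envelope description above.
FAILED_STATUSES = {"failed", "ai_trigger_failed", "poll_failed"}


def summarize_envelope(raw: str) -> tuple[str, bool, str]:
    """Return (collector_id, succeeded, message) from an envelope JSON string."""
    envelope = json.loads(raw)
    collector_id = envelope["collector_id"]
    status = envelope["status"]
    if status == "done":
        return collector_id, True, envelope.get("view_url", "")
    if status in FAILED_STATUSES:
        # The id is still usable for recovery even though the build failed.
        return collector_id, False, envelope.get("error") or status
    raise ValueError(f"unexpected status: {status!r}")
```

Because the id is read before the status check, a half-failed build still yields something to look up in the dashboard.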
scraper create — auto-backoff on the AI-Flow 429 cap (#7)
AI generation is capped at a few concurrent jobs per account. Hitting the cap
used to fail within seconds, leaving half-built collectors behind. The CLI now
waits with exponential backoff + jitter and retries, printing status so you
know it isn't hung:
Hit AI-Flow concurrent-job cap (429). Waiting 32s before retry 1/4...
- --max-retries <n>: override the retry count (default 4).
- --no-retry: fail fast on 429 instead of waiting.
On terminal failure, a recovery note points at the half-built collector's
dashboard URL.
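Exponential backoff with jitter can be sketched as follows. This is a generic Python illustration, not the CLI's implementation: the RateLimited exception, the 1-second base delay, the doubling factor, and the 50% jitter ceiling are all assumptions, and only the default of 4 retries comes from the notes above.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 response."""


def retry_with_backoff(
    call: Callable[[], T],
    max_retries: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Run call(), retrying on RateLimited with exponentially growing waits."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == max_retries:
                raise  # retries exhausted: surface the failure
            # Double the wait each attempt, then add up to 50% random jitter.
            delay = base_delay * (2 ** attempt)
            delay += delay * 0.5 * rng()
            print(f"Rate limited. Waiting {delay:.0f}s before retry {attempt + 1}/{max_retries}...")
            sleep(delay)
    raise AssertionError("unreachable")
```

Passing max_retries=0 gives the fail-fast behaviour that --no-retry describes. Injecting sleep and rng keeps the function testable without real waiting.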
Output — real CSV / HTML / Markdown file formats (#9)
-o file.csv now writes actual CSV, file.md writes a Markdown table, and
file.html writes an HTML table — chosen by file extension.
bdata scraper run c_xxx https://a.test -o products.csv # RFC-4180 CSV
.xlsx / .xls are rejected up front with a helpful message instead of
silently producing a broken file.
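Choosing a writer by file extension can be sketched as below. This is a Python illustration under stated assumptions: the names format_for and to_csv are hypothetical, and the JSON fallback for unknown extensions is a guess rather than documented behaviour.

```python
import csv
import io
from pathlib import Path

UNSUPPORTED = {".xlsx", ".xls"}
FORMATS = {".csv": "csv", ".md": "markdown", ".html": "html", ".json": "json"}


def format_for(path: str) -> str:
    """Map an output path to a format name, rejecting spreadsheet extensions."""
    ext = Path(path).suffix.lower()
    if ext in UNSUPPORTED:
        raise ValueError(f"{ext} output is not supported; use .csv instead")
    return FORMATS.get(ext, "json")  # assumption: unknown extensions fall back to JSON


def to_csv(records: list[dict]) -> str:
    """Render records as CSV; fields containing commas or quotes get quoted."""
    columns: list[str] = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    # The default 'excel' dialect uses CRLF line endings and doubled quotes.
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```

Rejecting .xlsx before any work starts is what avoids writing text into a file that spreadsheet tools would then fail to open.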
--help — Examples in every command (#10)
scraper create, scraper run, discover, search, pipelines, and
scrape each gain an Examples: section in --help, using real public
domains. No more alt-tabbing to the README to find a working invocation.
bdata scraper create --help # ends with copy-paste-ready examples
Fixes
Accurate error hints for Scraper Studio (#5)
A 403 from a stub collector used to map to a misleading "check your zone
permissions" hint, sending users down the wrong path. Scraper-API errors now
get accurate, actionable hints (e.g. "AI generation has not completed — re-run
scraper create", "you hit the AI-Flow concurrent-job cap"). Other commands
(scrape, search, discover, pipelines, browser) are unchanged — the
scraper vocabulary stays scoped to the scraper command.
Bug fix
--no-retry on scraper create was silently ignored due to a Commander flag
mapping (opts.retry vs opts.noRetry); it now correctly disables retries.
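The pitfall is that a negated flag is stored under its positive name, so reading the negated name silently yields nothing. The Python argparse sketch below is only an analogy for the opts.retry vs opts.noRetry mix-up described above; it is not the CLI's code.

```python
import argparse

parser = argparse.ArgumentParser(prog="scraper-create-demo")
# A negated flag stores False under the positive name "retry".
parser.add_argument("--no-retry", dest="retry", action="store_false", default=True)

args = parser.parse_args(["--no-retry"])

# Correct: read the positive name.
retry_enabled = args.retry

# Buggy pattern: the negated name is never populated, so the flag is ignored.
buggy_retry_enabled = not getattr(args, "no_retry", False)
```

Here retry_enabled is False while buggy_retry_enabled stays True, which mirrors a --no-retry flag that appears to be accepted but changes nothing.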
Upgrade notes
- No action required — fully backward compatible.
- If you parse the scraper create -o file, note it is now an envelope, not
  the bare AI-progress payload. Use --legacy-output to keep the old shape
  during migration (removed next minor).
v0.2.0
v0.2.0 - Scraper Studio AI
Build and run Bright Data scrapers from the CLI. Two new commands under a new scraper group.
brightdata scraper create <url> <description>
Build a custom scraper from a natural-language description. Wraps the Scraper Studio AI Flow: creates the scraper template, triggers AI generation, polls until done.
brightdata scraper create https://example.com/product/1 \
  "Extract title, price, and image URL from this product page"
Returns a collector_id you reuse with scraper run. Defaults to a placeholder webhook delivery target you reconfigure in the Bright Data web UI; override with --deliver-webhook.
Flags: --name, --deliver-webhook, --timeout, -o, --json, --pretty, --timing, -k.
brightdata scraper run <collector_id> <url>
Run a scraper against a URL and get the data back. Three execution paths, picked automatically:
- Async + poll (default): triggers /dca/trigger_immediate, polls /dca/get_result until ready. Right for most jobs.
- Sync (--sync): one-shot /dca/crawl with a 25–50s server-side cap. Right for fast pages where you want to skip polling entirely. On server-side timeout, prints the response_id so you can re-run without --sync to recover.
- Auto-fallback to batch: if the realtime endpoint reports the page limit was exceeded (paginated listings, infinite scroll, etc.), the CLI switches to the batch endpoint (/dca/trigger → poll /dca/dataset) with a longer poll interval and a 1-hour default timeout. No flag required.
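The path selection described above can be sketched as a small dispatcher. This Python illustration assumes the three runners are passed in as callables and that the page-limit condition surfaces as an exception; PageLimitExceeded and run_scraper are hypothetical names.

```python
from typing import Callable


class PageLimitExceeded(Exception):
    """Hypothetical signal that the realtime endpoint hit its page limit."""


def run_scraper(
    url: str,
    sync: bool,
    run_sync: Callable[[str], list],
    run_realtime: Callable[[str], list],
    run_batch: Callable[[str], list],
) -> tuple[str, list]:
    """Return (path_used, rows) for one URL."""
    if sync:
        # One-shot request; no polling and no fallback.
        return "sync", run_sync(url)
    try:
        # Default: trigger the realtime job and poll for its result.
        return "async", run_realtime(url)
    except PageLimitExceeded:
        # Too many pages for realtime: rerun through the batch endpoint.
        return "batch", run_batch(url)
```

Keeping the fallback inside the default branch means --sync never escalates to batch on its own, which matches the description of sync as a one-shot call.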
# Default async + poll
brightdata scraper run c_mp3tuab31lswoxvpws https://www.amazon.com/dp/B08N5WRWNW --pretty
# Sync for fast pages
brightdata scraper run c_mp3tuab31lswoxvpws https://example.com/p/1 --sync
# Large/paginated URL — falls back to batch automatically
brightdata scraper run c_mp3tuab31lswoxvpws \
  "https://www.ycombinator.com/companies?batch=Spring%202026" --pretty
Flags: --sync, --sync-timeout, --timeout, --name, --version, -o, --json, --pretty, --timing, -k.
Implementation notes
- Errors surface the relevant collector_id / response_id / collection_id so partial state is recoverable from the web UI.
- TTY output is human-readable; piped / --json / --output always emit JSON.
- 45 new unit tests covering pure helpers, both run modes, the page-limit detector, and the fallback wiring.
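The TTY rule in the notes above reduces to a single predicate. A minimal Python sketch, assuming a hypothetical output_mode helper whose arguments mirror the --json and --output flags:

```python
import sys
from typing import Optional


def output_mode(is_tty: bool, json_flag: bool = False, output_path: Optional[str] = None) -> str:
    """Return "json" for piped or explicitly requested output, else "human"."""
    if json_flag or output_path is not None or not is_tty:
        return "json"
    return "human"


# Typical call site: ask the real stdout whether it is a terminal.
mode = output_mode(sys.stdout.isatty())
```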
Not included (planned)
- Self-healing (scraper refactor): reuses the same scraper command group.
- Resuming a job by response_id / collection_id after Ctrl+C.
Full diff: v0.1.8...v0.2.0
v0.1.7
🔥 New: brightdata discover command
AI-powered web discovery - find, rank, and extract web content directly from your terminal.
Highlights:
- Search the web with natural language queries
- Rank results by AI intent (e.g. "Prioritize institutional reports for VC research")
- Extract full page content as markdown with --include-content
- Geo-target by country and city
- Filter by keywords and date ranges
- Pipe-friendly: redirected stdout automatically outputs JSON
Usage:
# Basic discovery
brightdata discover "AI trends"
# With intent and content extraction
brightdata discover "AI trends" \
--intent "Find research papers" \
--include-content --num-results 5
# Pipe to file
brightdata discover "AI trends" --include-content > results.json
v0.1.5
Bright Data Browser API
This release introduces brightdata browser - a full browser automation command group powered by Bright Data's Browser API. Control a real Chromium browser from your terminal or AI agent, with persistent sessions, geo-targeting, and a token-efficient accessibility tree snapshot system.
What's New
Browser Sessions
Start a browser session with a single command. A lightweight local daemon manages the connection so subsequent commands are instant — no reconnecting on every call.
brightdata browser open https://example.com
brightdata browser open https://amazon.com --country us --session shop
Sessions are named, isolated, and auto-shutdown after 10 minutes of inactivity (configurable). Run as many sessions in parallel as you need.
# Two sessions, two countries, running simultaneously
brightdata browser open https://amazon.com --session us --country us
brightdata browser open https://amazon.com --session de --country de
Accessibility Tree Snapshots
The primary way to read a page — far more token-efficient than raw HTML. Each interactive element is assigned a short ref (e1, e2, …) that you use to interact with it.
brightdata browser snapshot # Full tree
brightdata browser snapshot --compact # Interactive elements + ancestors only
brightdata browser snapshot --depth 3 # Limit depth
brightdata browser snapshot --selector "main" # Scope to a CSS subtree
Example output:
Page: Sign In — Example
URL: https://example.com/login
- heading "Sign In" [level=1]
- form
- textbox "Email" [ref=e1, placeholder="you@example.com"]
- textbox "Password" [ref=e2, placeholder="Password"]
- button "Sign In" [ref=e3]
- link "Forgot password?" [ref=e4]
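An agent consuming a snapshot usually needs a lookup from ref to element. The Python sketch below indexes lines of the shape shown in the example output; the regular expressions are assumptions fitted to that sample and do not cover element names containing escaped quotes.

```python
import re

# Matches lines such as: - textbox "Email" [ref=e1, placeholder="you@example.com"]
REF_LINE = re.compile(r'^\s*-\s+(?P<role>\w+)\s+"(?P<name>[^"]*)"\s+\[(?P<attrs>[^\]]*)\]')
REF_ATTR = re.compile(r"\bref=(\w+)")


def index_refs(snapshot: str) -> dict[str, tuple[str, str]]:
    """Map each ref (e1, e2, ...) to its (role, accessible name)."""
    refs: dict[str, tuple[str, str]] = {}
    for line in snapshot.splitlines():
        match = REF_LINE.match(line)
        if not match:
            continue  # headers and elements without attributes
        ref = REF_ATTR.search(match.group("attrs"))
        if ref:
            refs[ref.group(1)] = (match.group("role"), match.group("name"))
    return refs
```

Elements without a ref, such as the heading in the sample, are skipped because there is nothing to pass to click or type.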
Element Interaction
Interact with elements using the ref values from your snapshot:
brightdata browser type e1 "user@example.com"
brightdata browser type e2 "password" --submit # types and hits Enter
brightdata browser click e3
brightdata browser fill e1 "user@example.com" # direct fill, no key events
brightdata browser select e5 "United States" # dropdown by label
brightdata browser check e6 # checkbox / radio
brightdata browser hover e2 # trigger hover states
brightdata browser scroll --direction down --distance 600
brightdata browser scroll --ref e10 # scroll element into view
Screenshots
brightdata browser screenshot ./result.png
brightdata browser screenshot --full-page
brightdata browser screenshot --base64 # inline base64 for AI agents
Content Extraction
brightdata browser get text # full page text
brightdata browser get text "h1" # scoped to CSS selector
brightdata browser get html ".product" # innerHTML of an element
Network & Cookies
brightdata browser network # requests captured since last navigation
brightdata browser cookies # current session cookies
Network output:
Network Requests (5 total):
[GET] https://example.com/ => [200]
[GET] https://example.com/style.css => [200]
[POST] https://api.example.com/track => [204]
Session Management
brightdata browser sessions # list all active sessions
brightdata browser status # current session state
brightdata browser close # close default session
brightdata browser close --session shop # close named session
brightdata browser close --all # close everything
Country Switching
Changing --country on an existing session automatically closes the browser, fetches a new geo-targeted endpoint, and reconnects — no manual close required.
brightdata browser open https://example.com --country us
# later...
brightdata browser open https://example.com --country de # reconnects automatically
AI-Safe Content Boundaries
Wrap snapshot output in nonce-delimited boundaries to protect against prompt injection when feeding page content into an AI agent:
brightdata browser snapshot --wrap
--- BRIGHTDATA_BROWSER_CONTENT nonce=a3f8c2... origin=https://example.com ---
Page: Example
...
--- END_BRIGHTDATA_BROWSER_CONTENT nonce=a3f8c2... ---
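The idea behind a nonce-delimited boundary is that page content cannot forge the closing marker, because it cannot predict the nonce. The Python sketch below wraps and unwraps text using the marker shape shown above; the 32-character hex nonce and both function names are assumptions, not the CLI's actual format guarantees.

```python
import re
import secrets

OPENING = re.compile(r"--- BRIGHTDATA_BROWSER_CONTENT nonce=(\S+) origin=(\S+) ---")


def wrap_content(content: str, origin: str) -> tuple[str, str]:
    """Return (nonce, wrapped text) with a fresh random nonce."""
    nonce = secrets.token_hex(16)
    wrapped = (
        f"--- BRIGHTDATA_BROWSER_CONTENT nonce={nonce} origin={origin} ---\n"
        f"{content}\n"
        f"--- END_BRIGHTDATA_BROWSER_CONTENT nonce={nonce} ---"
    )
    return nonce, wrapped


def unwrap_content(wrapped: str) -> tuple[str, str]:
    """Return (origin, content); the last line must repeat the opening nonce."""
    lines = wrapped.split("\n")
    opening = OPENING.fullmatch(lines[0])
    if not opening:
        raise ValueError("missing opening marker")
    nonce, origin = opening.group(1), opening.group(2)
    if lines[-1] != f"--- END_BRIGHTDATA_BROWSER_CONTENT nonce={nonce} ---":
        raise ValueError("closing marker missing or nonce mismatch")
    # Anything in between is untrusted data, including look-alike markers.
    return origin, "\n".join(lines[1:-1])
```

A look-alike end marker inside the page text carries the wrong nonce, so it stays inside the untrusted region instead of terminating it.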
New Flags
| Flag | Applies to | Description |
|---|---|---|
| --session <name> | all subcommands | Named session (default: default) |
| --country <code> | open | ISO geo-targeting, auto-reconnects on change |
| --zone <name> | open | Scraping Browser zone (default: cli_browser) |
| --idle-timeout <ms> | open | Daemon auto-shutdown after idle (default: 10 min) |
| --timeout <ms> | all subcommands | IPC command timeout (default: 30s) |
| --compact | snapshot | Interactive elements + ancestors only |
| --interactive | snapshot | Interactive elements as a flat list |
| --depth <n> | snapshot | Limit tree depth |
| --selector <sel> | snapshot | Scope to CSS subtree |
| --wrap | snapshot | AI-safe content boundary wrapping |
| --full-page | screenshot | Full scrollable page capture |
| --base64 | screenshot | Output base64-encoded PNG |
| --append | type | Append to existing value |
| --submit | type | Press Enter after typing |
| --direction <dir> | scroll | up / down / left / right |
| --distance <px> | scroll | Pixels to scroll |
| --ref <ref> | scroll | Scroll element into view |
| --all | close | Close all active sessions |
New Environment Variables
| Variable | Description |
|---|---|
| BRIGHTDATA_BROWSER_ZONE | Default Scraping Browser zone |
| BRIGHTDATA_DAEMON_DIR | Override daemon socket / PID file directory |
Full AI Agent Workflow Example
# Open a US-targeted session
brightdata browser open https://news.ycombinator.com --country us
# Read the page (compact = minimal tokens)
brightdata browser snapshot --compact
# Click the first story link (ref from snapshot)
brightdata browser click e1
# Read the new page
brightdata browser snapshot --compact
# Screenshot for visual verification
brightdata browser screenshot ./hn-story.png
# Done
brightdata browser close
Technical Notes
- Uses playwright-core (no browser download; connects to Bright Data's remote Chromium via CDP)
- Daemon communicates over Unix sockets on Linux/macOS, TCP on Windows
- Daemon PID and socket files stored in ~/.brightdata-cli/ (Linux), ~/Library/Application Support/brightdata-cli/ (macOS), %APPDATA%\brightdata-cli\ (Windows)
- The cli_browser zone is created automatically on first use if it doesn't exist
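The per-platform directory rule, combined with the BRIGHTDATA_DAEMON_DIR override, can be sketched as below. This is a Python illustration; the function name daemon_dir and the use of sys.platform-style names ("darwin", "win32") are assumptions.

```python
from pathlib import PurePosixPath, PureWindowsPath


def daemon_dir(platform: str, env: dict[str, str], home: str) -> str:
    """Return the directory for daemon PID and socket files."""
    override = env.get("BRIGHTDATA_DAEMON_DIR")
    if override:
        return override  # an explicit override wins on every platform
    if platform == "darwin":
        return str(PurePosixPath(home) / "Library" / "Application Support" / "brightdata-cli")
    if platform == "win32":
        return str(PureWindowsPath(env["APPDATA"]) / "brightdata-cli")
    return str(PurePosixPath(home) / ".brightdata-cli")
```

Taking the platform, environment, and home directory as arguments keeps the function pure, so every branch can be checked from any operating system.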
Upgrade
npm install -g @brightdata/cli@latest
v0.1.1
Release Notes - @brightdata/cli v0.1.1
First official release of the Bright Data CLI.
What is it?
A command-line interface for the Bright Data platform. Scrape websites, search the web, and extract structured data from 40+ platforms — directly from your terminal.
npm i -g @brightdata/cli
Two commands are available after install: brightdata and bdata (shorthand).
Commands
| Command | Description |
|---|---|
| bdata scrape <url> | Scrape any URL; bypasses CAPTCHAs, JS rendering, and anti-bot protections |
| bdata search <query> | Search Google, Bing, or Yandex with structured JSON output |
| bdata pipelines <type> | Extract structured data from 40+ platforms (Amazon, LinkedIn, TikTok, and more) |
| bdata zones | List and inspect your Bright Data proxy zones |
| bdata budget | View account balance and per-zone cost and bandwidth |
| bdata status <id> | Check the status of an async job |
| bdata config | Get or set CLI configuration |
| bdata init | Interactive setup wizard |
| bdata login | Authenticate with Bright Data |
| bdata logout | Clear stored credentials |
| bdata skill | Browse and install AI agent skills |
| bdata version | Display version and environment info |
Highlights
Scraping
- Automatic CAPTCHA solving, JavaScript rendering, and anti-bot bypass
- Output as markdown, HTML, JSON, or screenshot
- Geo-targeting by country
- Async job submission with polling
Web Search (SERP)
- Google (structured results with organic, ads, knowledge graph, people-also-ask), Bing, and Yandex
- Search types: web, news, images, shopping
- Device targeting (desktop/mobile), country, and language localization
- Pagination support
Structured Data Extraction (Pipelines)
42 dataset types across major platforms:
- E-commerce — Amazon, Walmart, eBay, Best Buy, Etsy, Home Depot, Zara, Google Shopping
- Professional — LinkedIn (profiles, companies, jobs, posts, people search), Crunchbase, ZoomInfo
- Social — Instagram, Facebook, TikTok, X (Twitter), YouTube, Reddit
- Other — Google Maps, Google Play, Apple App Store, Reuters, GitHub, Yahoo Finance, Zillow, Booking.com
Authentication
- Browser-based OAuth for desktop environments
- Device flow for headless/SSH sessions
- Direct API key via flag or BRIGHTDATA_API_KEY env variable
- Automatic zone provisioning on first login
AI Agent Skills
Install Bright Data capabilities into 45+ coding agents (Claude Code, Cursor, Windsurf, GitHub Copilot, and more):
- search — structured Google search results
- scrape — webpage scraping as clean markdown
- data-feeds — structured data extraction from 40+ platforms
- bright-data-mcp — 60+ MCP tools for search, scraping, and browser automation
- bright-data-best-practices — reference knowledge base for writing Bright Data code
Output
- Human-readable tables and markdown in TTY
- JSON, pretty JSON, CSV, NDJSON for programmatic use
- File output with auto-format detection (-o results.json)
- Pipe-friendly: colors and spinners disabled automatically in non-TTY
Configuration
- XDG-compliant config storage
- CLI flags > environment variables > config file > defaults
- Configurable default zones, output format, and API URL
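The precedence order above (CLI flags, then environment variables, then config file, then defaults) maps naturally onto a layered lookup. A minimal Python sketch using collections.ChainMap; the function name resolve_config and the example keys are hypothetical.

```python
from collections import ChainMap
from typing import Any, Mapping


def resolve_config(
    flags: Mapping[str, Any],
    env: Mapping[str, Any],
    config_file: Mapping[str, Any],
    defaults: Mapping[str, Any],
) -> dict[str, Any]:
    """Merge settings so that earlier sources override later ones."""
    # Drop unset (None) values so they do not shadow lower-priority sources.
    layers = [
        {key: value for key, value in layer.items() if value is not None}
        for layer in (flags, env, config_file, defaults)
    ]
    # ChainMap returns the first layer that defines a key.
    return dict(ChainMap(*layers))
```

Filtering out None matters because an omitted flag typically parses to None, and without the filter it would hide the environment and file values beneath it.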
Requirements
- Node.js >= 18
- A Bright Data account
Install
# npm
npm i -g @brightdata/cli
# or one-liner
curl -fsSL https://raw.githubusercontent.com/brightdata/cli/main/install.sh | sh
Links
- npm: https://www.npmjs.com/package/@brightdata/cli
- GitHub: https://github.com/brightdata/cli
- Bright Data: https://brightdata.com