27 May 09:39

a671610

v0.3.0 Latest

Latest

v0.3.0

Scraper Studio gets a major workflow upgrade: multi-URL runs, a stable
machine-readable output envelope, automatic backoff on the concurrent-job
cap, real CSV/HTML/Markdown file output, clearer error hints, and an
Examples: section in every command's --help.

All changes are backward compatible — existing bdata scraper run <id> <url>
and scraper create invocations behave exactly as before.

Features

`scraper run` — multi-URL input (#8)

Run a scraper against many URLs in a single API call. The CLI was the only
client without this; it now mirrors the reference SDKs' batch path.

--urls "u1,u2,u3" — comma-separated list.
--input-file <path> — one URL per line (# comments and blanks skipped),
or a JSON array of strings, or a JSON array of {"url": "..."} objects.

A list of URLs becomes one POST /dca/trigger, one snapshot, one merged
result array. The positional <url> is now optional; exactly one input
source may be given. --sync is rejected for multi-URL (the sync endpoint is
single-URL only).

bdata scraper run c_xxx --urls "https://a.test/1,https://a.test/2" -o out.json
bdata scraper run c_xxx --input-file urls.txt -o out.json

`scraper create` — stable output envelope (#6)

-o / --json now writes a consistent envelope on every termination path
(success and failure), so the documented jq -r '.collector_id' recipe always
finds an id — even when a build half-fails:

{
  "collector_id": "c_...",
  "name": "...",
  "status": "done | failed | ai_trigger_failed | poll_failed",
  "completed_steps": ["..."],
  "view_url": "https://brightdata.com/cp/scrapers/c_...",
  "created_at": "...",
  "error": "..."
}

Failure paths that previously wrote nothing now write the envelope (with a
recoverable collector_id). --legacy-output keeps the old bare-progress
shape for one minor version while you migrate.

`scraper create` — auto-backoff on the AI-Flow 429 cap (#7)

AI generation is capped at a few concurrent jobs per account. Hitting the cap
used to fail within seconds, leaving half-built collectors behind. The CLI now
waits with exponential backoff + jitter and retries, printing status so you
know it isn't hung:

Hit AI-Flow concurrent-job cap (429). Waiting 32s before retry 1/4...

--max-retries <n> — override the retry count (default 4).
--no-retry — fail fast on 429 instead of waiting.

On terminal failure, a recovery note points at the half-built collector's
dashboard URL.

Output — real CSV / HTML / Markdown file formats (#9)

-o file.csv now writes actual CSV, file.md writes a Markdown table, and
file.html writes an HTML table — chosen by file extension.

bdata scraper run c_xxx https://a.test -o products.csv   # RFC-4180 CSV

.xlsx / .xls are rejected up front with a helpful message instead of
silently producing a broken file.

`--help` — Examples in every command (#10)

scraper create, scraper run, discover, search, pipelines, and
scrape each gain an Examples: section in --help, using real public
domains. No more alt-tabbing to the README to find a working invocation.

bdata scraper create --help   # ends with copy-paste-ready examples

Fixes

Accurate error hints for Scraper Studio (#5)

A 403 from a stub collector used to map to a misleading "check your zone
permissions" hint, sending users down the wrong path. Scraper-API errors now
get accurate, actionable hints (e.g. "AI generation has not completed — re-run
scraper create", "you hit the AI-Flow concurrent-job cap"). Other commands
(scrape, search, discover, pipelines, browser) are unchanged — the
scraper vocabulary stays scoped to the scraper command.

Bug fix

--no-retry on scraper create was silently ignored due to a Commander flag
mapping (opts.retry vs opts.noRetry); it now correctly disables retries.

Upgrade notes

No action required — fully backward compatible.
If you parse the scraper create -o file, note it is now an envelope, not
the bare AI-progress payload. Use --legacy-output to keep the old shape
during migration (removed next minor).

Full changelog: v0.2.0...v0.3.0 (PRs #5–#10)

Assets 2

13 May 14:20

meirk-brd

v0.2.0

fadfc2b

V0.2.0

v0.2.0 - Scraper Studio AI

Build and run Bright Data scrapers from the CLI. Two new commands under a new scraper group.

`brightdata scraper create <url> <description>`

Build a custom scraper from a natural-language description. Wraps the Scraper Studio AI Flow: creates the scraper template, triggers AI generation, polls until done.

brightdata scraper create https://example.com/product/1 \
  "Extract title, price, and image URL from this product page"

Returns a collector_id you reuse with scraper run. Defaults to a placeholder webhook delivery target you reconfigure in the Bright Data web UI; override with --deliver-webhook.

Flags: --name, --deliver-webhook, --timeout, -o, --json, --pretty, --timing, -k.

`brightdata scraper run <collector_id> <url>`

Run a scraper against a URL and get the data back. Three execution paths, picked automatically:

Async + poll (default) — triggers /dca/trigger_immediate, polls /dca/get_result until ready. Right for most jobs.
Sync (--sync) — one-shot /dca/crawl with a 25–50s server-side cap. Right for fast pages where you want to skip polling entirely. On server-side timeout, prints the response_id so you can re-run without --sync to recover.
Auto-fallback to batch — if the realtime endpoint reports the page limit was exceeded (paginated listings, infinite scroll, etc.), the CLI switches to the batch endpoint (/dca/trigger → poll /dca/dataset) with a longer poll interval and a 1-hour default timeout. No flag required.

# Default async + poll
brightdata scraper run c_mp3tuab31lswoxvpws https://www.amazon.com/dp/B08N5WRWNW --pretty

# Sync for fast pages
brightdata scraper run c_mp3tuab31lswoxvpws https://example.com/p/1 --sync

# Large/paginated URL — falls back to batch automatically
brightdata scraper run c_mp3tuab31lswoxvpws \
  "https://www.ycombinator.com/companies?batch=Spring%202026" --pretty

Flags: --sync, --sync-timeout, --timeout, --name, --version, -o, --json, --pretty, --timing, -k.

Implementation notes

Errors surface the relevant collector_id / response_id / collection_id so partial state is recoverable from the web UI.
TTY output is human-readable; piped / --json / --output always emit JSON.
45 new unit tests covering pure helpers, both run modes, the page-limit detector, and the fallback wiring.

Not included (planned)

Self-healing (scraper refactor) — reuses the same scraper command group.
Resuming a job by response_id / collection_id after Ctrl+C.

Full diff: v0.1.8...v0.2.0

Assets 2

05 Apr 08:30

meirk-brd

v0.1.7

2ab3bbd

v0.1.7

🔥 New: `brightdata discover` command

AI-powered web discovery - find, rank, and extract web content directly from your terminal.

Highlights:

Search the web with natural language queries
Rank results by AI intent (e.g. "Prioritize institutional reports for VC research")
Extract full page content as markdown with --include-content
Geo-target by country and city
Filter by keywords and date ranges
Pipe-friendly: redirected stdout automatically outputs JSON

Usage:

# Basic discovery
brightdata discover "AI trends"

# With intent and content extraction
brightdata discover "AI trends" \
  --intent "Find research papers" \
  --include-content --num-results 5

# Pipe to file
brightdata discover "AI trends" --include-content > results.json

Assets 2

26 Mar 16:12

meirk-brd

v0.1.5

5cad863

v0.1.5

Bright Data Browser API

This release introduces brightdata browser - a full browser automation command group powered by Bright Data's Browser API. Control a real Chromium browser from your terminal or AI agent, with persistent sessions, geo-targeting, and a token-efficient accessibility tree snapshot system.

What's New

Browser Sessions

Start a browser session with a single command. A lightweight local daemon manages the connection so subsequent commands are instant — no reconnecting on every call.

brightdata browser open https://example.com
brightdata browser open https://amazon.com --country us --session shop

Sessions are named, isolated, and auto-shutdown after 10 minutes of inactivity (configurable). Run as many sessions in parallel as you need.

# Two sessions, two countries, running simultaneously
brightdata browser open https://amazon.com --session us --country us
brightdata browser open https://amazon.com --session de --country de

Accessibility Tree Snapshots

The primary way to read a page — far more token-efficient than raw HTML. Each interactive element is assigned a short ref (e1, e2, …) that you use to interact with it.

brightdata browser snapshot              # Full tree
brightdata browser snapshot --compact    # Interactive elements + ancestors only
brightdata browser snapshot --depth 3    # Limit depth
brightdata browser snapshot --selector "main"  # Scope to a CSS subtree

Example output:

Page: Sign In — Example
URL: https://example.com/login

- heading "Sign In" [level=1]
- form
  - textbox "Email" [ref=e1, placeholder="you@example.com"]
  - textbox "Password" [ref=e2, placeholder="Password"]
  - button "Sign In" [ref=e3]
  - link "Forgot password?" [ref=e4]

Element Interaction

Interact with elements using the ref values from your snapshot:

brightdata browser type e1 "user@example.com"
brightdata browser type e2 "password" --submit   # types and hits Enter
brightdata browser click e3
brightdata browser fill e1 "user@example.com"    # direct fill, no key events
brightdata browser select e5 "United States"     # dropdown by label
brightdata browser check e6                      # checkbox / radio
brightdata browser hover e2                      # trigger hover states
brightdata browser scroll --direction down --distance 600
brightdata browser scroll --ref e10              # scroll element into view

Screenshots

brightdata browser screenshot ./result.png
brightdata browser screenshot --full-page
brightdata browser screenshot --base64          # inline base64 for AI agents

Content Extraction

brightdata browser get text             # full page text
brightdata browser get text "h1"        # scoped to CSS selector
brightdata browser get html ".product"  # innerHTML of an element

Network & Cookies

brightdata browser network    # requests captured since last navigation
brightdata browser cookies    # current session cookies

Network output:

Network Requests (5 total):
[GET] https://example.com/ => [200]
[GET] https://example.com/style.css => [200]
[POST] https://api.example.com/track => [204]

Session Management

brightdata browser sessions              # list all active sessions
brightdata browser status                # current session state
brightdata browser close                 # close default session
brightdata browser close --session shop  # close named session
brightdata browser close --all           # close everything

Country Switching

Changing --country on an existing session automatically closes the browser, fetches a new geo-targeted endpoint, and reconnects — no manual close required.

brightdata browser open https://example.com --country us
# later...
brightdata browser open https://example.com --country de  # reconnects automatically

AI-Safe Content Boundaries

Wrap snapshot output in nonce-delimited boundaries to protect against prompt injection when feeding page content into an AI agent:

brightdata browser snapshot --wrap

--- BRIGHTDATA_BROWSER_CONTENT nonce=a3f8c2... origin=https://example.com ---
Page: Example
...
--- END_BRIGHTDATA_BROWSER_CONTENT nonce=a3f8c2... ---

New Flags

Flag	Applies to	Description
`--session <name>`	all subcommands	Named session (default: `default`)
`--country <code>`	`open`	ISO geo-targeting, auto-reconnects on change
`--zone <name>`	`open`	Scraping Browser zone (default: `cli_browser`)
`--idle-timeout <ms>`	`open`	Daemon auto-shutdown after idle (default: 10 min)
`--timeout <ms>`	all subcommands	IPC command timeout (default: 30s)
`--compact`	`snapshot`	Interactive elements + ancestors only
`--interactive`	`snapshot`	Interactive elements as a flat list
`--depth <n>`	`snapshot`	Limit tree depth
`--selector <sel>`	`snapshot`	Scope to CSS subtree
`--wrap`	`snapshot`	AI-safe content boundary wrapping
`--full-page`	`screenshot`	Full scrollable page capture
`--base64`	`screenshot`	Output base64-encoded PNG
`--append`	`type`	Append to existing value
`--submit`	`type`	Press Enter after typing
`--direction <dir>`	`scroll`	`up` / `down` / `left` / `right`
`--distance <px>`	`scroll`	Pixels to scroll
`--ref <ref>`	`scroll`	Scroll element into view
`--all`	`close`	Close all active sessions

New Environment Variables

Variable	Description
`BRIGHTDATA_BROWSER_ZONE`	Default Scraping Browser zone
`BRIGHTDATA_DAEMON_DIR`	Override daemon socket / PID file directory

Full AI Agent Workflow Example

# Open a US-targeted session
brightdata browser open https://news.ycombinator.com --country us

# Read the page (compact = minimal tokens)
brightdata browser snapshot --compact

# Click the first story link (ref from snapshot)
brightdata browser click e1

# Read the new page
brightdata browser snapshot --compact

# Screenshot for visual verification
brightdata browser screenshot ./hn-story.png

# Done
brightdata browser close

Technical Notes

Uses playwright-core (no browser download — connects to Bright Data's remote Chromium via CDP)
Daemon communicates over Unix sockets on Linux/macOS, TCP on Windows
Daemon PID and socket files stored in ~/.brightdata-cli/ (Linux), ~/Library/Application Support/brightdata-cli/ (macOS), %APPDATA%\brightdata-cli\ (Windows)
The cli_browser zone is created automatically on first use if it doesn't exist

Upgrade

npm install -g @brightdata/cli@latest

Assets 2

18 Mar 14:01

meirk-brd

v0.1.1

5d0f3a0

v0.1.1

Release Notes - @brightdata/cli v0.1.1

First official release of the Bright Data CLI.

What is it?

A command-line interface for the Bright Data platform. Scrape websites, search the web, and extract structured data from 40+ platforms — directly from your terminal.

npm i -g @brightdata/cli

Two commands are available after install: brightdata and bdata (shorthand).

Commands

Command	Description
`bdata scrape <url>`	Scrape any URL — bypasses CAPTCHAs, JS rendering, and anti-bot protections
`bdata search <query>`	Search Google, Bing, or Yandex with structured JSON output
`bdata pipelines <type>`	Extract structured data from 40+ platforms (Amazon, LinkedIn, TikTok, and more)
`bdata zones`	List and inspect your Bright Data proxy zones
`bdata budget`	View account balance and per-zone cost and bandwidth
`bdata status <id>`	Check the status of an async job
`bdata config`	Get or set CLI configuration
`bdata init`	Interactive setup wizard
`bdata login`	Authenticate with Bright Data
`bdata logout`	Clear stored credentials
`bdata skill`	Browse and install AI agent skills
`bdata version`	Display version and environment info

Highlights

Scraping

Automatic CAPTCHA solving, JavaScript rendering, and anti-bot bypass
Output as markdown, HTML, JSON, or screenshot
Geo-targeting by country
Async job submission with polling

Web Search (SERP)

Google (structured results with organic, ads, knowledge graph, people-also-ask), Bing, and Yandex
Search types: web, news, images, shopping
Device targeting (desktop/mobile), country, and language localization
Pagination support

Structured Data Extraction (Pipelines)

42 dataset types across major platforms:

E-commerce — Amazon, Walmart, eBay, Best Buy, Etsy, Home Depot, Zara, Google Shopping
Professional — LinkedIn (profiles, companies, jobs, posts, people search), Crunchbase, ZoomInfo
Social — Instagram, Facebook, TikTok, X (Twitter), YouTube, Reddit
Other — Google Maps, Google Play, Apple App Store, Reuters, GitHub, Yahoo Finance, Zillow, Booking.com

Authentication

Browser-based OAuth for desktop environments
Device flow for headless/SSH sessions
Direct API key via flag or BRIGHTDATA_API_KEY env variable
Automatic zone provisioning on first login

AI Agent Skills

Install Bright Data capabilities into 45+ coding agents (Claude Code, Cursor, Windsurf, GitHub Copilot, and more):

search — structured Google search results
scrape — webpage scraping as clean markdown
data-feeds — structured data extraction from 40+ platforms
bright-data-mcp — 60+ MCP tools for search, scraping, and browser automation
bright-data-best-practices — reference knowledge base for writing Bright Data code

Output

Human-readable tables and markdown in TTY
JSON, pretty JSON, CSV, NDJSON for programmatic use
File output with auto-format detection (-o results.json)
Pipe-friendly — colors and spinners disabled automatically in non-TTY

Configuration

XDG-compliant config storage
CLI flags > environment variables > config file > defaults
Configurable default zones, output format, and API URL

Requirements

Node.js >= 18
A Bright Data account

Install

# npm
npm i -g @brightdata/cli

# or one-liner
curl -fsSL https://raw.githubusercontent.com/brightdata/cli/main/install.sh | sh

Releases: brightdata/cli

v0.3.0

v0.3.0

Features

scraper run — multi-URL input (#8)

scraper create — stable output envelope (#6)

scraper create — auto-backoff on the AI-Flow 429 cap (#7)

Output — real CSV / HTML / Markdown file formats (#9)

--help — Examples in every command (#10)

Fixes

Accurate error hints for Scraper Studio (#5)

Bug fix

Upgrade notes

Uh oh!

V0.2.0

v0.2.0 - Scraper Studio AI

brightdata scraper create <url> <description>

brightdata scraper run <collector_id> <url>

Implementation notes

Not included (planned)

Uh oh!

v0.1.7

🔥 New: brightdata discover command

Uh oh!

v0.1.5

Bright Data Browser API

What's New

Browser Sessions

Accessibility Tree Snapshots

Element Interaction

Screenshots

Content Extraction

Network & Cookies

Session Management

Country Switching

AI-Safe Content Boundaries

New Flags

New Environment Variables

Full AI Agent Workflow Example

Technical Notes

Upgrade

Uh oh!

v0.1.1

Release Notes - @brightdata/cli v0.1.1

What is it?

Commands

Highlights

Scraping

Web Search (SERP)

Structured Data Extraction (Pipelines)

Authentication

AI Agent Skills

Output

Configuration

Requirements

Install

Links

Uh oh!

`scraper run` — multi-URL input (#8)

`scraper create` — stable output envelope (#6)

`scraper create` — auto-backoff on the AI-Flow 429 cap (#7)

`--help` — Examples in every command (#10)

`brightdata scraper create <url> <description>`

`brightdata scraper run <collector_id> <url>`

🔥 New: `brightdata discover` command