google · copybara-service · Jun 29, 2026 · Jun 29, 2026
diff --git a/README.md b/README.md
@@ -26,6 +26,7 @@ repo to install.
 -   [**AlloyDB Basics**](./skills/cloud/alloydb-basics)
 -   [**BigQuery Basics**](./skills/cloud/bigquery-basics)
 -   [**BigQuery AI & ML**](./skills/cloud/bigquery-ai-ml)
+-   [**BigQuery BigFrames**](./skills/cloud/bigquery-bigframes)
 -   [**Bigtable Basics**](./skills/cloud/bigtable-basics)
 -   [**Cloud Run Basics**](./skills/cloud/cloud-run-basics)
 -   [**Cloud SQL Basics**](./skills/cloud/cloud-sql-basics)

diff --git a/skills/cloud/bigquery-bigframes/SKILL.md b/skills/cloud/bigquery-bigframes/SKILL.md
@@ -0,0 +1,47 @@
+---
+name: bigquery-bigframes
+metadata:
+  category: BigDataAndAnalytics
+description: >-
+  Generates Python code using BigQuery DataFrames (BigFrames), the pandas/scikit-learn-style API over BigQuery. Use when writing BigFrames code or doing pandas-style dataframe/ML work against BigQuery (e.g. in a notebook). Don't use for SQL-first workflows or the google-cloud-bigquery client library — use bigquery-basics.
+---
+
+# BigFrames Development Standards
+
+*   **Avoid `.to_pandas()`**: You MUST NOT use `.to_pandas()` to download the
+    entire dataset into memory as this downloads all data to the client's
+    memory, bypassing BigQuery's distributed computation and risking Out of
+    Memory (OOM) errors. There are some exceptions:
+    *  An error message explicitly requests you to use `to_pandas()`
+    *  You are going to visualize the data, **and** the visualization library does not accept BigFrames Dataframe/Series instances. In this case, reduce the amount of data you are going to download before calling `.to_pandas()`
+*   **Avoid `read_gbq()` for SQL**: Do not write SQL queries and execute them
+    with `read_gbq()` to maintain the Pandas-like DataFrame abstraction and
+    allow lazy executions. Use BigFrames Dataframe/Series methods instead.
+*   **Use BigFrames ML package for Machine Learning Tasks**: Do not use
+    Scikit-learn or other ML libraries with BigFrames dataframes because
+    standard Scikit-learn models require bringing data into local client memory,
+    whereas bigframes.ml delegates training directly to BigQuery's scalable ML
+    engine. Import your tools/classes from `bigframes.ml`.
+*   **Stay in the Cloud**: Perform data cleaning, transformation, and analysis via BigFrames methods to leverage BigQuery's scale.
+*   **Accessors over UDFs/Lambdas**:
+    *   Prefer built-in accessors (e.g., `df.col.str.*`, `df.col.dt.*`) over remote UDFs.
+    *   **Do not use lambdas** with `Series.map()` or `DataFrame.apply()`.
+*   **Schema Verification**: Do not assume schema of intermediate outputs. Check `.dtypes` after loading, and use `display()` with `.head()` or `.peek()`.
+*   **Visualization**: BigFrames Dataframe mostly works directly with
+    Matplotlib, Seaborn, and other plotting libraries. If your attempt didn't
+    work, try using the `plot` accessor. If that didn't work either, you MUST
+    sample or aggregate your data to make it small enough before calling
+    `to_pandas()`.
+
+# Model Development
+
+* **Unlike Scikit-learn**: BigFrames' `predict()` method always returns a **DataFrame** containing both predictions and features (not just a series of predictions).
+* **No `random_state`**: Do not pass a `random_state` argument when instantiating BigFrames ML models, because this parameter is not supported in the BigFrames ML package.
+* **Automatic Scaling**: Do not use `OneHotEncoder` or `StandardScaler` unless explicitly requested (handled automatically).
+* **Hyperparameter Tuning**: You must write custom loops (BigFrames lacks `GridSearchCV` or `RandomizedSearchCV`).
+* **ARIMA Plus** (Forecasting):
+    * Import from `bigframes.ml.forecasting`.
+    * Sort data chronologically and split around a timepoint before training.
+    * Prediction horizon must be less than or equal to training horizon.
+* **PCA**: BigFrames' PCA class lacks simple `transform()` method. Use `predict()` instead.
+* **Model Persistence**: To persist a model, use `model.to_gbq()`. To load a persisted model, use `bpd.read_gbq_model()`.